Evaluation of Regression Model
Project Tasks
Employees' years of experience and salary information are given.
| Years of Experience (x) | Salary (y) |
|---|---|
| 5 | 600 |
| 7 | 900 |
| 3 | 550 |
| 3 | 500 |
| 2 | 400 |
| 7 | 950 |
| 3 | 540 |
| 10 | 1200 |
| 6 | 900 |
| 4 | 550 |
| 8 | 1100 |
| 1 | 460 |
| 1 | 400 |
| 9 | 1000 |
| 1 | 380 |
Step 1: Create the linear regression model equation according
to the given bias and weight.
Bias: 275, Weight: 90
(y' = b+wx)
Step 2: Estimate the salary for all years of experience in
the table according to the model equation you have created.
Step 3: Calculate MSE, RMSE, MAE
scores to measure the success of the model.
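The three steps can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the names `experience`, `salary`, `predict`, `BIAS`, and `WEIGHT` are chosen here for illustration and are not part of the original task.

```python
import math

# Data from the table above: years of experience (x) and salary (y).
experience = [5, 7, 3, 3, 2, 7, 3, 10, 6, 4, 8, 1, 1, 9, 1]
salary = [600, 900, 550, 500, 400, 950, 540, 1200, 900, 550, 1100, 460, 400, 1000, 380]

BIAS, WEIGHT = 275, 90  # given in Step 1


def predict(x):
    """Step 1: the model equation y' = b + w * x."""
    return BIAS + WEIGHT * x


# Step 2: predicted salary for every row of the table.
predicted = [predict(x) for x in experience]

# Step 3: error metrics computed from the residuals y - y'.
errors = [y - y_hat for y, y_hat in zip(salary, predicted)]
n = len(errors)
mse = sum(e ** 2 for e in errors) / n
rmse = math.sqrt(mse)
mae = sum(abs(e) for e in errors) / n

print(f"MSE={mse:.2f} RMSE={rmse:.2f} MAE={mae:.2f}")
```

For this table the script reports an MSE of about 4438.33, an RMSE of about 66.62, and an MAE of about 54.33.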
Evaluation of Classification Model
Project Tasks
Task 1:
A classification model has been created that predicts whether a
customer will churn or not. The actual values of 10 test data
observations and the probability values predicted by the model are
given below.
Create a confusion matrix by taking the threshold value to
Calculate Accuracy, Recall, Precision,
F1 Scores.
| Observation | Actual Value | Model Probability Estimation (Probability of belonging to class 1) |
|---|---|---|
| 1 | 1 | 0.7 |
| 2 | 1 | 0.8 |
| 3 | 1 | 0.65 |
| 4 | 1 | 0.9 |
| 5 | 1 | 0.45 |
| 6 | 1 | 0.5 |
| 7 | 0 | 0.55 |
| 8 | 0 | 0.35 |
| 9 | 0 | 0.4 |
| 10 | 0 | 0.25 |
| Actual Value (rows) / Model Prediction (columns) | Non-Churn (0) | Churn (1) | Total |
|---|---|---|---|
| Non-Churn (0) | ? | ? | ? |
| Churn (1) | ? | ? | ? |
| Total | ? | ? | |
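One way to fill in the matrix is to turn each probability into a class label and count the four outcomes. The sketch below assumes a threshold of 0.5 and treats a probability equal to the threshold as class 1. Both choices are assumptions made here for illustration, since the threshold is not stated in the text above; `THRESHOLD` and `confusion_counts` are hypothetical names.

```python
# Actual labels and predicted probabilities from the table above.
actual = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
probability = [0.7, 0.8, 0.65, 0.9, 0.45, 0.5, 0.55, 0.35, 0.4, 0.25]

THRESHOLD = 0.5  # assumed value, not taken from the task text


def confusion_counts(actual, probability, threshold):
    """Count TP, FP, FN, TN after thresholding the probabilities."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for y, p in zip(actual, probability):
        y_hat = 1 if p >= threshold else 0  # ties go to class 1 (assumption)
        if y == 1 and y_hat == 1:
            counts["TP"] += 1
        elif y == 0 and y_hat == 1:
            counts["FP"] += 1
        elif y == 1 and y_hat == 0:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


counts = confusion_counts(actual, probability, THRESHOLD)
accuracy = (counts["TP"] + counts["TN"]) / len(actual)
print(counts, f"accuracy={accuracy:.2f}")
```

Under these assumptions the counts are TP=5, FP=1, FN=1, TN=3. Using a strict `>` comparison instead would move observation 6 (probability 0.5) into class 0 and change the counts.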
Task 2:
A classification model was created to detect fraudulent transactions
among the transactions made through the bank. With an accuracy rate
of 90.5%, the model was judged sufficient and was taken live.
However, after going live, the output of the model was not as
expected, and the business unit reported that the model was
unsuccessful. The confusion matrix of the model's prediction results
is given below. Based on it:
Calculate Accuracy, Recall, Precision,
F1 Scores.
Comment on what the 'Data Science' team may have overlooked.
| Actual Value (rows) / Model Prediction (columns) | Non-Churn (0) | Churn (1) | Total |
|---|---|---|---|
| Non-Churn (0) | 900 | 90 | 990 |
| Churn (1) | 5 | 5 | 10 |
| Total | 905 | 95 | |
- True Negative (TN): 900
- False Positive (FP): 90
- True Positive (TP): 5
- False Negative (FN): 5
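The four scores follow directly from these counts. The sketch below uses the standard definitions of each metric; `classification_metrics` is a hypothetical helper introduced here for illustration.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, recall, precision and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "recall": recall, "precision": precision, "f1": f1}


# Counts listed above for the live model.
metrics = classification_metrics(tp=5, fp=90, fn=5, tn=900)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Accuracy comes out at 0.905, but recall is 0.5 and precision is roughly 0.05, so F1 is below 0.1. Only 10 of the 1,000 observations belong to class 1, so accuracy is dominated by the majority class and hides how poorly the positive class is detected.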
Telco Customer Churn Prediction
Business Problem
Develop a machine learning model that can predict customers who will
leave the company.
Perform the necessary data analysis and feature engineering steps
before developing the model.
Dataset Story
Telco churn data includes information about a fictitious
telecom company that provided home phone and internet services to
7,043 California customers in the third quarter. It shows which
customers have left, stayed, or signed up for their service.
- CustomerId: Customer ID
- Gender: Gender
- SeniorCitizen: Whether the customer is a senior citizen (1, 0)
- Partner: Whether the customer has a partner, i.e. is married or not (Yes, No)
- Dependents: Whether the customer has dependents such as a child, mother, father, or grandmother (Yes, No)
- tenure: The number of months the customer has stayed with the company
- PhoneService: Whether the customer has phone service (Yes, No)
- MultipleLines: Whether the customer has more than one line (Yes, No, No telephone service)
- InternetService: Customer's internet service provider (DSL, Fiber optic, No)
- OnlineSecurity: Whether the customer has online security (Yes, No, No internet service)
- OnlineBackup: Whether the customer has an online backup (Yes, No, No internet service)
- DeviceProtection: Whether the customer has device protection (Yes, No, No internet service)
- TechSupport: Whether the customer has technical support (Yes, No, No internet service)
- StreamingTV: Whether the customer streams TV (Yes, No, No internet service). Indicates whether the customer uses the internet service to stream television programs from a third-party provider
- StreamingMovies: Whether the customer streams movies (Yes, No, No internet service). Indicates whether the customer uses the internet service to stream movies from a third-party provider
- Contract: Customer's contract duration (Month to month, One year, Two years)
- PaperlessBilling: Whether the customer has a paperless invoice (Yes, No)
- PaymentMethod: Customer's payment method (Electronic check, Postal check, Bank transfer (automatic), Credit card (automatic))
- MonthlyCharges: The amount charged to the customer monthly
- TotalCharges: The total amount charged to the customer
- Churn: Whether the customer churned (Yes, No), i.e. customers who left in the last month or quarter
📝 Notes
- Each row represents a unique customer.
- Variables include information about customer services, account, and demographics.
- Services customers sign up for: Phone, multiple lines, internet, online security, online backup, device protection, tech support, and streaming TV and movies.
- Customer account information: How long they have been a customer, contract, payment method, paperless billing, monthly fees, and total fees.
- Demographic information about customers: Gender, age range, and whether they have partners and dependents.
House Price Prediction Model
Business Problem
The goal is to build a machine learning model that predicts the
prices of different types of houses, using a dataset that contains
the features and sale price of each house.
Dataset Story
There are 79 explanatory variables in this dataset of residential
homes in Ames, Iowa. The dataset belongs to a Kaggle competition, and
the dataset and competition page can be accessed from the link below.
Because it is a competition dataset, there are two different csv
files: train and test. House prices are left blank in the test
dataset, and you are expected to predict these values.
- Total Observations: 1,460
- Numeric Variables: 38
- Categorical Variables: 43
- SalePrice: The property's sale price in dollars. This is the target variable that you're trying to predict.
- MSSubClass: The building class
- MSZoning: The general zoning classification
- LotFrontage: Linear feet of street connected to property
- LotArea: Lot size in square feet
- Street: Type of road access
- Alley: Type of alley access
- LotShape: General shape of property
- LandContour: Flatness of the property
- Utilities: Type of utilities available
- LotConfig: Lot configuration
- LandSlope: Slope of property
- Neighborhood: Physical locations within Ames city limits
- Condition1: Proximity to main road or railroad
- Condition2: Proximity to main road or railroad (if a second is present)
- BldgType: Type of dwelling
- HouseStyle: Style of dwelling
- OverallQual: Overall material and finish quality
- OverallCond: Overall condition rating
- YearBuilt: Original construction date
- YearRemodAdd: Remodel date
- RoofStyle: Type of roof
- RoofMatl: Roof material
- Exterior1st: Exterior covering on house
- Exterior2nd: Exterior covering on house (if more than one material)
- MasVnrType: Masonry veneer type
- MasVnrArea: Masonry veneer area in square feet
- ExterQual: Exterior material quality
- ExterCond: Present condition of the material on the exterior
- Foundation: Type of foundation
- BsmtQual: Height of the basement
- BsmtCond: General condition of the basement
- BsmtExposure: Walkout or garden level basement walls
- BsmtFinType1: Quality of basement finished area
- BsmtFinSF1: Type 1 finished square feet
- BsmtFinType2: Quality of second finished area (if present)
- BsmtFinSF2: Type 2 finished square feet
- BsmtUnfSF: Unfinished square feet of basement area
- TotalBsmtSF: Total square feet of basement area
- Heating: Type of heating
- HeatingQC: Heating quality and condition
- CentralAir: Central air conditioning
- Electrical: Electrical system
- 1stFlrSF: First floor square feet
- 2ndFlrSF: Second floor square feet
- LowQualFinSF: Low quality finished square feet (all floors)
- GrLivArea: Above grade (ground) living area square feet
- BsmtFullBath: Basement full bathrooms
- BsmtHalfBath: Basement half bathrooms
- FullBath: Full bathrooms above grade
- HalfBath: Half baths above grade
- Bedroom: Number of bedrooms above basement level
- Kitchen: Number of kitchens
- KitchenQual: Kitchen quality
- TotRmsAbvGrd: Total rooms above grade (does not include bathrooms)
- Functional: Home functionality rating
- Fireplaces: Number of fireplaces
- FireplaceQu: Fireplace quality
- GarageType: Garage location
- GarageYrBlt: Year garage was built
- GarageFinish: Interior finish of the garage
- GarageCars: Size of garage in car capacity
- GarageArea: Size of garage in square feet
- GarageQual: Garage quality
- GarageCond: Garage condition
- PavedDrive: Paved driveway
- WoodDeckSF: Wood deck area in square feet
- OpenPorchSF: Open porch area in square feet
- EnclosedPorch: Enclosed porch area in square feet
- 3SsnPorch: Three season porch area in square feet
- ScreenPorch: Screen porch area in square feet
- PoolArea: Pool area in square feet
- PoolQC: Pool quality
- Fence: Fence quality
- MiscFeature: Miscellaneous feature not covered in other categories
- MiscVal: $Value of miscellaneous feature
- MoSold: Month Sold
- YrSold: Year Sold
- SaleType: Type of sale
- SaleCondition: Condition of sale
Talent Hunting Classification with Machine Learning using SCOUTIUM's Dataset
Business Problem
Predict which class (average, highlighted) a football player belongs
to, based on the scores that scouts give to the characteristics of
the players they watch.
Dataset Story
The dataset, from Scoutium, contains the characteristics that scouts
evaluated for the football players they observed in matches, together
with the scores given for those characteristics.
- task_response_id: The set of a scout's evaluations of all players on a team's roster in a match
- match_id: The id of the relevant match
- evaluator_id: The id of the evaluator (scout)
- player_id: The id of the relevant player
- position_id: The id of the position played by the relevant player in that match
1. Goalkeeper
2. Stopper
3. Right-back
4. Left-back
5. Defensive midfielder
6. Central midfielder
7. Right wing
8. Left wing
9. Attacking midfielder
10. Striker
- analysis_id: Set of attribute evaluations of a scout for a player in a match
- attribute_id: The id of each attribute that the players were evaluated for
- attribute_value: The value (points) a scout gives to a player's attribute
- task_response_id: The set of a scout's evaluations of all players on a team's roster in a match
- match_id: The id of the corresponding match
- evaluator_id: The id of the evaluator (scout)
- player_id: The id of the respective player
- potential_label: A label that indicates a scout's final decision regarding a player in a match (target variable)
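Because each attribute score sits on its own row, it can help to reshape the data so that one record describes one player evaluation. The sketch below shows one possible way to do that with the standard library: it pivots `attribute_id`/`attribute_value` pairs into fields and attaches `potential_label` through the key columns that both column groups share. The sample rows, attribute ids, and scores are invented purely for illustration, and `KEYS`, `attribute_rows`, `label_rows`, and `pivot_attributes` are hypothetical names.

```python
from collections import defaultdict

# Key columns shared by the attribute rows and the label rows.
KEYS = ("task_response_id", "match_id", "evaluator_id", "player_id")

# Invented sample rows for illustration (not real Scoutium data).
attribute_rows = [
    {"task_response_id": 1, "match_id": 10, "evaluator_id": 7, "player_id": 100,
     "position_id": 2, "attribute_id": 1, "attribute_value": 56.0},
    {"task_response_id": 1, "match_id": 10, "evaluator_id": 7, "player_id": 100,
     "position_id": 2, "attribute_id": 2, "attribute_value": 67.0},
    {"task_response_id": 1, "match_id": 10, "evaluator_id": 7, "player_id": 101,
     "position_id": 10, "attribute_id": 1, "attribute_value": 78.0},
    {"task_response_id": 1, "match_id": 10, "evaluator_id": 7, "player_id": 101,
     "position_id": 10, "attribute_id": 2, "attribute_value": 45.0},
]
label_rows = [
    {"task_response_id": 1, "match_id": 10, "evaluator_id": 7, "player_id": 100,
     "potential_label": "average"},
    {"task_response_id": 1, "match_id": 10, "evaluator_id": 7, "player_id": 101,
     "potential_label": "highlighted"},
]


def pivot_attributes(rows):
    """Build one record per player evaluation with one field per attribute_id."""
    wide = defaultdict(dict)
    for row in rows:
        key = tuple(row[k] for k in KEYS)
        wide[key]["position_id"] = row["position_id"]
        wide[key][f"attr_{row['attribute_id']}"] = row["attribute_value"]
    return wide


wide = pivot_attributes(attribute_rows)

# Attach the target variable through the shared key columns.
for row in label_rows:
    key = tuple(row[k] for k in KEYS)
    if key in wide:
        wide[key]["potential_label"] = row["potential_label"]

for key, record in wide.items():
    print(dict(zip(KEYS, key)), record)
```

Each resulting record holds the position, one field per evaluated attribute, and the label, which is a shape a classifier can consume directly.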
Customer Segmentation with Unsupervised Learning using FLO's Dataset
Business Problem
FLO wants to divide its customers into segments and determine
marketing strategies according to these segments. To this end,
customers' behaviors will be defined and groups will be created
based on clusters in these behaviors.
Dataset Story
The dataset consists of information obtained from the past shopping
behaviors of customers who made their last purchases from FLO in
2020-2021 as OmniChannel shoppers (shopping both online and offline).
- master_id: Customer ID
- order_channel: Shopping platform (Android, ios, Desktop, Mobile, Offline)
- last_order_channel: The channel where the most recent purchase was made
- first_order_date: Customer's first order date
- last_order_date: Customer's last order date
- last_order_date_online: Customer's last online order date
- last_order_date_offline: Customer's last offline order date
- order_num_total_ever_online: The total number of orders made by the customer online
- order_num_total_ever_offline: The total number of orders made by the customer offline
- customer_value_total_ever_offline: The total price paid by the customer for offline orders
- customer_value_total_ever_online: The total price paid by the customer for online orders
- interested_in_categories_12: List of categories the customer has shopped in the last 12 months
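Before clustering, one common approach is to combine the online and offline columns into per-customer behavior features. The sketch below derives a total order count, total spend, and recency in days from the columns listed above. The sample record, the ISO date format, the reference date, and the names `ANALYSIS_DATE` and `behavior_features` are assumptions made for illustration only.

```python
from datetime import date

# Invented sample record using the column names listed above.
customer = {
    "master_id": "c-001",
    "last_order_date": "2021-02-26",
    "order_num_total_ever_online": 4.0,
    "order_num_total_ever_offline": 1.0,
    "customer_value_total_ever_online": 800.0,
    "customer_value_total_ever_offline": 140.0,
}

ANALYSIS_DATE = date(2021, 6, 1)  # assumed reference date for recency


def behavior_features(row, today):
    """Combine online and offline columns into simple behavior features."""
    last_order = date.fromisoformat(row["last_order_date"])
    return {
        "master_id": row["master_id"],
        "order_num_total": (row["order_num_total_ever_online"]
                            + row["order_num_total_ever_offline"]),
        "customer_value_total": (row["customer_value_total_ever_online"]
                                 + row["customer_value_total_ever_offline"]),
        "recency_days": (today - last_order).days,
    }


features = behavior_features(customer, ANALYSIS_DATE)
print(features)
```

Features like these would normally be scaled before being passed to a clustering algorithm, since order counts, spend, and recency are on very different scales.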