Diabetes Feature Engineering
Business Problem
The goal is to develop a machine learning model that predicts whether a person has diabetes from their recorded characteristics. Before building the model, perform the necessary data analysis and feature engineering steps.
Dataset Story
The dataset is part of a larger dataset maintained by the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) in the USA.
Variables
- Pregnancies: Number of pregnancies
- Glucose: 2-hour plasma glucose concentration in the oral glucose tolerance test
- BloodPressure: Diastolic blood pressure (mm Hg)
- SkinThickness: Skin Thickness
- Insulin: 2-hour serum insulin (µU/mL)
- BMI: Body Mass Index
- DiabetesPedigreeFunction: A score that estimates the likelihood of diabetes based on family history
- Age: Age (years)
- Outcome: Whether the person has diabetes (1) or not (0)
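As a first data analysis step on these variables, the sketch below marks implausible zeros as missing values. It assumes that a value of 0 in `Glucose`, `BloodPressure`, `SkinThickness`, `Insulin`, or `BMI` stands for a missing measurement; that is a common reading of this dataset, not something stated above. The helper name `mark_zeros_as_missing` and the small `toy` frame are hypothetical, and the values in `toy` are made up for illustration, not taken from the real data.

```python
import numpy as np
import pandas as pd

# Assumption: a zero in these columns is physiologically implausible,
# so it is treated as a missing measurement. Pregnancies is excluded
# because zero pregnancies is a valid value.
ZERO_AS_MISSING = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]


def mark_zeros_as_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with zeros in ZERO_AS_MISSING replaced by NaN."""
    out = df.copy()
    out[ZERO_AS_MISSING] = out[ZERO_AS_MISSING].replace(0, np.nan)
    return out


# Tiny made-up frame with the documented columns (illustrative values only).
toy = pd.DataFrame({
    "Pregnancies": [0, 2, 5],
    "Glucose": [120, 0, 95],
    "BloodPressure": [70, 64, 0],
    "SkinThickness": [20, 0, 30],
    "Insulin": [0, 85, 100],
    "BMI": [25.5, 31.2, 0.0],
    "DiabetesPedigreeFunction": [0.35, 0.60, 0.20],
    "Age": [25, 40, 33],
    "Outcome": [0, 1, 0],
})

cleaned = mark_zeros_as_missing(toy)

# Per-column count of values now flagged as missing.
missing_counts = cleaned.isna().sum()
print(missing_counts)
```

Once the zeros are flagged, the missing values can be imputed (for example with a per-class median) or the affected rows dropped. Which option fits best depends on how many values are affected in the real data.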
Requirements
matplotlib==3.7.1
numpy==1.24.3
pandas==1.5.1
seaborn==0.12.1
scikit-learn==1.3.1
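To confirm that an environment matches these pins, a small check along the following lines may help. It is a minimal sketch that uses only the standard library. The helper names `parse_pins` and `check_pins` are hypothetical. It looks up scikit-learn under its PyPI distribution name `scikit-learn` (the import name is `sklearn`).

```python
from importlib import metadata

# The pinned requirements, using PyPI distribution names.
PINS = """\
matplotlib==3.7.1
numpy==1.24.3
pandas==1.5.1
seaborn==0.12.1
scikit-learn==1.3.1
"""


def parse_pins(text):
    """Parse 'name==version' lines into a {name: version} dict."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, _, version = line.partition("==")
        pins[name.strip()] = version.strip()
    return pins


def check_pins(pins):
    """Map each package to (wanted, installed, matches)."""
    report = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        report[name] = (wanted, installed, installed == wanted)
    return report


report = check_pins(parse_pins(PINS))
for name, (wanted, installed, ok) in report.items():
    status = "OK" if ok else "MISMATCH"
    print(f"{name}: wanted {wanted}, installed {installed} -> {status}")
```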