Feature Engineering


Diabetes Feature Engineering

GitHub Repository:


Business Problem

The goal is to develop a machine learning model that predicts whether a person has diabetes from their recorded characteristics. Before building the model, you are expected to carry out the necessary data analysis and feature engineering steps.

Dataset Story

The dataset is part of a larger dataset held at the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) in the USA.

Variables
  • Pregnancies: Number of pregnancies
  • Glucose: 2-hour plasma glucose concentration in the oral glucose tolerance test
  • BloodPressure: Diastolic blood pressure (mm Hg)
  • SkinThickness: Triceps skin fold thickness (mm)
  • Insulin: 2-hour serum insulin (mu U/ml)
  • BMI: Body mass index (weight in kg / (height in m)^2)
  • DiabetesPedigreeFunction: A function that scores the likelihood of diabetes based on family history
  • Age: Age (years)
  • Outcome: Whether the person has diabetes (1) or not (0)
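
As a first look at these variables, the sketch below builds a small frame with the same column names and marks zeros as missing in the measurement columns. Two things here are assumptions rather than part of the project description: the three rows are invented purely for illustration, and the choice of which columns treat 0 as "not recorded" (a person cannot have a glucose level or BMI of zero) is a common convention for this kind of data, not something the dataset itself states.

```python
import numpy as np
import pandas as pd

# Toy rows invented for illustration only; they are NOT taken from the real dataset.
df = pd.DataFrame({
    "Pregnancies": [2, 0, 5],
    "Glucose": [120, 0, 150],
    "BloodPressure": [70, 64, 0],
    "SkinThickness": [25, 0, 30],
    "Insulin": [0, 90, 130],
    "BMI": [28.5, 0.0, 33.1],
    "DiabetesPedigreeFunction": [0.35, 0.20, 0.60],
    "Age": [33, 25, 47],
    "Outcome": [0, 0, 1],
})

# Assumption: a zero in these columns means "not recorded", not a real measurement.
# Pregnancies and Outcome are left alone because 0 is a valid value there.
ZERO_AS_MISSING = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]

df[ZERO_AS_MISSING] = df[ZERO_AS_MISSING].replace(0, np.nan)

# Count missing values per column and show only the columns that have any.
missing_counts = df.isna().sum()
print(missing_counts[missing_counts > 0])
```

Converting zeros to NaN first keeps later steps (imputation, outlier checks, summary statistics) from silently treating placeholder zeros as real measurements.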

Requirements
  • matplotlib==3.7.1
  • numpy==1.24.3
  • pandas==1.5.1
  • seaborn==0.12.1
  • scikit-learn==1.3.1
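
To confirm that an environment matches these pins, a minimal check along the following lines may help. It uses only the standard library; the helper name `check_pins` is hypothetical, and the sketch assumes the scikit-learn distribution (the package imported as `sklearn`) is the one intended by the last pin.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the Requirements list. "scikit-learn" is the distribution
# name on PyPI for the package that is imported as `sklearn`.
PINNED = {
    "matplotlib": "3.7.1",
    "numpy": "1.24.3",
    "pandas": "1.5.1",
    "seaborn": "0.12.1",
    "scikit-learn": "1.3.1",
}


def check_pins(pinned):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


# Report each package whose installed version differs from the pin.
for name, (expected, installed) in check_pins(PINNED).items():
    print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

The script prints nothing when every pinned package is installed at exactly the listed version.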