Using Predictive Analytics to Enhance Healthcare Outcomes: A Capstone Project Overview

Sep 24, 2024

In healthcare, early diagnosis and intervention are critical, especially for chronic conditions like diabetes. A recent capstone project applied predictive analytics to improve the detection of diabetes, using a dataset from the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK). The aim was to develop a model that accurately classifies patients as diabetic or non-diabetic based on various medical predictors.

Methodology and Data Exploration

The dataset comprised medical predictor variables such as the number of pregnancies, glucose level, blood pressure, skin thickness, insulin level, BMI, diabetes pedigree function, and age, along with a target variable indicating diabetes status. An initial descriptive analysis revealed missing values in key predictors like glucose, blood pressure, skin thickness, insulin, and BMI, which were subsequently addressed using imputation techniques.
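The article does not say which imputation technique was used or how missing values were encoded. As a minimal sketch, the example below assumes that a value of 0 or None in these five columns means "missing" and fills it with the column median. The column names, the `impute_with_median` helper, and the toy records are all illustrative assumptions, not the project's code or data.

```python
from statistics import median

# Assumption: a zero in these columns is physiologically implausible
# and is treated as a missing value (the article does not confirm this).
ZERO_AS_MISSING = ("glucose", "blood_pressure", "skin_thickness", "insulin", "bmi")


def impute_with_median(rows, columns=ZERO_AS_MISSING):
    """Return copies of rows with None/0 in `columns` replaced by the column median."""
    medians = {}
    for col in columns:
        observed = [r[col] for r in rows if r[col] not in (None, 0)]
        medians[col] = median(observed) if observed else None

    imputed = []
    for r in rows:
        new_row = dict(r)  # copy so the input records are left untouched
        for col in columns:
            if new_row[col] in (None, 0) and medians[col] is not None:
                new_row[col] = medians[col]
        imputed.append(new_row)
    return imputed


# Toy records with made-up values.
patients = [
    {"glucose": 150, "blood_pressure": 70, "skin_thickness": 30, "insulin": 0, "bmi": 34.0},
    {"glucose": 90, "blood_pressure": 65, "skin_thickness": 28, "insulin": 100, "bmi": 27.0},
    {"glucose": 0, "blood_pressure": 60, "skin_thickness": 0, "insulin": 160, "bmi": 24.0},
]
cleaned = impute_with_median(patients)
```

The median is used here because it is less sensitive to extreme values than the mean. In a real pipeline the medians would be computed on the training split only and then reused for the test split.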

Histograms and scatter plots were used to explore the distributions of these variables and the relationships among them. A significant observation was the class imbalance: non-diabetic cases outnumbered diabetic cases, which called for techniques such as resampling so that the models were not trained predominantly on the majority class.
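The article mentions resampling without naming a specific method. One simple option is random oversampling, sketched below: minority-class records are duplicated at random until every class is as large as the largest one. The `random_oversample` helper and the toy data are hypothetical and only illustrate the idea.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until all classes match the largest."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible

    # Group rows by their class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    target = max(len(group) for group in by_class.values())

    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Draw with replacement to make up the shortfall for this class.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Toy data: six non-diabetic (0) and two diabetic (1) records.
X = [[85], [90], [88], [95], [100], [92], [160], [175]]
y = [0, 0, 0, 0, 0, 0, 1, 1]
X_bal, y_bal = random_oversample(X, y)
counts = Counter(y_bal)
```

Resampling of this kind is normally applied to the training split only. Otherwise duplicated records can leak into the test set and inflate the evaluation scores.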

Model Building and Evaluation

The project employed several machine-learning algorithms, including Logistic Regression, K-Nearest Neighbors (KNN), Decision Tree, Random Forest, Support Vector Machine (SVM), and Gradient Boosting. These models were evaluated using metrics such as accuracy, precision, recall, F1-score, and the area under the ROC curve (AUC).
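To make these metrics concrete, the sketch below computes them from scratch for a binary classifier, treating the diabetic class as label 1. The function names, the toy labels and scores, and the 0.5 decision threshold are assumptions for illustration. The article reports no numeric results, so the values here say nothing about the project's models.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and predicted probabilities (made up for illustration).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.2, 0.6, 0.7, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Accuracy, precision, recall, and F1 depend on the chosen threshold, whereas AUC is computed from the raw scores and summarizes ranking quality across all thresholds. With imbalanced classes, recall on the diabetic class is often the more informative number, since a missed diagnosis is usually costlier than a false alarm.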

Key findings included:

Top Performers: Logistic Regression, Random Forest, and Gradient Boosting demonstrated the highest accuracy and AUC scores, making them the most reliable for this dataset.
Moderate Performers: KNN and SVM showed decent performance but were less accurate in differentiating between diabetic and non-diabetic patients compared to the top models.
Lower Performer: The Decision Tree model had the lowest accuracy and AUC scores, indicating limited effectiveness in this context.

The analysis revealed that variables such as glucose, BMI, and insulin levels were significant predictors of diabetes, aligning with medical understanding.

Data Reporting and Visualization

A comprehensive dashboard created in Tableau provided a visual representation of the findings. This included:

Pie Charts: Highlighting the proportion of diabetic vs. non-diabetic cases, underscoring the dataset's class imbalance.
Histograms and Scatter Charts: Visualizing the distribution and relationships among predictor variables.
Heatmaps: Depicting the correlation between variables, which helped in understanding the interplay between different medical factors.
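The numbers behind a correlation heatmap can be sketched as a matrix of pairwise Pearson coefficients. The example below is not the Tableau workflow described above. It is a plain-Python illustration that uses hypothetical `pearson` and `correlation_matrix` helpers and made-up values.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Map each pair of column names to its Pearson coefficient."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Toy columns with made-up values.
data = {
    "glucose": [90, 110, 130, 150, 170],
    "bmi": [22.0, 27.0, 25.0, 31.0, 30.0],
    "age": [25, 31, 48, 39, 52],
}
matrix = correlation_matrix(data)
```

Each cell of the heatmap corresponds to one entry such as `matrix["glucose"]["bmi"]`. Values near 1 or -1 indicate a strong linear relationship, and values near 0 indicate a weak one.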

Conclusion and Implications

This capstone project underscored the potential of predictive analytics in healthcare, particularly in enhancing early diagnosis and personalized treatment strategies. The use of robust machine learning models and comprehensive data visualization facilitated a deeper understanding of the risk factors associated with diabetes.

The project exemplifies how data-driven approaches can aid in the development of effective healthcare interventions, ultimately leading to improved patient outcomes and more efficient healthcare systems.
