A Novel Diabetes Predict Framework of Ensemble Learning to Solve Missing and Imbalanced Data

Research output: Chapter in Book/Conference proceeding with ISSN or ISBN > Conference contribution with ISSN or ISBN > peer-review

Abstract

Because the prevalence of diabetes is rising rapidly worldwide and significantly affects mortality, machine learning has been widely applied to disease prediction in this area. Existing work, however, remains limited by missing and imbalanced data. To address these gaps, this paper proposes a novel ensemble learning framework that handles both missing values and class imbalance. First, in data preprocessing, a new missing-data approach is presented, based on feature-based discretisation of K-Nearest Neighbors with Gaussian Naive Bayes. This is followed by a comparison of five distinct methods for imbalanced data, in which SMOTETomek, a method built on the synthetic minority oversampling technique (SMOTE), shows the best performance. Next, a two-stage ensemble is developed from seven classification models, with grid search hyperparameter tuning and cross-validation to reduce the risk of overfitting. To evaluate the framework comprehensively, experiments were run on two datasets, and both showed substantial performance improvements, notably an accuracy of 0.9920 on the Pima Indians Diabetes data with an ensemble of random forest, gradient boosting, and CatBoost. Finally, the proposed framework outperforms other studies and meets the aim of effectively handling missing values and imbalanced data, which in particular improves the robustness of early prediction of diabetes, a chronic disease.
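The resampling step named in the abstract can be illustrated with a minimal, standard-library sketch of the general SMOTE-plus-Tomek-links idea. This is not the authors' implementation: the function names (`nearest`, `smote`, `remove_tomek_links`, `smote_tomek`), the Euclidean distance, the default `k=3`, and the choice to drop both members of each Tomek link are all assumptions made for illustration.

```python
import math
import random


def nearest(idx, points, k):
    """Indices of the k points closest to points[idx], excluding idx itself."""
    others = (j for j in range(len(points)) if j != idx)
    return sorted(others, key=lambda j: math.dist(points[idx], points[j]))[:k]


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic samples by interpolating between a randomly
    chosen minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(nearest(i, minority, k))
        gap = rng.random()  # position along the segment between the two samples
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(minority[i], minority[j]))
        )
    return synthetic


def remove_tomek_links(X, y):
    """Drop both members of every Tomek link: a pair of samples with different
    labels that are each other's nearest neighbour."""
    nn = [nearest(i, X, 1)[0] for i in range(len(X))]
    drop = {i for i, j in enumerate(nn) if nn[j] == i and y[i] != y[j]}
    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


def smote_tomek(X, y, minority_label=1, k=3, seed=0):
    """Oversample the minority class up to the majority count, then clean
    the class boundary by removing Tomek links."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_new = max(len(X) - 2 * len(minority), 0)  # majority count minus minority count
    new = smote(minority, n_new, k, seed)
    return remove_tomek_links(list(X) + new, list(y) + [minority_label] * len(new))


# Toy data (hypothetical): six majority samples, two minority samples.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (2, 2), (5, 5), (6, 6)]
y = [0, 0, 0, 0, 0, 0, 1, 1]
X_res, y_res = smote_tomek(X, y)
```

In practice, the imbalanced-learn library provides this combination as `SMOTETomek` in `imblearn.combine`; the sketch only makes the two underlying steps explicit.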
Original language: English
Title of host publication: 2025 5th International Conference on Electrical, Computer and Energy Technologies (ICECET)
Publisher: IEEE
Pages: 1-8
Number of pages: 8
ISBN (Electronic): 9798331535599
ISBN (Print): 9798331535599, 9798331535605
DOIs
Publication status: Published - 9 Apr 2026

Publication series

Name: International Conference on Electrical, Computer, and Energy Technologies, ICECET 2025

Bibliographical note

Publisher Copyright:
© 2025 IEEE.

Keywords

  • Diabetes
  • Ensemble Learning
  • Imbalanced Data
  • Machine Learning
  • Missing Value
