A Comparative Evaluation of Random Forest and XGBoost Models for Disease Detection Using Medical Indicators

Murteza Hanoon Tuama

Department of Computer Techniques Engineering, Imam Al-Kadhum University College, Baghdad, Iraq.

Pages: 11-18
Vol.19, Jan-Jun, 2025
Receiving Date: 2024-12-03
Acceptance Date: 2025-01-13
Publication Date: 2025-01-15

http://doi.org/10.37648/ijps.v19i01.002

Abstract

Machine Learning (ML) has become a cornerstone of 21st-century healthcare, enabling earlier and more accurate disease detection through analysis of individual medical indicators such as cholesterol, blood pressure, and heart rate. We present a comparison of two ensemble learning models, Random Forest and XGBoost, on real-world medical datasets. Model performance was compared using Accuracy, Precision, Recall, F1-Score, and ROC-AUC. Random Forest was the more stable model and achieved the best accuracy, 96.75% (0.967543), a result attributed to its bagging technique, which limits overfitting. XGBoost achieved an accuracy of 96.45% (0.964451), attributable in part to the ability of its gradient boosting technique to handle imbalanced datasets. Compared with traditional methods, these algorithms showed up to a 20% reduction in diagnostic errors and a 30% reduction in diagnosis time, with chronic conditions such as cardiovascular disease and diabetes benefiting most. The study is distinguished by its emphasis on moving these models out of the lab and into clinical practice, where their benefits range from improved diagnostic and pathology pathways to lower costs for specialized resources and streamlined decision-making workflows. For example, the models can help identify high-risk patients early, so that interventions can be focused on this cohort and patient outcomes improved. The study also addresses the primary challenges: data quality, model adaptability across demographics, and the ethical issues involved in ensuring fairness and transparency in ML-dependent decisions. By linking theory with practical application, such tools can help make healthcare more efficient, equitable, and accessible across the globe.
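
To make the five evaluation metrics named in the abstract concrete, the following is a minimal pure-Python sketch of how they can be computed from a binary classifier's predicted scores. It is not the author's pipeline: the function names (`confusion_counts`, `roc_auc`, `evaluate`), the 0.5 decision threshold, and the eight-patient toy data are all assumptions made for illustration, and the same function would be called once per model (for example, once on Random Forest scores and once on XGBoost scores) to compare them.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels, where 1 means disease present."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """ROC-AUC as the fraction of (positive, negative) pairs in which the
    positive case receives the higher score; ties count as half a win."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.5) -> Dict[str, float]:
    """Compute Accuracy, Precision, Recall, F1-Score, and ROC-AUC."""
    # Scores at or above the (assumed) threshold are predicted as positive.
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "roc_auc": roc_auc(y_true, y_score),
    }


# Toy data (invented for illustration): 4 patients with the disease, 4 without.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]  # hypothetical model scores

metrics = evaluate(y_true, scores)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Accuracy alone can look strong on imbalanced medical data, which is one reason Precision, Recall, F1-Score, and ROC-AUC are typically reported alongside it.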


Keywords: Machine Learning; Disease Detection; Random Forest; XGBoost; Diagnostic Efficiency; Ethical AI

