The Random Forest Algorithm and Logistic Regression for Classification with Application

Authors

  • Afiaa Raheem Khudhair, College of Administration and Economics, University of Thi-Qar, Thi-Qar, Iraq https://orcid.org/0009-0008-5227-0527
  • Nadia Ali Ayyed, College of Administration and Economics, University of Basra, Basra, Iraq

DOI:

https://doi.org/10.62933/f3g3c233

Keywords:

Random forest algorithm, logistic regression, genetic algorithm, classification, heart disease

Abstract

The increasing use of modern algorithms and their diverse applications in medical and social fields raises a critical research question: can a genetic algorithm, specifically Random Forest, be applied effectively to classify a response variable, and how does integrating it with the traditional Logistic Regression model enhance its performance? This paper explores the integration of Random Forest with Logistic Regression to classify heart disease data. The results demonstrate that the combined approach significantly improves classification accuracy and reduces error rates compared to the standalone algorithm. Using real data from heart disease patients, represented by 13 independent variables and a binary response variable (0 indicating no disease, 1 indicating disease), we provide a comprehensive analysis of the model's performance. The algorithm was implemented and evaluated in the R programming environment, yielding strong results that underscore the power and quality of the combined approach for handling massive data applications.

Published

2025-05-11

Section

Original Articles

How to Cite

The Random Forest Algorithm and Logistic Regression for Classification with Application. (2025). Iraqi Statisticians Journal, 2(special issue for ICSA2025), 99-104. https://doi.org/10.62933/f3g3c233