Ensemble And Voting Approaches For Defect Prediction Across Multiple Software Projects

Kirso Kirso (1), Agus Subekti (2)
(1) Universitas Nusa Mandiri, Indonesia
(2) Universitas Nusa Mandiri, Indonesia

Abstract

This study applies ensemble methods, hyperparameter tuning, and voting to improve cross-project defect prediction (CPDP) on the Kamei dataset. Five machine learning models (LightGBM, XGBoost, Random Forest, Extra Trees, and Gradient Boosting) were applied to six projects: Bugzilla, Columba, JDT, Mozilla, Platform, and Postgres. The models performed well when tested on projects with similar characteristics or strong relationships, such as Mozilla, JDT, and Platform, achieving accuracy and F1 scores above 80%. This indicates that defect patterns learned from one project can be applied effectively to similar projects. However, performance dropped significantly when models trained on other projects were used to predict defects in Bugzilla, indicating notable differences in defect patterns or feature incompatibility. Differences in data distribution across projects remain a major challenge in CPDP. Domain adaptation or feature transformation techniques are therefore needed to reduce inter-project differences so that models can better recognize defect patterns across projects. Despite some improvements, distribution differences and class imbalance still limit prediction performance, and future research should address these challenges.
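
As a rough illustration of the voting step and the F1 metric mentioned in the abstract, the sketch below averages per-model defect probabilities (soft voting) and scores the combined labels. It is not the authors' implementation: the abstract does not say whether hard or soft voting was used, and the function names, the 0.5 threshold, and the probability values are all hypothetical. In practice the probabilities would come from classifiers trained on a source project and applied to a target project.

```python
from typing import List, Sequence


def soft_vote(probabilities: Sequence[Sequence[float]],
              threshold: float = 0.5) -> List[int]:
    """Average each model's defect probability per sample, then threshold.

    `probabilities` holds one sequence per model; all must have equal length.
    The 0.5 threshold is an assumption, not a value taken from the paper.
    """
    n_models = len(probabilities)
    n_samples = len(probabilities[0])
    labels = []
    for i in range(n_samples):
        mean_p = sum(model[i] for model in probabilities) / n_models
        labels.append(1 if mean_p >= threshold else 0)
    return labels


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the defective class (label 1); returns 0.0 with no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical defect probabilities from three models for four target-project changes.
model_a = [0.9, 0.2, 0.6, 0.4]
model_b = [0.8, 0.4, 0.3, 0.7]
model_c = [0.7, 0.1, 0.5, 0.6]

y_true = [1, 0, 1, 1]  # made-up ground-truth labels for the same four changes
y_pred = soft_vote([model_a, model_b, model_c])
score = f1_score(y_true, y_pred)
```

Averaging probabilities lets a confident model outweigh two uncertain ones, whereas majority (hard) voting would count each model equally; which variant suits a given project pair is an empirical question.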

Authors

Kirso Kirso
14230033@nusamandiri.ac.id (Primary Contact)
Agus Subekti
Kirso, K., & Subekti, A. (2025). Ensemble And Voting Approaches For Defect Prediction Across Multiple Software Projects. MUST: Journal of Mathematics Education, Science and Technology, 10(1), 58–67. https://doi.org/10.30651/must.v10i1.26548