Analyze the Factors that Contribute to Accurate performance Predictions, Including Data Preprocessing, Feature Selection, and Model Hyperparameters

Authors

  • Vinod K C, Research Scholar, University of Technology, Jaipur, Rajasthan
  • Dr. Suhas Rajaram Mache, Professor, Department of Computer Science, University of Technology, Jaipur, Rajasthan

DOI:

https://doi.org/10.29070/9b2tdk74

Keywords:

Prediction, Parameter Tuning, Feature Selection, Model Hyperparameters

Abstract

Strong academic results raise a university's standing and improve students' career prospects, so predicting academic performance has attracted attention in education. This study applies K-means clustering, with clusters obtained using the Davies-Bouldin approach, to identify the key characteristics that affect students' performance. Machine learning techniques are used in many fields, including medical diagnostics, image processing, cluster analysis, pattern recognition, and natural language processing. The study examines four classifiers: support vector machine (SVM), Decision Tree, naive Bayes, and k-nearest neighbours (KNN). Parameter tuning significantly improved the accuracy of all four prediction models, and after tuning SVM produced the most accurate predictions (96% accuracy). Feature selection algorithms and hyperparameter optimisation, two critical components for enhancing model performance, are also addressed. The findings highlight the need for careful model evaluation, with Random Forest emerging as a dependable choice for accurate diabetes prediction.
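The abstract does not say exactly how the Davies-Bouldin approach is combined with K-means. One common reading is that the Davies-Bouldin index is used to choose the number of clusters. The following is a minimal sketch under that assumption, run on synthetic two-dimensional points rather than the study's student data. The helper names (`centroid`, `kmeans`, `davies_bouldin`), the farthest-point seeding, and the candidate range of k are illustrative choices, not details taken from the paper.

```python
import math
import random


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def kmeans(points, k, iters=100):
    """Lloyd's algorithm with deterministic farthest-point seeding (illustrative)."""
    centers = [points[0]]
    while len(centers) < k:
        # Next seed: the point farthest from its nearest existing seed.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        updated = [centroid(c) if c else centers[i] for i, c in enumerate(clusters)]
        if updated == centers:
            break
        centers = updated
    return clusters


def davies_bouldin(clusters):
    """Davies-Bouldin index: mean over clusters of the worst (S_i + S_j) / M_ij."""
    clusters = [c for c in clusters if c]  # ignore empty clusters
    cents = [centroid(c) for c in clusters]
    # S_i: average distance from each point to its own cluster centroid.
    scatter = [sum(math.dist(p, m) for p in c) / len(c) for c, m in zip(clusters, cents)]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        total += max(
            (scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in range(k) if j != i
        )
    return total / k


# Synthetic data (an assumption for illustration): three well-separated blobs.
rng = random.Random(0)
blob_centres = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
points = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
          for cx, cy in blob_centres for _ in range(30)]

# Score each candidate k and keep the one with the lowest index.
scores = {k: davies_bouldin(kmeans(points, k)) for k in range(2, 7)}
best_k = min(scores, key=scores.get)
print(scores, best_k)
```

A lower index indicates compact, well-separated clusters, so the sketch keeps the k with the smallest score. On real student records the features would typically need scaling before distances are computed, which this sketch omits.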

Published

2024-10-01

How to Cite

[1]
“Analyze the Factors that Contribute to Accurate performance Predictions, Including Data Preprocessing, Feature Selection, and Model Hyperparameters”, JASRAE, vol. 21, no. 7, pp. 12–16, Oct. 2024, doi: 10.29070/9b2tdk74.