Abstract: As the volume of software on the market grows rapidly, software quality becomes increasingly important, making software testing an indispensable part of the development process. Growing demand for testing has driven the adoption of emerging technologies, among which machine learning, with its predictive models and scalability, has become a mainstream approach to software defect prediction. However, software defect prediction still faces several problems, most notably class imbalance and limited prediction accuracy. This paper proposes a supervised learning-based software defect prediction method that targets these two problems. Specifically, the samples in three datasets from the NASA database (KK1, KK3, PK2) are balanced using the SMOTE algorithm for over-sampling and the ENN algorithm for under-sampling. The paper then compares and analyzes the performance of several supervised learning algorithms on these three datasets: Locally Weighted Learning (LWL), J48, C4.8, Random Forest, Bayesian Belief Network, Multilayer Feedforward Neural Network, Support Vector Machine (SVM), and NB-K. The results indicate that the SMOTE+ENN+Random Forest model effectively addresses the class imbalance problem, while the other methods show certain limitations by comparison.
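
The resampling step named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it is a standard-library-only illustration, and it assumes Euclidean distance, binary labels with 1 as the minority (defective) class, and at least two minority samples. The helper names (`k_nearest`, `smote`, `enn`, `smote_enn`) and the toy data are hypothetical. SMOTE creates synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours. ENN then removes every sample whose label disagrees with the majority vote of its k nearest neighbours.

```python
import math
import random
from collections import Counter


def k_nearest(point, candidates, k):
    """Return indices of the k candidates closest to `point` (Euclidean)."""
    order = sorted(range(len(candidates)),
                   key=lambda i: math.dist(point, candidates[i]))
    return order[:k]


def smote(minority, n_new, k=3, rng=None):
    """Create n_new synthetic samples, each on the line segment between a
    random minority sample and one of its k nearest minority neighbours."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        seed = minority[i]
        others = minority[:i] + minority[i + 1:]
        neighbour_idx = rng.choice(k_nearest(seed, others, min(k, len(others))))
        neighbour = others[neighbour_idx]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([s + gap * (n - s) for s, n in zip(seed, neighbour)])
    return synthetic


def enn(X, y, k=3):
    """Edited Nearest Neighbours: keep a sample only if its label matches
    the majority label among its k nearest neighbours (itself excluded)."""
    keep_X, keep_y = [], []
    for i, (x, label) in enumerate(zip(X, y)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        votes = Counter(rest_y[j] for j in k_nearest(x, rest_X, k))
        if votes.most_common(1)[0][0] == label:
            keep_X.append(x)
            keep_y.append(label)
    return keep_X, keep_y


def smote_enn(X, y, minority_label=1, k=3, seed=0):
    """Over-sample the minority class to parity with SMOTE, then clean
    the combined set with ENN."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_new = max(len(X) - 2 * len(minority), 0)  # majority count minus minority count
    new = smote(minority, n_new, k, random.Random(seed))
    return enn(X + new, y + [minority_label] * len(new), k)


# Toy data (hypothetical): six non-defective modules, three defective ones.
X_toy = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [0.2, 0.8],
         [5, 5], [5, 6], [6, 5]]
y_toy = [0] * 6 + [1] * 3
X_bal, y_bal = smote_enn(X_toy, y_toy)
print(Counter(y_bal))
```

The balanced output would then be passed to a classifier such as Random Forest. In practice, a library implementation is a more likely choice than hand-written resampling, for example `SMOTEENN` from imbalanced-learn combined with `RandomForestClassifier` from scikit-learn; whether the paper used these libraries is not stated in the abstract.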