Research on Software Defect Prediction Based on Machine Learning
DOI:
CSTR:
Author:
Affiliation:

1. R&D Center, GAC AION New Energy Automobile Co., Ltd.; 2. Engineering Research Institute, Guangzhou City University of Technology; 3. South China University of Technology

About the authors:

Corresponding author:

CLC number:

Fund project:


Abstract:

As the amount of software on the market grows rapidly, software quality becomes ever more important, making software testing an indispensable part of the software development process. With the growing demand for testing, emerging technologies have been widely adopted in this area; in particular, the predictive models and scalability of machine learning have made it a mainstream technique for software defect prediction. In this context, however, defect prediction faces a series of challenges, especially class imbalance and prediction accuracy. Targeting these two core problems, this paper proposes a supervised-learning-based defect prediction method. Specifically, the samples in three datasets (KK1, KK3, PK2) from the NASA database are balanced by combining the synthetic minority over-sampling technique (SMOTE) for over-sampling with the edited nearest neighbor (ENN) algorithm for under-sampling. The balanced datasets are then used to compare and analyze a range of supervised learning algorithms, including locally weighted learning (LWL), J48, C4.8, random forest, Bayesian network (BN), multilayer feedforward neural network (MFNN), support vector machine (SVM), and naive Bayes (NB-K). The results indicate that the SMOTE+ENN+random forest model addresses the class imbalance problem effectively while avoiding overfitting, whereas the other methods show certain limitations by comparison.
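
To make the balancing step concrete, the following Python sketch applies the SMOTE + ENN combination described above. The paper does not state its tooling or parameter settings (the classifier names suggest a Weka-style workflow), so the imbalanced-learn library, the synthetic data, and every parameter value below are assumptions used purely for illustration, not the authors' actual setup.

# Minimal sketch of the SMOTE + ENN balancing step (assumed tooling: imbalanced-learn).
from collections import Counter

from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import EditedNearestNeighbours
from imblearn.combine import SMOTEENN

# Synthetic stand-in for a NASA defect dataset: roughly 10% defective modules.
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.9, 0.1], random_state=42)
print("class counts before resampling:", Counter(y))

# SMOTE synthesizes new minority-class samples; ENN then drops majority-class
# samples whose nearest neighbours do not all share their label, cleaning the
# class boundary.
resampler = SMOTEENN(smote=SMOTE(k_neighbors=5, random_state=42),
                     enn=EditedNearestNeighbours(n_neighbors=3))
X_res, y_res = resampler.fit_resample(X, y)
print("class counts after SMOTE + ENN:", Counter(y_res))

Because ENN also discards boundary samples, the resampled set is usually close to, but not exactly, balanced; the point is that defective modules are no longer a rare class.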

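The comparative evaluation can be sketched in the same assumed setting. The learners named in the abstract are Weka algorithms (J48, LWL, BN, NB-K); the scikit-learn models below are stand-ins chosen for illustration (for example, a decision tree in place of J48/C4.8 and GaussianNB in place of NB-K), and the resampler is placed inside the cross-validation pipeline so that SMOTE + ENN is fitted on training folds only. The printed scores are illustrative, not the paper's results.

# Hedged sketch of the classifier comparison on SMOTE + ENN balanced data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.model_selection import StratifiedKFold, cross_val_score
from imblearn.combine import SMOTEENN
from imblearn.pipeline import make_pipeline

# Same kind of synthetic imbalanced data as above (stand-in for the NASA datasets).
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.9, 0.1], random_state=42)

models = {
    "decision tree (J48/C4.8 stand-in)": DecisionTreeClassifier(random_state=42),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "naive Bayes (NB-K stand-in)": GaussianNB(),
    "feedforward network (MFNN stand-in)": MLPClassifier(max_iter=1000, random_state=42),
    "SVM": SVC(random_state=42),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for name, clf in models.items():
    # Resampling inside the pipeline keeps synthetic samples out of the test folds.
    pipeline = make_pipeline(SMOTEENN(random_state=42), clf)
    scores = cross_val_score(pipeline, X, y, cv=cv, scoring="f1")
    print(f"{name}: F1 = {scores.mean():.3f} +/- {scores.std():.3f}")

Keeping the resampler inside the pipeline is the standard way to avoid data leakage: the test folds retain their original imbalance, so the reported scores reflect performance on unmodified data.
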
History
  • Received: 2024-04-20
  • Revised: 2024-10-17
  • Accepted: 2024-10-28
  • Published online:
  • Publication date: