A duplicate bug report detection model with enhanced text relevance semantics and multi-feature extraction
Author: Zhou Wenjie, Xie Qi, Cui Mengtian
Affiliation:

The Key Laboratory for Computer Systems of State Ethnic Affairs Commission, Southwest Minzu University, Chengdu 610041, P. R. China

CLC Number:

TP311.5

Fund Project:

Supported by the National Natural Science Foundation of China (61502401, 12050410248), the Sichuan Science and Technology Program (2021YFH0120), and the Fundamental Research Funds for the Central Universities, Southwest Minzu University (2020YYXS59).

    Abstract:

    To address semantic long-distance dependence and the reliance on a single type of bug report feature in existing research on duplicate bug report detection, a detection model with enhanced text relevance semantics and multi-feature extraction is proposed. The model introduces a self-attention mechanism to capture semantic relevance within the bug report text sequence. The mechanism dynamically computes a contextual semantic vector for semantic analysis, which resolves the long-distance dependence problem. The model also employs the latent Dirichlet allocation (LDA) algorithm to capture the topic features of the bug report text, and it constructs a feature extraction network to compute category difference features, which supply category information for the bug reports. Finally, duplicate detection is performed on the combination of these three types of feature vectors. Experimental results show that the model improves detection performance.
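The abstract does not give the exact attention formulation, so the following is only a minimal sketch of generic scaled dot-product self-attention, written to illustrate how a contextual vector for each token can draw on every other token regardless of distance. The function names, the toy embeddings, and the choice to use raw embeddings as queries, keys, and values (no learned projections) are assumptions for illustration and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(embeddings):
    """Return (context_vectors, attention_weights) for a token sequence.

    Assumption: queries, keys and values are the raw embeddings themselves
    (no learned projection matrices), which keeps the sketch short.
    """
    dim = len(embeddings[0])
    scale = math.sqrt(dim)
    contexts, weights = [], []
    for query in embeddings:
        # Score the query token against every token, near or far.
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale
            for key in embeddings
        ]
        row = softmax(scores)
        # Context vector: attention-weighted average of all token embeddings.
        context = [
            sum(w * tok[d] for w, tok in zip(row, embeddings))
            for d in range(dim)
        ]
        weights.append(row)
        contexts.append(context)
    return contexts, weights


# Toy 2-dimensional embeddings for a four-token report (made-up values).
tokens = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
contexts, weights = self_attention(tokens)
```

In this toy input the first and last tokens share an embedding, so the first token attends to the last one as strongly as to itself even though they sit at opposite ends of the sequence. That position-independent weighting is the property that makes self-attention suited to long-distance dependence.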

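Citation:

Zhou Wenjie, Xie Qi, Cui Mengtian. A duplicate bug report detection model with enhanced text relevance semantics and multi-feature extraction[J]. Journal of Chongqing University, 2023, 46(7): 53-62.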

History
  • Received: May 31, 2021
  • Online: August 2, 2023