The Classifications and Characterizations of Safety Hazard Texts
(Chinese title, translated: Hazard Text Classification and Type Feature Analysis Based on SVM and a Normalized Entropy Model)
Author:
Affiliation:

1. School of Management Engineering, Capital University of Economics and Business; 2. Editorial Department of Journal of Beijing University of Posts and Telecommunications Nature Edition, Beijing University of Posts and Telecommunications; 3. Social Network Information Research Center, School of Economics and Management, Beijing University of Posts and Telecommunications

CLC Number:

X928

Fund Project:

Special Fund Project of China University Science and Technology Journal Research Association (Project No. CUJS2024-GJ-A01)

    Abstract:

    To organize and retrieve hazard information more efficiently and to support more complex information processing tasks, hazard data must be classified automatically and analyzed by type. A Support Vector Machine (SVM) can classify free text automatically, but because the algorithm works by finding the optimal classification boundary in the training set, it cannot reveal the typical features of each type. To analyze the features shared by the samples of a type, a normalized entropy model is proposed for finding typical type features, improving on the current TF-IDF (Term Frequency-Inverse Document Frequency) approach to type feature identification. Using 2534 law enforcement inspection records from a government emergency management bureau as a case study, SVM-based automatic classification reached an accuracy of 97%. The normalized entropy model also yields the typical features of each type, providing decision support for formulating targeted hazard inspection and rectification strategies. The experimental results show that combining SVM with the normalized entropy model efficiently solves the combined problem of text classification and type feature identification.
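The abstract does not state the formula behind the normalized entropy model, so the following is only a rough sketch under an assumed definition: for each term, take its document counts per hazard type, compute the Shannon entropy of that distribution, and divide by the logarithm of the number of types so the score falls in [0, 1]. A score near 0 means the term is concentrated in one type and is a candidate typical feature; a score near 1 means it is spread evenly across types. The function names, the `min_count` threshold, and the toy records are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def normalized_entropy(class_counts, num_classes):
    """Shannon entropy of a term's per-class counts, scaled to [0, 1].

    Assumed definition (not from the paper): H = -sum(p * ln p) / ln(K),
    where p is the share of the term's documents in each class.
    """
    total = sum(class_counts.values())
    if total == 0 or num_classes < 2:
        return 0.0
    h = 0.0
    for n in class_counts.values():
        if n > 0:
            p = n / total
            h -= p * math.log(p)
    return h / math.log(num_classes)


def typical_terms(docs, labels, min_count=2):
    """Group terms under their most frequent class, lowest entropy first."""
    num_classes = len(set(labels))
    term_class = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        for tok in set(tokens):  # count each term once per document
            term_class[tok][label] += 1

    result = defaultdict(list)
    for term, counts in term_class.items():
        if sum(counts.values()) < min_count:
            continue  # skip rare terms whose entropy is uninformative
        score = normalized_entropy(counts, num_classes)
        top_class = max(counts, key=counts.get)
        result[top_class].append((term, score))
    for terms in result.values():
        terms.sort(key=lambda item: (item[1], item[0]))
    return dict(result)


# Hypothetical, pre-tokenized inspection records (illustration only).
docs = [
    ["fire", "extinguisher", "expired", "warehouse"],
    ["fire", "exit", "blocked", "warehouse"],
    ["wiring", "exposed", "workshop"],
    ["wiring", "overloaded", "socket", "warehouse"],
]
labels = ["fire_safety", "fire_safety", "electrical", "electrical"]

result = typical_terms(docs, labels)
for hazard_type, terms in sorted(result.items()):
    print(hazard_type, terms)
```

In this toy data, "fire" and "wiring" each occur in a single type and score 0, while "warehouse" occurs in both types and scores close to 1, so it would not be treated as typical of either. Raw counts favor larger classes, so with imbalanced data the per-class counts would likely need to be converted to per-class rates before computing the entropy.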

History
  • Received: 2024-07-20
  • Last revised: 2025-02-19
  • Accepted: 2025-03-21