Hazard text classification and category feature analysis based on SVM and a normalized entropy model
Affiliations:

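1. School of Management Engineering, Capital University of Economics and Business, Beijing 100070; 2. Editorial Department of Journal of Beijing University of Posts and Telecommunications (Nature Edition), Beijing University of Posts and Telecommunications, Beijing 100876; 3. Social Network Information Research Center, School of Economics and Management, Beijing University of Posts and Telecommunications, Beijing 100876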

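About the author:

Qiao Jianfeng (乔剑锋, born 1977), male, associate professor, Ph.D. His research focuses on safety data mining and on safety risk early warning and assessment. E-mail: qiaojianfeng@cueb.edu.cn.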

Funding:

Special Fund Project of the Society of China University Journals (CUJS2024-GJ-A01).


Classifications and characterization of safety hazard texts
Affiliation:

1. School of Management Engineering, Capital University of Economics and Business, Beijing 100070, P. R. China; 2. Editorial Department of Journal of Beijing University of Posts and Telecommunications (Nature Edition); 3. Social Network Information Research Center, School of Economics and Management, Beijing University of Posts and Telecommunications, Beijing 100876, P. R. China

Fund Project:

Supported by the Special Fund Project of the Society of China University Journals (CUJS2024-GJ-A01).

    Abstract:

    To improve the efficiency of organizing and retrieving safety hazard information and to support more complex information processing tasks, effective techniques for automatic text classification and category analysis are required. A support vector machine (SVM) can automatically classify free text, but it works by finding the optimal classification boundary in the training set and therefore does not reveal the typical features of each category. To analyze the features shared by the samples of a category, a normalized entropy model is proposed to find typical category features, improving on the current term frequency-inverse document frequency (TF-IDF) approach to category feature recognition. Using 2,534 law enforcement inspection records from a government emergency management bureau as a case study, SVM-based automatic classification achieved an accuracy of up to 97%. The normalized entropy model was then used to extract the typical features of each category, providing decision support for formulating targeted rectification strategies in hazard investigation. The experimental results show that combining SVM with the normalized entropy model efficiently addresses both text classification and category feature recognition.
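    The abstract does not give the formula of the normalized entropy model, so the following is only a minimal sketch of one common reading of the idea, not the authors' method. It assumes that a term is "typical" of a category when its within-category relative frequency is concentrated in that category, which is measured by Shannon entropy divided by log K (K being the number of categories). The function names, the `min_count` filter, and the toy records are hypothetical.

```python
import math
from collections import Counter, defaultdict


def normalized_entropy(weights):
    """Shannon entropy of non-negative weights, scaled to [0, 1] by log(K)."""
    total = sum(weights)
    k = len(weights)
    if total == 0 or k < 2:
        return 0.0
    probs = [w / total for w in weights if w > 0]
    entropy = sum(-p * math.log(p) for p in probs)
    return entropy / math.log(k)


def typical_terms(docs, top_n=3, min_count=1):
    """Rank terms per category by normalized entropy (lower = more typical).

    docs is a list of (category, tokens) pairs. This is an assumed
    formulation for illustration, not the model from the paper.
    """
    term_counts = defaultdict(Counter)  # category -> term -> count
    cat_tokens = Counter()              # category -> total token count
    for category, tokens in docs:
        term_counts[category].update(tokens)
        cat_tokens[category] += len(tokens)

    categories = sorted(term_counts)
    vocab = {t for c in categories for t in term_counts[c]}
    result = {c: [] for c in categories}
    for term in vocab:
        if sum(term_counts[c][term] for c in categories) < min_count:
            continue  # skip rare terms, which trivially get entropy 0
        # Relative frequency inside each category, so large categories
        # do not dominate purely because they contain more text.
        rates = [
            term_counts[c][term] / cat_tokens[c] if cat_tokens[c] else 0.0
            for c in categories
        ]
        h = normalized_entropy(rates)
        best = categories[rates.index(max(rates))]
        result[best].append((term, h))
    for c in categories:
        result[c].sort(key=lambda pair: (pair[1], pair[0]))
        result[c] = result[c][:top_n]
    return result


# Hypothetical, pre-tokenized toy records (not data from the paper).
toy_docs = [
    ("fire", ["extinguisher", "expired", "warehouse"]),
    ("fire", ["extinguisher", "missing", "workshop"]),
    ("electrical", ["wiring", "exposed", "workshop"]),
    ("electrical", ["wiring", "aging", "warehouse"]),
]
toy_result = typical_terms(toy_docs, top_n=1, min_count=2)
```

    In this toy data, "workshop" and "warehouse" occur equally often in both categories and so receive the maximum normalized entropy, while terms confined to a single category receive zero. Chinese inspection records would need a word segmentation step first, which the sketch leaves out.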

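Cite this article

乔剑锋, 刘萱, 艾莉莎, 张丽玮, 王汀. Hazard text classification and category feature analysis based on SVM and a normalized entropy model [J]. Journal of Chongqing University, 2026, 49(2): 105-115.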

History
  • Received: 2024-07-15
  • Published online: 2026-02-03