Abstract: To improve the efficiency of organizing and retrieving safety hazard information and to support more complex information processing tasks, effective methods for automatic text classification and type analysis are required. Support vector machines (SVMs) can automatically classify unstructured text. However, their underlying principle is to find the optimal classification boundary on the training set, which does not by itself yield representative features for each text category. To address this limitation, a normalized entropy model is proposed to identify typical category features, improving on the traditional feature recognition method based on term frequency-inverse document frequency (TF-IDF). Using 2,534 law enforcement inspection records from a government emergency management bureau as a case study, SVM was used for automatic text classification and achieved an accuracy of up to 97%. The normalized entropy model was then used to extract representative features for each category, providing decision support for formulating targeted rectification strategies in hazard investigation. The experimental results show that combining SVM with the normalized entropy model effectively addresses both text classification and category feature recognition.
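The abstract does not state the formula behind the normalized entropy model, so the following is only a minimal sketch of one plausible reading. It assumes that a term's score is the Shannon entropy of its document counts across categories, divided by the logarithm of the number of categories. Under this assumption, a score near 0 marks a term concentrated in a single category (a candidate representative feature), and a score near 1 marks a term spread evenly across categories. The function names, the toy records, and the use of raw document counts without correcting for category size are all illustrative assumptions and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def normalized_entropy(counts):
    """Shannon entropy of a count vector divided by log(len(counts)).

    Returns a value in [0, 1]; 0.0 is returned for empty or
    single-category input, where the entropy is trivially zero.
    """
    total = sum(counts)
    k = len(counts)
    if total == 0 or k < 2:
        return 0.0
    h = sum(-(c / total) * math.log(c / total) for c in counts if c > 0)
    return h / math.log(k)


def term_category_entropy(docs):
    """Score every term by how evenly it is spread across categories.

    docs: list of (category, tokens) pairs (hypothetical input format).
    Returns {term: (normalized_entropy, dominant_category)}.
    """
    categories = sorted({cat for cat, _ in docs})
    # term -> Counter mapping category -> number of documents containing the term
    doc_freq = defaultdict(Counter)
    for cat, tokens in docs:
        for term in set(tokens):
            doc_freq[term][cat] += 1

    scores = {}
    for term, per_cat in doc_freq.items():
        counts = [per_cat.get(c, 0) for c in categories]
        dominant = per_cat.most_common(1)[0][0]
        scores[term] = (normalized_entropy(counts), dominant)
    return scores


# Toy inspection records (invented for illustration only).
docs = [
    ("fire", ["extinguisher", "expired", "workshop"]),
    ("fire", ["extinguisher", "missing", "warehouse"]),
    ("electrical", ["wiring", "exposed", "workshop"]),
    ("electrical", ["wiring", "damaged", "warehouse"]),
]

scores = term_category_entropy(docs)
# Terms with the lowest entropy are the most category-specific.
for term, (entropy, category) in sorted(scores.items(), key=lambda kv: kv[1][0]):
    print(f"{term:14s} entropy={entropy:.2f} dominant={category}")
```

In this toy data, "extinguisher" appears only in the "fire" records and "wiring" only in the "electrical" records, so both receive the minimum score, while "workshop" and "warehouse" occur equally in both categories and receive the maximum. Unlike a TF-IDF weight computed over the whole corpus, a score of this kind is tied to the category labels, which is what makes it usable for picking out per-category features.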