Text knowledge management oriented adaptive Chinese word segmentation algorithms

doi:10.11835/j.issn.1000-582X.2010.10.019

Volume 33, Issue 10, 2010, pp. 110-117


Abstract:

To overcome the weaknesses of traditional dictionary-based matching algorithms in new-word recognition and special-word processing, a text knowledge management oriented adaptive Chinese word segmentation algorithm (SACWSA) based on a 2-gram statistical model is presented. At the preprocessing stage, SACWSA applies finite state machine theory, a conjunction-based partition method, and a divide-and-conquer strategy to partition long sentences in the input text into sub-sentences, which effectively reduces the algorithm's complexity. At the word segmentation stage, the 2-gram statistical model is combined with partial probability and overall probability to partition the sub-sentences into words, which improves the recognition rate of new words and resolves ambiguity. At the post-processing stage, part-of-speech matching rules are established to further resolve ambiguity in the 2-gram segmentation results. The innovations of SACWSA are its handling of long sentences and long terms through divide and conquer, and its combination of partial probability and overall probability to identify new words and resolve ambiguity. Experimental results on text corpora from different fields show that SACWSA can adapt to the text knowledge management requirements of different fields accurately, efficiently, and automatically.
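The first two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the counts, the conjunction list, the add-one smoothing, and the names `split_subsentences`, `candidates`, `cond_logprob`, and `segment` are all assumptions made for illustration. It splits text into sub-sentences at punctuation and conjunctions, then picks the word sequence with the highest overall 2-gram probability by dynamic programming. The paper's combination of partial and overall probability and its part-of-speech post-processing are not reproduced here.

```python
import math
import re

# Hypothetical toy counts; a real system would estimate them from a corpus.
UNIGRAMS = {"研究": 5, "研究生": 3, "生命": 4, "的": 10,
            "起源": 3, "知识": 4, "管理": 4}
BIGRAMS = {("<s>", "研究"): 3, ("研究", "生命"): 3, ("生命", "的"): 3,
           ("的", "起源"): 3, ("知识", "管理"): 3}
START_COUNT = 10          # assumed number of training sentences
PUNCT = "，。；！？"
CONJ = ("和",)            # assumed conjunction list


def split_subsentences(text):
    """Cut text at punctuation and conjunctions, keeping the delimiters."""
    pattern = "([" + PUNCT + "]|" + "|".join(map(re.escape, CONJ)) + ")"
    return [part for part in re.split(pattern, text) if part]


def candidates(s, i, max_len=4):
    """Words starting at i: dictionary words plus the single character."""
    out = [s[i]]
    for j in range(i + 2, min(len(s), i + max_len) + 1):
        if s[i:j] in UNIGRAMS:
            out.append(s[i:j])
    return out


def cond_logprob(prev, word):
    """Add-one smoothed log P(word | prev); the smoothing is an assumption."""
    vocab = len(UNIGRAMS) + 1
    prev_count = START_COUNT if prev == "<s>" else UNIGRAMS.get(prev, 0)
    return math.log((BIGRAMS.get((prev, word), 0) + 1) / (prev_count + vocab))


def segment(s):
    """Return the segmentation of s with the highest 2-gram path probability."""
    # best[i][w] = (score, previous word) for the best path ending at i in w
    best = [dict() for _ in range(len(s) + 1)]
    best[0]["<s>"] = (0.0, None)
    for i in range(len(s)):
        for prev, (score, _) in best[i].items():
            for w in candidates(s, i):
                j = i + len(w)
                cand = score + cond_logprob(prev, w)
                if w not in best[j] or cand > best[j][w][0]:
                    best[j][w] = (cand, prev)
    # Backtrack from the best final state to the start marker.
    i = len(s)
    w = max(best[i], key=lambda k: best[i][k][0])
    words = []
    while w != "<s>":
        words.append(w)
        prev = best[i][w][1]
        i -= len(w)
        w = prev
    return words[::-1]
```

With these toy counts, `segment("研究生命的起源")` prefers 研究 / 生命 / 的 / 起源 over the dictionary-greedy reading 研究生 / 命 / 的 / 起源, because the 2-gram context outweighs the longer dictionary match. Segmenting each piece returned by `split_subsentences` separately keeps every dynamic-programming table short, which is the motivation the abstract gives for the divide-and-conquer preprocessing.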

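Citation:

Feng Yong, He Xun, Tang Li, Chen Xianyong, Chen Zhen. Text knowledge management oriented adaptive Chinese word segmentation algorithms [J]. Journal of Chongqing University, 2010, 33(10): 110-117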

History

Received: May 10, 2009
