A New Method of Unsupervised Feature Selection for Text Mining
Abstract:
A novel approach to unsupervised feature selection, denoted DFFS, is presented; it combines Document Frequency and Feature Similarity. The method first removes ninety percent of the words based on document frequency, then removes redundant features according to feature similarity. K-means clustering is used to compare DFFS with other commonly used feature selection methods, such as DF, TC and TS. In the first experiment, the clustering performance of DF decreases sharply when the number of features is reduced from 6000 to 1047, whereas DFFS maintains or improves clustering performance. In another experiment, with 10% to 2% of the features remaining, DFFS is superior to the other three approaches, and clearly superior when only 2% of the features remain.
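
The two-stage idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which words the document-frequency step keeps or how feature similarity is measured. The sketch assumes that the terms with the highest document frequency are kept and that similarity is the cosine between per-document term-count vectors. The names `dffs_select`, `keep_ratio` and `sim_threshold` are hypothetical.

```python
import math


def document_frequency(docs):
    """Count, for each term, the number of documents that contain it."""
    df = {}
    for doc in docs:
        for term in set(doc):
            df[term] = df.get(term, 0) + 1
    return df


def cosine(u, v):
    """Cosine similarity of two equal-length count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def dffs_select(docs, keep_ratio=0.1, sim_threshold=0.9):
    """Two-stage selection: document-frequency filter, then redundancy removal.

    Assumptions (not stated in the abstract): the highest-DF terms are kept,
    and redundancy is judged by cosine similarity of term-count vectors.
    """
    df = document_frequency(docs)

    # Stage 1: rank terms by document frequency (ties broken alphabetically)
    # and keep only the top fraction given by keep_ratio.
    ranked = sorted(df, key=lambda term: (-df[term], term))
    n_keep = max(1, int(len(ranked) * keep_ratio))
    candidates = ranked[:n_keep]

    # Represent each surviving term by its count in every document.
    vectors = {term: [doc.count(term) for doc in docs] for term in candidates}

    # Stage 2: greedily keep a term only if it is not too similar
    # to any term that has already been selected.
    selected = []
    for term in candidates:
        if all(cosine(vectors[term], vectors[s]) < sim_threshold for s in selected):
            selected.append(term)
    return selected


# Toy corpus: each document is a list of tokens.
docs = [["a", "b", "c"], ["a", "b", "d"], ["a", "b", "e"], ["f", "g"]]
features = dffs_select(docs, keep_ratio=0.5, sim_threshold=0.9)
```

In this toy corpus the terms `a` and `b` occur in exactly the same documents, so the second stage treats `b` as redundant and drops it. The selected features would then be passed to a clustering step such as K-means to assess their quality, as the abstract describes.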