An ensemble learning algorithm for feature selection based on solution to multi-class imbalance data classification
CLC number: TP181

Funding: Supported by the Ministry of Education-New H3C Group "Cloud-Data Fusion" Fund (2017A13055).

    Abstract:

    To address imbalanced multi-class classification, an ensemble method that combines feature selection with AdaBoost is proposed. First, the data are preprocessed and the WSPSO algorithm is used for feature selection: initial particles are chosen according to feature importance to build the initial population, so that the search proceeds in a promising direction from the outset and the influence of irrelevant features is reduced. Second, the sensitivity of AdaBoost to sample weights is exploited to increase the attention paid to minority-class samples. AUCarea is used as the evaluation criterion; compared with other criteria, it can be visualized and is more sensitive to poor AUC values. Finally, the method is compared with several other imbalanced classification algorithms on imbalanced data sets, and the results show that it handles imbalanced multi-class problems effectively.
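
    The abstract gives no formulas, so the following is only a minimal sketch of two ideas it mentions, under stated assumptions. The function names `seeded_population` and `radar_area_score` are hypothetical and are not taken from the paper. The first assumes non-negative importance scores and switches each feature on with a probability that grows linearly with its score; the actual WSPSO initialization may differ. The second assumes that an AUCarea-style score is the normalized area of the radar-chart polygon spanned by the pairwise AUC values, which is one construction consistent with the "visual" and "sensitive to poor AUC" properties described above.

```python
import math
import random


def seeded_population(importance, n_particles, rng, low=0.1, high=0.9):
    """Binary particles whose bits are biased toward important features.

    Assumption (not from the paper): feature j is switched on with a
    probability that grows linearly from `low` to `high` with its score.
    """
    top = max(importance)
    if top > 0:
        probs = [low + (high - low) * w / top for w in importance]
    else:
        probs = [0.5] * len(importance)  # no ranking information available
    population = []
    for _ in range(n_particles):
        particle = [1 if rng.random() < p else 0 for p in probs]
        if not any(particle):
            # Never start from an empty subset: keep the top-ranked feature.
            particle[probs.index(max(probs))] = 1
        population.append(particle)
    return population


def radar_area_score(pairwise_auc):
    """Normalized area of the radar-chart polygon spanned by AUC values.

    Assumption (not from the paper): the values sit on equally spaced
    axes, so the area is 0.5 * sin(2*pi/q) * sum(r_i * r_{i+1}); dividing
    by the area of the all-ones polygon leaves mean(r_i * r_{i+1}).
    """
    q = len(pairwise_auc)
    if q < 3:
        raise ValueError("a polygon needs at least three AUC values")
    wedge = 0.5 * math.sin(2 * math.pi / q)
    area = wedge * sum(
        pairwise_auc[i] * pairwise_auc[(i + 1) % q] for i in range(q)
    )
    return area / (wedge * q)


# Illustrative inputs only; these numbers are made up for the example.
rng = random.Random(0)
population = seeded_population([0.40, 0.05, 0.30, 0.00], n_particles=5, rng=rng)
balanced = radar_area_score([0.9, 0.9, 0.9])
uneven = radar_area_score([1.0, 1.0, 0.7])
```

    Both AUC lists in the example have the same mean (0.9), yet under this assumed definition the list containing one weak value scores lower (about 0.80 versus about 0.81), because a poor AUC lowers two adjacent product terms at once.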

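Cite this article

Su Chen, Xu Hua, Cui Xin, Wang Lingdi. An ensemble learning algorithm for feature selection based on solution to multi-class imbalance data classification[J]. Journal of Chongqing University, 2022, 45(5):125-134. (in Chinese)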

History
  • Received: 2020-12-25
  • Last revised: 2020-12-23
  • Published online: 2022-06-11