Optimization of covariance distance measurement algorithm for multidimensional clustering analysis
Authors: Liu Yun, Zhang Yi, Zheng Wenfeng
Affiliation: Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, P. R. China

Author biography: Liu Yun (born 1973), male, associate professor; main research interests: data mining and analysis, artificial intelligence. E-mail: liuyun@kmust.edu.cn.

Fund project: Supported by the National Natural Science Foundation of China (61761025) and the Major Science and Technology Project of Yunnan Province (202002AD080002).

Abstract:

To characterize the proximity of data objects with an effective distance measure in multidimensional clustering analysis, a covariance distance measurement (CDM) algorithm is proposed. First, fuzzy C-means (FCM) is used to assign weights to the data objects, giving the membership degree of each sample point relative to each category feature; the difference degree of each sample is then calculated from these membership degrees. Second, to maximize the separation between categories, the covariance distance between each sample point and its associated category replaces the Euclidean distance used in fuzzy clustering as the first optimization criterion, which draws similar data objects closer together. Finally, the covariance distance between sample points serves as the second optimization criterion, which pushes dissimilar data objects apart. The optimal solution is computed iteratively by alternately fixing variables, so that the clustering index and the distance metric learning parameters are optimized simultaneously and better clustering results are obtained. Experimental results on different data sets show that the CDM algorithm outperforms the FCM-Sig and UNCA algorithms in both clustering accuracy and convergence.
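The abstract does not give the CDM formulas, so the following is only a minimal sketch of the general idea, under stated assumptions: the membership step is the standard FCM update, and the "covariance distance" is approximated by a Mahalanobis-type distance built from a fuzzy-weighted covariance matrix per cluster. The function names (`fcm_memberships`, `covariance_sq_dist`, `alternate`), the regularization term `reg`, and the fuzzifier `m = 2` are illustrative choices, not taken from the paper. The second criterion (pairwise distances between sample points) is not modeled here.

```python
import numpy as np


def fcm_memberships(sq_dist, m=2.0, eps=1e-12):
    """Standard FCM membership update.

    sq_dist has shape (n_samples, n_clusters) and holds squared distances.
    Each returned row sums to 1.
    """
    d = np.maximum(sq_dist, eps)          # avoid division by zero
    inv = d ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)


def covariance_sq_dist(X, centers, U, m=2.0, reg=1e-6):
    """Squared Mahalanobis-type distance from every sample to every center.

    Assumption: each cluster uses its own fuzzy-weighted covariance matrix,
    regularized by reg * I so that it stays invertible.
    """
    n, p = X.shape
    c = centers.shape[0]
    out = np.empty((n, c))
    for k in range(c):
        diff = X - centers[k]
        w = U[:, k] ** m
        cov = (w[:, None] * diff).T @ diff / w.sum() + reg * np.eye(p)
        out[:, k] = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
    return out


def alternate(X, U0, m=2.0, n_iter=10):
    """Alternately fix memberships (update centers and covariances)
    and fix centers and covariances (update memberships)."""
    U = U0
    for _ in range(n_iter):
        w = U ** m
        centers = (w.T @ X) / w.sum(axis=0)[:, None]
        U = fcm_memberships(covariance_sq_dist(X, centers, U, m), m)
    return U, centers
```

Starting `alternate` from a rough initial membership matrix `U0` (for example, the output of a few Euclidean FCM iterations) shows the alternating structure described above: one group of variables is held fixed while the other is updated. A covariance-based distance can adapt to elongated or correlated clusters in a way the plain Euclidean distance cannot, which is the usual motivation for this kind of substitution.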

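Cite this article

Liu Y, Zhang Y, Zheng W F. Optimization of covariance distance measurement algorithm for multidimensional clustering analysis[J]. Journal of Chongqing University, 2023, 46(5): 102-110.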

History
  • Received: 2022-06-09
  • Published online: 2023-05-31