[关键词]
[摘要]
环路挖掘可以帮助人们深入理解复杂网络的结构和功能,这对诸如道路交通网络、生物蛋白网络、金融经济网络等实际应用领域都具有十分重要的意义.然而,信息时代的海量数据,使得环路挖掘变得极具挑战.针对数据量大但因可用数据相对有限而无法挖掘出完整环路的问题,定义了近似环(Approximate Cycle, AC)的概念,同时提出近似环检测算法(Approximate Cycle Detection, ACD)及其优化算法(Improved Approximate Cycle Detection, IACD).两个算法都分为三阶段:首先,通过节点度计算得到热点(Hotpoints);其次,依据热点对数据集进行前向、后向搜索得到热点和其邻居,并以此创建索引(H-Index);最后,根据索引计算不同节点间的紧密系数和平均紧密系数,紧密系数大于平均紧密系数的节点对之间的路径即为近似环.IACD算法在ACD算法的基础上进行了两方面的优化,一方面在获取热点和其邻居部分增加对节点的去重,同时减少对数据的搜索次数,另一方面,在创建索引部分用函数矢量化处理代替循环修改操作.实验所用数据均为SNAP公开网站的真实数据集.实验结果表明,两种算法都能在较大的数据集上顺利运行,具有良好的扩展性和高效性.IACD算法效率比ACD算法提升25%左右.
[Key word]
[Abstract]
Cycle mining can help people deeply understand the structure and function of complex networks, which is of great significance for practical application fields such as road traffic networks, bioprotein networks, financial and economic networks, etc. However, the massive data in the information age makes cycle mining extremely challenging. In response to the problem of large data volumes but relatively limited available data that cannot mine complete cycles, the concept of approximate cycle (AC) is defined, and the approximate cycle detection algorithm (ACD) and its optimization algorithm (IACD) are proposed. Both algorithms are divided into three stages: first, calculate hotpoints through vertex degree calculation; secondly, perform forward and backward searches on the dataset based on hotpoints to obtain hotpoints and their neighbors, and use this to construct an index (H-Index); finally, calculate the tightness coefficient and average tightness coefficient between different vertices based on H-Index, the path between vertex pairs with a tightness coefficient greater than the average tightness coefficient is an approximate cycle. The IACD algorithm has been optimized in two aspects based on the ACD algorithm. On the one hand, it increases the deduplication of vertices in the acquisition of hotpoints and their neighbors, while reducing the number of searches for data. On the other hand, it uses function vectorization instead of cyclic modification in the construction of indexes. The experimental data used are all real datasets of SNAP public website. The experimental results show that both algorithms can run smoothly on larger datasets and have good scalability and efficiency. The efficiency of the IACD algorithm is about 25% higher than that of the ACD algorithm.
[中图分类号]
TP399
[基金项目]
国家自然科学基金资助项目(61702161);河南省高等学校重点科研项目(24A520001);河南省重点研发与推广专项(科技攻关)(212102210386).