面向大图的近似环挖掘算法
作者:
作者单位:

河南财经政法大学 计算机与信息工程学院

中图分类号:

TP399

基金项目:

国家自然科学基金资助项目(61702161);河南省高等学校重点科研项目(24A520001);河南省重点研发与推广专项(科技攻关)(212102210386).


Approximate Cycle Mining Algorithm for Large Graphs
Author:
Affiliation:

School of Computer and Information Engineering,Henan University of Economics and Law

Fund Project:

the National Natural Science Foundation of China

  • 摘要
  • | |
  • 访问统计
  • |
  • 参考文献 [33]
  • | |
  • 引证文献
  • | |
  • 文章评论
    摘要:

    环路挖掘可以帮助人们深入理解复杂网络的结构和功能,这对诸如道路交通网络、生物蛋白网络、金融经济网络等实际应用领域都具有十分重要的意义.然而,信息时代的海量数据,使得环路挖掘变得极具挑战.针对数据量大但因可用数据相对有限而无法挖掘出完整环路的问题,定义了近似环(Approximate Cycle, AC)的概念,同时提出近似环检测算法(Approximate Cycle Detection, ACD)及其优化算法(Improved Approximate Cycle Detection, IACD).两个算法都分为三阶段:首先,通过节点度计算得到热点(Hotpoints);其次,依据热点对数据集进行前向、后向搜索得到热点和其邻居,并以此创建索引(H-Index);最后,根据索引计算不同节点间的紧密系数和平均紧密系数,紧密系数大于平均紧密系数的节点对之间的路径即为近似环.IACD算法在ACD算法的基础上进行了两方面的优化,一方面在获取热点和其邻居部分增加对节点的去重,同时减少对数据的搜索次数,另一方面,在创建索引部分用函数矢量化处理代替循环修改操作.实验所用数据均为SNAP公开网站的真实数据集.实验结果表明,两种算法都能在较大的数据集上顺利运行,具有良好的扩展性和高效性.IACD算法效率比ACD算法提升25%左右.

    Abstract:

    Cycle mining can help people deeply understand the structure and function of complex networks, which is of great significance for practical application fields such as road traffic networks, bioprotein networks, financial and economic networks, etc. However, the massive data in the information age makes cycle mining extremely challenging. In response to the problem of large data volumes but relatively limited available data that cannot mine complete cycles, the concept of approximate cycle (AC) is defined, and the approximate cycle detection algorithm (ACD) and its optimization algorithm (IACD) are proposed. Both algorithms are divided into three stages: first, calculate hotpoints through vertex degree calculation; secondly, perform forward and backward searches on the dataset based on hotpoints to obtain hotpoints and their neighbors, and use this to construct an index (H-Index); finally, calculate the tightness coefficient and average tightness coefficient between different vertices based on H-Index, the path between vertex pairs with a tightness coefficient greater than the average tightness coefficient is an approximate cycle. The IACD algorithm has been optimized in two aspects based on the ACD algorithm. On the one hand, it increases the deduplication of vertices in the acquisition of hotpoints and their neighbors, while reducing the number of searches for data. On the other hand, it uses function vectorization instead of cyclic modification in the construction of indexes. The experimental data used are all real datasets of SNAP public website. The experimental results show that both algorithms can run smoothly on larger datasets and have good scalability and efficiency. The efficiency of the IACD algorithm is about 25% higher than that of the ACD algorithm.

    参考文献
    [1] Kumar R, Calders T. 2scent: An efficient algorithm to enumerate all simple temporal cycles[J]. Proc of the VLDB Endowment, 2018, 11(11): 1441-1453.
    [2] Kumar R, Calders T. Finding simple temporal cycles in an interaction network[C]//Proc of the Workshop on Large-Scale Time Dependent Graphs, Co-Located with the European Conf on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. Skopje: Macedonia, 2017: 3-6.
    [3] Qiu X F, Cen W B, Qian Z P, et al. Real-time constrained cycle detection in large dynamic graphs[J]. Proc of the VLDB Endowment, 2018, 11(12): 876-1888.
    [4] 潘敏佳, 李荣华, 赵宇海, 等. 面向时序图数据的快速环枚举算法[J]. 软件学报, 2020, 31(12): 3823-3835.Pan M J, Li R H, Zhao Y H, et al. Fast temporal cycle enumeration algorithm on temporal graphs[J]. Journal of Software, 2020, 31(12): 3823-3835.
    [5] Rusek K, Suárez-Varela J, Almasan, P, et al. Routenet: Leveraging graph neural networks for network modeling and optimization in SDN[J]. IEEE Journal on Selected Areas in Communications, 2020, 38(10): 2260-2270.
    [6] Xie Z Y, Huang Y H, Fang G Q, et al. RouteNet: Routability prediction for Mixed-Size Designs Using Convolutional Neural Network[C]//Proc of 2018 IEEE/ACM International Conference on Computer-Aided Design. San Diego: CA, 2018, 1-8.
    [7] Logan R W, Mcclung C A. Rhythms of life: circadian disruption and brain disorders across the lifespan[J]. Nature Reviews Neuroscience, 2019, 20(1): 49-65.
    [8] Zhuang Y, Li Z, Xiong S, et al. Circadian clocks are modulated by compartmentalized oscillating translation[J]. Cell 2023, 186(15): 3245-3260.
    [9] Altman E, Egressy B, Blanu?a J, et al. Realistic Synthetic Financial Transactions for Anti-Money Laundering Models[J]. arXiv preprint, 2023, arXiv: 2306.16424.
    [10] Wen D, Huang Y L, Zhang Y, et al. Efficiently answering span-reachability queries in large temporal graphs[C]//Proc of the 36th IEEE International Conference on Data Engineering. New York: IEEE, 2020: 1153-1164.
    [11] Rao V V B, Murti V G K. Enumeration of all circuits of a graph[J]. Proc of the IEEE, 1969, 57(4): 700-701.
    [12] Ponstein J. Self-avoiding paths and the adjacency matrix of a graph[J]. SIAM Journal on Applied Mathematics, 1966, 14(3): 600-609.
    [13] Mateti P, Deo N. On algorithms for enumerating all circuits of a graph[J]. SIAM Journal on Computing, 1976, 5(1): 90-99.
    [14] Tiernan J C. An efficient search algorithm to find the elementary circuits of a graph[J]. Communications of the ACM,1970, 13(12): 722-726.
    [15] Weinblatt H. A new search algorithm for finding the simple cycles of a finite directed graph[J]. Journal of the ACM, 1972, 19(1): 43-56.
    [16] Agrawal R, Borgida A, Jagadish H V. Efficient management of transitive relationships in large data and knowledge bases[J]. ACM SIGMOD Record, 1989, 18(2): 253-262.
    [17] Wei H, Yu J X, Lu Can, et al. Reachability querying: an independent permutation labeling approach[J]. The VLDB Journal, 2018, 27: 1-26.
    [18] Seufert S, Anand A, Bedathur S, et al. Ferrari: Flexible and efficient reachability range assignment for graph indexing[C]//Proc of the 29th IEEE International Conference on Data Engineering. Piscataway, NJ: IEEE, 2013: 1009-1020.
    [19] Cohen E, Halperin E, Kaplan H, et al. Reachability and distance queries via 2-hop labels[J]. SIAM Journal on Computing, 2003, 32(5): 1338-1355.
    [20] Cai J, Poon C K. Path-hop: efficiently indexing large graphs for reachability queries[C]//Proc of the 19th ACM international conference on Information and knowledge management. New York: ACM, 2010: 119-128.
    [21] Lyu Q y, Li Y C, He B, et al. DBL: Efficient Reachability Queries on Dynamic Graphs (Complete Version)[J]. arXiv preprint, 2021, arXiv: 2101.09441.
    [22] Jin R M, Xiang Y, Ruan N, et al. 3-hop: a high-compression indexing scheme for reachability query[C]//Proc of the 2009 ACM SIGMOD International Conference on Management of data. New York: ACM, 2009: 813-826.
    [23] Zhu L H, Choi B, He B, et al. A uniform framework for ad-hoc indexes to answer reachability queries on large graphs[C]//Proc of the 14th Springer Berlin Heidelberg, 2009: 138-152.
    [24] Jin R M, Ruan N, Dey S, et al. Scarab: scaling reachability computation on large graphs[C]//Proc of the 2012 ACM SIGMOD International Conference on Management of Data. New York: ACM, 2012: 169-180.
    [25] Wang Y, Wang Q, Koehler Hs, et al. Query-by-sketch: Scaling shortest path graph queries on very large networks[C]//Proc of the 2021 International Conference on Management of Data. New York: ICMD, 2021: 1946-1958.
    [26] Sengupta N, Bagchi A, Ramanath M, et al. Arrow: Approximating reachability using random walks over web-scale graphs[C]//Proc of the 2019 IEEE 35th International Conference on Data Engineering. Piscataway, NJ: IEEE, 2019: 470-481.
    [27] Yu J X, Cheng J F. Graph reachability queries: A survey[J]. Managing and Mining Graph Data, 2010, 40: 181-215.
    [28] 富丽贞, 孟小峰. 大规模图数据可达性索引技术:现状与展望[J]. 计算机研究与发展, 2015, 52(1): 116-129.Fu L Z, Meng X F. Reachability Indexing for Large-Scale Graphs: Studies and Forecasts[J]. Journal of Computer Research and Development, 2015, 52(1): 116-129.
    [29] Juola P. JGAAP3-Authorship attribution for the rest of us[C]//Proc of Digital Humanities 2008: Book of Abstracts. Oulu: University of Oulu, 2008: 250-251.
    [30] Sahu S, Mhedhbi A, Salihoglu S, et al. The ubiquity of large graphs and surprising challenges of graph processing: extended survey[J]. The VLDB journal, 2020, 29: 595-618.
    [31] 陈子阳, 陈伟, 李娜, 等. 面向大图的可达性查询处理算法[J]. 计算机学报, 2019, 42(03): 582-595.Chen Z Y, Chen W, Li N, et al. Efficient processing algorithm for reachability queries based on big graph [J]. Journal of Computers, 2019, 42(03): 582-595.
    [32] 严蔚敏, 吴伟民. 数据结构[M]. 北京: 清华大学, 2018.Yan W M, Wu W M. data structuer[M]. Bei Jing: Qinghua University, 2018.
    [33] Leskovec J, Krevl A. SNAP datasets: Stanford large network dataset collection. 2014. https://snap.stanford.edu/data.
    相似文献
    引证文献
    网友评论
    网友评论
    分享到微博
    发 布
引用本文
分享
文章指标
  • 点击次数:286
  • 下载次数: 0
  • HTML阅读次数: 0
  • 引用次数: 0
历史
  • 收稿日期:2023-12-25
  • 最后修改日期:2024-04-26
  • 录用日期:2024-05-13
文章二维码