Abstract: Cycle mining helps reveal the structure and function of complex networks and is of great significance in application domains such as road traffic networks, biological protein networks, and financial and economic networks. However, the massive volume of data in the information age makes cycle mining extremely challenging. Complete cycles cannot be mined when data volumes are large but the available data are relatively limited. To address this problem, this paper defines the concept of an approximate cycle (AC) and proposes an approximate cycle detection algorithm (ACD) together with an optimized version (IACD). Both algorithms proceed in three stages. First, hotpoints are identified by computing vertex degrees. Second, forward and backward searches starting from the hotpoints are performed on the dataset to obtain the hotpoints and their neighbors, and these are used to construct an index (H-Index). Finally, the tightness coefficient between different vertices and the average tightness coefficient are computed from the H-Index, and the path between any vertex pair whose tightness coefficient exceeds the average is taken as an approximate cycle. IACD improves on ACD in two respects. First, it deduplicates vertices when acquiring hotpoints and their neighbors, which also reduces the number of searches over the data. Second, it constructs the index with vectorized functions instead of loop-based modification. All experiments use real datasets from the public SNAP website. The results show that both algorithms run smoothly on larger datasets and exhibit good scalability and efficiency, and that IACD is about 25% more efficient than ACD.
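The three stages described above can be illustrated with a minimal Python sketch. It is not the ACD or IACD implementation. The abstract does not define the hotpoint threshold or the tightness coefficient, so both are assumptions here. A hotpoint is taken to be any vertex whose total degree (in plus out) exceeds the average degree. The tightness coefficient is a hypothetical stand-in: the Jaccard similarity of two hotpoints' combined forward and backward neighbor sets. The function names are illustrative only.

```python
from collections import defaultdict
from itertools import combinations


def find_hotpoints(edges):
    """Stage 1 (assumed rule): a hotpoint is a vertex whose total degree
    (in-degree + out-degree) exceeds the average degree of the graph."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    avg = sum(degree.values()) / len(degree)
    return {n for n, d in degree.items() if d > avg}


def build_h_index(edges, hotpoints):
    """Stage 2: a single pass over the edge list records the forward (out)
    and backward (in) neighbors of every hotpoint; sets drop duplicates."""
    index = {h: {"out": set(), "in": set()} for h in hotpoints}
    for u, v in edges:
        if u in index:
            index[u]["out"].add(v)  # forward search from hotpoint u
        if v in index:
            index[v]["in"].add(u)   # backward search into hotpoint v
    return index


def neighbors(index, h):
    """All neighbors of hotpoint h, regardless of edge direction."""
    return index[h]["out"] | index[h]["in"]


def tightness(index, a, b):
    """Hypothetical tightness coefficient: Jaccard similarity of the
    neighbor sets of hotpoints a and b."""
    na, nb = neighbors(index, a), neighbors(index, b)
    return len(na & nb) / len(na | nb)


def approximate_cycles(edges):
    """Stage 3: keep hotpoint pairs whose tightness exceeds the average.
    Shared neighbors x give paths a - x - b; two or more of them close a loop."""
    hotpoints = find_hotpoints(edges)
    index = build_h_index(edges, hotpoints)
    pairs = list(combinations(sorted(hotpoints), 2))
    if not pairs:
        return []
    scores = {p: tightness(index, *p) for p in pairs}
    avg = sum(scores.values()) / len(scores)
    result = []
    for a, b in pairs:
        if scores[(a, b)] > avg:
            shared = neighbors(index, a) & neighbors(index, b)
            result.append((a, b, sorted(shared)))
    return result


# Small directed graph containing the cycle 1 -> 10 -> 2 -> 11 -> 1.
edges = [(1, 10), (10, 2), (2, 11), (11, 1), (1, 12), (12, 2),
         (3, 13), (3, 14), (3, 15), (3, 10)]
print(approximate_cycles(edges))  # [(1, 2, [10, 11, 12])]
```

In this toy graph the hotpoints are 1, 2, 3, and 10. Only the pair (1, 2) is tighter than average, and its shared neighbors 10, 11, and 12 recover the loop through both vertices. Building the H-Index in one pass with sets loosely mirrors the deduplication and reduced-search idea the abstract attributes to IACD. The vectorized index construction is not shown.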