An improved random forest algorithm based on unbalanced data
CSTR:
Author:
Clc Number:

TP391.4

  • Article
  • | |
  • Metrics
  • |
  • Reference
  • |
  • Related
  • |
  • Cited by
  • | |
  • Comments
    Abstract:

    Random forest algorithm has better classification performance as a combination of classification and is suitable for a variety of classification environments, but it also has some flaws. For example, it can not distinguish positive and negative class when dealing with unbalanced data. By setting conditions on sampling results, we improve the Bootstrap sampling method, reduce the influence of sampling on non-equilibrium and ensure the randomness of this algorithm. Then, we weight every decision tree according to the non-equilibrium coefficient of the generated data to enhance the discourse right of the decision tree which is sensitive to the non-equilibrium data and improve the classification performance of the whole algorithm dealing with unbalanced data. With these two above improvements, the new algorithm can significantly improve classification performance when the number of decision tree is insufficient.

    Reference
    Related
    Cited by
Get Citation

魏正韬,杨有龙,白婧.基于非平衡数据的随机森林分类算法改进[J].重庆大学学报,2018,41(4):54~62

Copy
Share
Article Metrics
  • Abstract:
  • PDF:
  • HTML:
  • Cited by:
History
  • Received:October 20,2017
  • Online: May 06,2018
Article QR Code