An improved random forest algorithm based on unbalanced data
Abstract:
As an ensemble of classifiers, the random forest algorithm offers good classification performance and is suitable for a wide range of classification settings, but it also has flaws. For example, it cannot distinguish the positive and negative classes when dealing with unbalanced data. We improve the Bootstrap sampling method by imposing conditions on the sampling results, which reduces the class imbalance introduced by sampling while preserving the randomness of the algorithm. We then weight each decision tree according to the imbalance coefficient of its generated training data, increasing the voting weight of the trees that are sensitive to unbalanced data and improving the classification performance of the whole algorithm on such data. With these two improvements, the new algorithm significantly improves classification performance when the number of decision trees is insufficient.
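A minimal sketch of the two ideas described above, not the authors' exact procedure: bootstrap draws are rejected until the sample meets a minimum minority-class share, and each tree's vote is weighted by the imbalance coefficient of its own bootstrap sample. All names (constrained_bootstrap, imbalance_coefficient, min_minority_ratio) and the specific weighting rule are illustrative assumptions.

```python
# Sketch of a constrained-bootstrap, weighted-vote random forest for
# binary labels in {0, 1}, with class 1 assumed to be the minority class.
# Parameter names and thresholds are illustrative, not from the paper.
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def constrained_bootstrap(X, y, rng, min_minority_ratio=0.2, max_tries=50):
    """Resample with replacement, rejecting draws whose minority share is too
    low, so each tree trains on a less imbalanced sample."""
    n = len(y)
    for _ in range(max_tries):
        idx = rng.integers(0, n, size=n)
        if np.mean(y[idx] == 1) >= min_minority_ratio:
            return idx
    return idx  # fall back to the last draw if the threshold is never met


def imbalance_coefficient(y_sample):
    """Minority-to-majority count ratio; closer to 1 means more balanced."""
    pos = int(np.sum(y_sample == 1))
    neg = int(np.sum(y_sample == 0))
    return min(pos, neg) / max(pos, neg) if max(pos, neg) > 0 else 0.0


def fit_weighted_forest(X, y, n_trees=25, seed=0):
    rng = np.random.default_rng(seed)
    trees, weights = [], []
    for _ in range(n_trees):
        idx = constrained_bootstrap(X, y, rng)
        tree = DecisionTreeClassifier(max_features="sqrt",
                                      random_state=int(rng.integers(1 << 31)))
        tree.fit(X[idx], y[idx])
        trees.append(tree)
        # Assumed weighting rule: trees grown on more balanced samples
        # receive a larger say in the final vote.
        weights.append(imbalance_coefficient(y[idx]))
    weights = np.asarray(weights) / np.sum(weights)
    return trees, weights


def predict_weighted(trees, weights, X):
    """Weighted majority vote over the individual trees."""
    votes = np.array([t.predict(X) for t in trees], dtype=float)
    return (weights @ votes >= 0.5).astype(int)
```

The rejection threshold and the choice of the imbalance coefficient as the tree weight are both assumptions made for the sketch; the paper's own conditions and weighting formula may differ.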