Stratified Over-sampling Bagging Method for Random Forests on Imbalanced Data
He Zhao; Xiaojun Chen; Tung Nguyen; Joshua Zhexue Huang; Graham Williams; Hui Chen
2016
Conference name: PAKDD 2016, Intelligence and Security Informatics - 11th Pacific Asia Workshop, PAISI 2016, Proceedings
Conference location: New Zealand
English abstract: Imbalanced data presents a major challenge to random forests (RF). Over-sampling is a commonly used sampling method for imbalanced data; it increases the number of minority-class instances to balance the class distribution. However, if only minority-class instances are over-sampled, the resulting sample data sets are often highly correlated, which reduces the generalizability of RF. To solve this problem, we propose a stratified over-sampling (SOB) method to generate both balanced and diverse training data sets for RF. We first cluster the training data set multiple times to produce multiple clustering results. The small individual clusters are grouped according to their entropies. Then we sample a set of training data sets from the groups of clusters using a stratified sampling method. Finally, these training data sets are used to train RF. The data sets sampled with SOB are guaranteed to be balanced and diverse, which improves the performance of RF on imbalanced data. We have conducted a series of experiments, and the experimental results show that the proposed method is more effective than some existing sampling methods.
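The steps named in the abstract (repeated clustering, grouping clusters by entropy, stratified over-sampling) can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names `kmeans_labels`, `class_entropy`, `sob_sample` and `sob_bags`, the use of k-means, the equal-width entropy bins, and the round-robin allocation across groups are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
import random
from collections import Counter, defaultdict


def kmeans_labels(X, k, rng, iters=10):
    """Tiny k-means; returns one cluster index per row of X."""
    centers = rng.sample(X, k)  # random initial centers, so each run differs
    labels = [0] * len(X)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(x, centers[c])) for x in X]
        for c in range(k):
            members = [x for x, lab in zip(X, labels) if lab == c]
            if members:  # keep the old center if a cluster went empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def class_entropy(labels):
    """Shannon entropy (in bits) of the class labels inside one cluster."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def sob_sample(X, y, rng, k=4, n_bins=3):
    """Draw one class-balanced list of row indices from one clustering run."""
    clusters = defaultdict(list)
    for i, c in enumerate(kmeans_labels(X, k, rng)):
        clusters[c].append(i)
    # Group clusters into strata by normalised entropy (assumed binning rule).
    max_h = math.log2(len(set(y))) or 1.0
    strata = defaultdict(list)
    for members in clusters.values():
        h = class_entropy([y[i] for i in members]) / max_h
        strata[min(int(h * n_bins), n_bins - 1)].extend(members)
    # Over-sample every class up to the majority-class size.
    target = max(Counter(y).values())
    sample = []
    for cls in sorted(set(y)):
        pools = [[i for i in s if y[i] == cls] for s in strata.values()]
        pools = [p for p in pools if p]
        # Assumed allocation: round-robin over strata, with replacement inside each.
        sample += [rng.choice(pools[j % len(pools)]) for j in range(target)]
    return sample


def sob_bags(X, y, n_bags=5, seed=0, **kwargs):
    """One balanced index list per bag; each bag uses a fresh clustering run."""
    rng = random.Random(seed)
    return [sob_sample(X, y, rng, **kwargs) for _ in range(n_bags)]
```

Each returned index list would then serve as the training sample for one tree of the forest; the tree-fitting step is left out here. Because every bag comes from a different clustering run, the bags differ from one another while each remains class-balanced.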
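Indexed by: EI
Language: English
Content type: Conference paper
Source URL: [http://ir.siat.ac.cn:8080/handle/172644/10306]
Collection: Shenzhen Institutes of Advanced Technology_Digital Institute
Author affiliation: 2016
Recommended citation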
GB/T 7714
He Zhao, Xiaojun Chen, Tung Nguyen, et al. Stratified Over-sampling Bagging Method for Random Forests on Imbalanced Data[C]. In: PAKDD 2016, Intelligence and Security Informatics - 11th Pacific Asia Workshop, PAISI 2016, Proceedings. New Zealand.