Stratified Over-sampling Bagging Method for Random Forests on Imbalanced Data | |
He Zhao; Xiaojun Chen; Tung Nguyen; Joshua Zhexue Huang; Graham Williams; Hui Chen | |
2016 | |
Conference | PAKDD 2016, Intelligence and Security Informatics - 11th Pacific Asia Workshop, PAISI 2016, Proceedings |
Conference location | New Zealand |
Abstract | Imbalanced data poses a major challenge to random forests (RF). Over-sampling is a commonly used sampling method for imbalanced data; it increases the number of minority-class instances to balance the class distribution. However, if only minority-class instances are over-sampled, such methods often produce highly correlated sample data sets, which reduces the generalizability of RF. To solve this problem, we propose a stratified over-sampling bagging (SOB) method that generates training data sets for RF that are both balanced and diverse. We first cluster the training data set multiple times to produce multiple clustering results. The small individual clusters are then grouped according to their entropies. Next, we draw a set of training data sets from the groups of clusters using stratified sampling. Finally, these training data sets are used to train RF. The data sets sampled with SOB are guaranteed to be balanced and diverse, which improves the performance of RF on imbalanced data. We have conducted a series of experiments, and the results show that the proposed method is more effective than several existing sampling methods. |
Indexed in | EI |
Language | English |
Content type | Conference paper |
Source URL | [http://ir.siat.ac.cn:8080/handle/172644/10306] |
Collection | Shenzhen Institutes of Advanced Technology, Institute of Advanced Computing and Digital Engineering |
Author affiliation | 2016 |
Recommended citation (GB/T 7714) | He Zhao, Xiaojun Chen, Tung Nguyen, et al. Stratified Over-sampling Bagging Method for Random Forests on Imbalanced Data[C]. In: PAKDD 2016, Intelligence and Security Informatics - 11th Pacific Asia Workshop, PAISI 2016, Proceedings. New Zealand. |
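The abstract describes SOB only at a high level (repeated clustering, grouping clusters by entropy, stratified sampling of balanced training sets). The Python sketch below illustrates that pipeline under assumptions that are not taken from the paper: plain Lloyd's k-means as the clusterer, every cluster treated as "small", equal-width binning of class-label entropy into groups, and per-class stratified sampling with replacement using quotas proportional to each group's share. The function names (`label_entropy`, `kmeans`, `entropy_groups`, `sob_sample`) and all parameters are hypothetical, and training the forest itself is left out: each bag would train one tree.

```python
import math
import random
from collections import Counter, defaultdict


def label_entropy(labels):
    """Shannon entropy (bits) of the class labels inside one cluster."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means (assumed clusterer); returns a cluster id per point."""
    centers = rng.sample(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by Euclidean distance.
        for i, p in enumerate(points):
            assign[i] = min(range(k), key=lambda j: math.dist(p, centers[j]))
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assign


def entropy_groups(points, labels, runs=3, k=4, n_bins=3, seed=0):
    """Cluster `runs` times and pool all clusters into entropy bins.

    Returns {bin index: [cluster, ...]}, where a cluster is a list of point
    indices. The equal-width binning over [0, log2(#classes)] is hypothetical.
    """
    rng = random.Random(seed)
    max_h = math.log2(len(set(labels))) or 1.0
    groups = defaultdict(list)
    for _ in range(runs):
        clusters = defaultdict(list)
        for i, a in enumerate(kmeans(points, k, rng)):
            clusters[a].append(i)
        for members in clusters.values():
            h = label_entropy([labels[i] for i in members])
            groups[min(int(h / max_h * n_bins), n_bins - 1)].append(members)
    return groups


def sob_sample(groups, labels, n_per_class, rng):
    """Draw one balanced bag: n_per_class indices per class, stratified over
    the entropy groups with quotas proportional to each group's share."""
    sample = []
    for c in sorted(set(labels)):
        # Class-c instances in each group (repeats across clusterings kept).
        strata = [[i for m in clusters for i in m if labels[i] == c]
                  for _, clusters in sorted(groups.items())]
        total = sum(len(s) for s in strata)
        quotas = [n_per_class * len(s) // total for s in strata]
        # Give the rounding remainder to the largest stratum.
        largest = max(range(len(strata)), key=lambda j: len(strata[j]))
        quotas[largest] += n_per_class - sum(quotas)
        for s, q in zip(strata, quotas):
            sample.extend(rng.choices(s, k=q))  # with replacement (over-sampling)
    return sample


if __name__ == "__main__":
    rng = random.Random(42)
    pts = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(40)]
    pts += [(rng.gauss(2, 1), rng.gauss(2, 1)) for _ in range(8)]
    ys = [0] * 40 + [1] * 8  # 5:1 class imbalance
    groups = entropy_groups(pts, ys)
    bags = [sob_sample(groups, ys, 20, rng) for _ in range(5)]
    for bag in bags:
        print(sorted(Counter(ys[i] for i in bag).items()))  # [(0, 20), (1, 20)]
```

Proportional quotas make each bag's mix across entropy groups match the pooled data exactly (up to rounding) rather than only in expectation, as plain uniform over-sampling would; the abstract does not state which allocation rule the authors actually use.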