Handling over-fitting in test cost-sensitive decision tree learning by feature selection, smoothing and pruning
Wang, Tao ; Qin, Zhenxing ; Jin, Zhi ; Zhang, Shichao
Journal: Journal of Systems and Software
Year: 2010
Keywords: Classification; Cost-sensitive learning; Over-fitting; Induction
DOI: 10.1016/j.jss.2010.01.002
Abstract (English): Cost-sensitive learning algorithms are typically designed for minimizing the total cost when multiple costs are taken into account. Like other learning algorithms, cost-sensitive learning algorithms must face a significant challenge, over-fitting, in an applied context of cost-sensitive learning. Specifically speaking, they can generate good results on training data but normally do not produce an optimal model when applied to unseen data in real world applications. It is called data over-fitting. This paper deals with the issue of data over-fitting by designing three simple and efficient strategies, feature selection, smoothing and threshold pruning, against the TCSDT (test cost-sensitive decision tree) method. The feature selection approach is used to pre-process the data set before applying the TCSDT algorithm. The smoothing and threshold pruning are used in a TCSDT algorithm before calculating the class probability estimate for each decision tree leaf. To evaluate our approaches, we conduct extensive experiments on the selected UCI data sets across different cost ratios, and on a real world data set, KDD-98 with real misclassification cost. The experimental results show that our algorithms outperform both the original TCSDT and other competing algorithms on reducing data over-fitting. (C) 2010 Elsevier Inc. All rights reserved.
Subject categories: Computer Science, Software Engineering; Computer Science, Theory & Methods
Indexed in: SCI(E); EI
Document type: Article
Volume: 83
Issue: 7 (SI)
Pages: 1137-1147
Unlabeled field in the source record: 6
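The abstract says that smoothing is applied before the class probability estimate is computed for each decision tree leaf, but this record does not say which smoothing scheme the paper uses. The sketch below is only an illustration under stated assumptions: it uses Laplace (additive) smoothing, a common choice for leaf estimates, and a hypothetical misclassification cost matrix. The function names `laplace_leaf_probs` and `min_cost_label`, the class labels, and all numbers are invented for the example and are not taken from the paper.

```python
from typing import Dict


def laplace_leaf_probs(counts: Dict[str, int], alpha: float = 1.0) -> Dict[str, float]:
    """Smoothed class probabilities for one leaf.

    Assumption: Laplace (additive) smoothing; the paper's actual
    smoothing scheme is not given in this record.
    """
    total = sum(counts.values()) + alpha * len(counts)
    return {label: (n + alpha) / total for label, n in counts.items()}


def min_cost_label(probs: Dict[str, float], cost: Dict[str, Dict[str, float]]) -> str:
    """Return the prediction with the lowest expected misclassification cost.

    cost[predicted][actual] is the cost of predicting `predicted`
    when the true class is `actual`.
    """
    def expected_cost(predicted: str) -> float:
        return sum(probs[actual] * cost[predicted][actual] for actual in probs)

    return min(probs, key=expected_cost)


# Hypothetical leaf that saw only 3 training examples, all "pos".
counts = {"pos": 3, "neg": 0}

# Hypothetical costs: calling a true "neg" case "pos" is 10x worse
# than calling a true "pos" case "neg".
cost = {
    "pos": {"pos": 0.0, "neg": 10.0},
    "neg": {"pos": 1.0, "neg": 0.0},
}

raw = {label: n / sum(counts.values()) for label, n in counts.items()}
smoothed = laplace_leaf_probs(counts)

raw_label = min_cost_label(raw, cost)            # unsmoothed estimate
smoothed_label = min_cost_label(smoothed, cost)  # smoothed estimate
```

With the unsmoothed estimate the leaf is treated as certainly "pos", so the costly error is ignored. After smoothing, the small leaf keeps some probability mass on "neg", and the expected-cost decision changes. This is one way to see why leaf estimates from few examples can over-fit in a cost-sensitive setting.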
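Language: English
Content type: Journal article
Source URL: http://ir.pku.edu.cn/handle/20.500.11897/257522
Collection: School of Electronics Engineering and Computer Science (信息科学技术学院)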
Recommended citation
GB/T 7714
Wang Tao, Qin Zhenxing, Jin Zhi, et al. Handling over-fitting in test cost-sensitive decision tree learning by feature selection, smoothing and pruning[J]. Journal of Systems and Software, 2010, 83(7): 1137-1147.
APA Wang, T., Qin, Z., Jin, Z., & Zhang, S. (2010). Handling over-fitting in test cost-sensitive decision tree learning by feature selection, smoothing and pruning. Journal of Systems and Software, 83(7), 1137-1147.
MLA Wang, Tao, et al. "Handling over-fitting in test cost-sensitive decision tree learning by feature selection, smoothing and pruning." Journal of Systems and Software 83.7 (2010): 1137-1147.