Partitioning Based N-Gram Feature Selection for Malware Classification
Hu, Weiwei ; Tan, Ying
2016
Keywords: Malware classification; Feature selection; Data partitioning; Apache Spark
Abstract: Byte-level N-Gram is one of the most widely used feature extraction methods for malware classification because of its good performance and robustness. However, N-Gram feature selection on a large dataset consumes a great deal of time and space because of the huge number of distinct N-Grams. This paper proposes a partitioning based algorithm for large-scale feature selection that efficiently decomposes the original problem into in-memory subproblems without heavy I/O load. The partitioning process uses an efficient implementation to convert the original, interdependent dataset into independent data partitions. This data independence makes the in-memory solutions effective and allows the partitions to be processed in parallel. The proposed algorithm was implemented on Apache Spark, and experimental results show that it selects features in a very short time, nearly three times faster than the MapReduce approach used for comparison.
Indexed in: CPCI-S (ISTP)
Author emails: weiwei.hu@pku.edu.cn; ytan@pku.edu.cn
Pages: 187-195
Volume: 9714
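The abstract does not spell out the selection criterion or the partitioning function, so the following plain-Python sketch only illustrates the general idea under stated assumptions. Byte-level N-Grams are extracted with a sliding window; each distinct N-Gram is routed to a partition by a CRC32 hash, so all of its occurrences land in the same partition; and features are ranked by document frequency (the number of files containing them). The ranking criterion, the hash, and every function name (`byte_ngrams`, `partition_of`, `partition_ngrams`, `top_k`, `select_features`) are hypothetical choices for illustration. This is a single-machine sketch, not the authors' Apache Spark implementation.

```python
import heapq
import zlib
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple


def byte_ngrams(data: bytes, n: int) -> Set[bytes]:
    """Distinct byte-level N-grams of one file (sliding window of n bytes)."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def partition_of(gram: bytes, num_partitions: int) -> int:
    """Assumed deterministic hash partitioning: every occurrence of a
    given N-gram is routed to the same partition."""
    return zlib.crc32(gram) % num_partitions


def partition_ngrams(files: Iterable[bytes], n: int,
                     num_partitions: int) -> List[List[bytes]]:
    """Scatter each file's distinct N-grams into independent partitions."""
    partitions: List[List[bytes]] = [[] for _ in range(num_partitions)]
    for data in files:
        for gram in byte_ngrams(data, n):
            partitions[partition_of(gram, num_partitions)].append(gram)
    return partitions


def top_k(counts: Dict[bytes, int], k: int) -> List[Tuple[bytes, int]]:
    """Top-k by count; ties are broken by the N-gram bytes for determinism."""
    return heapq.nsmallest(k, counts.items(), key=lambda kv: (-kv[1], kv[0]))


def select_features(files: List[bytes], n: int = 4, k: int = 1000,
                    num_partitions: int = 8) -> List[bytes]:
    """Hypothetical partitioned selection by document frequency."""
    candidates: List[Tuple[bytes, int]] = []
    for part in partition_ngrams(files, n, num_partitions):
        # Each partition holds the complete document frequency of its
        # N-grams, so it can be counted in memory independently of the rest.
        candidates.extend(top_k(Counter(part), k))
    # Merge the per-partition winners and take the global top-k.
    return [gram for gram, _ in top_k(dict(candidates), k)]


samples = [b"\x00\x01\x02\x03", b"\x01\x02\x03\x04", b"\x02\x03\x04\x05"]
print(select_features(samples, n=2, k=2, num_partitions=3))
# prints [b'\x02\x03', b'\x01\x02']
```

Because each N-Gram's complete count lives in exactly one partition, any feature in the global top-k is also in its own partition's top-k, so merging the per-partition winners loses nothing and the result does not depend on the number of partitions. In a Spark job the scatter step would roughly correspond to a hash-partitioned shuffle, with each partition counted in memory by a separate task.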
Language: English
Source: 1st International Conference on Data Mining and Big Data (DMBD)
DOI: 10.1007/978-3-319-40973-3_18
Content type: Other
Source URL: [http://ir.pku.edu.cn/handle/20.500.11897/460183]
Collection: School of Electronics Engineering and Computer Science
Recommended citation (GB/T 7714):
Hu, Weiwei,Tan, Ying. Partitioning Based N-Gram Feature Selection for Malware Classification. 2016-01-01.