Title | Partitioning Based N-Gram Feature Selection for Malware Classification |
Authors | Hu, Weiwei; Tan, Ying |
Year | 2016 |
Keywords | Malware classification; Feature selection; Data partitioning; Apache Spark |
Abstract | Byte-level N-gram extraction is one of the most widely used feature extraction methods for malware classification because of its good performance and robustness. However, N-gram feature selection on a large dataset consumes a great deal of time and memory because of the large number of distinct N-grams. This paper proposes a partitioning-based algorithm for large-scale feature selection that efficiently decomposes the original problem into subproblems solvable in memory, without heavy I/O load. The partitioning process uses an efficient implementation to convert the original, interdependent dataset into mutually independent data partitions. This independence is what makes the in-memory solutions effective and allows different partitions to be processed in parallel. The proposed algorithm was implemented on Apache Spark, and experimental results show that it selects features in a very short time, nearly three times faster than the MapReduce approach used for comparison. |
Indexed by | CPCI-S(ISTP) |
Author email | weiwei.hu@pku.edu.cn; ytan@pku.edu.cn |
Pages | 187-195 |
Volume | 9714 |
Language | English |
Source | 1st International Conference on Data Mining and Big Data (DMBD) |
DOI | 10.1007/978-3-319-40973-3_18 |
Content type | Other |
Source URL | [http://ir.pku.edu.cn/handle/20.500.11897/460183] |
Collection | School of Information Science and Technology |
Recommended citation (GB/T 7714) | Hu, Weiwei, Tan, Ying. Partitioning Based N-Gram Feature Selection for Malware Classification. 2016-01-01. |
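
The abstract describes the partitioning idea only at a high level, so the following is a minimal single-machine sketch of one way such a scheme could work, not the authors' implementation. It assumes labels of 0 (benign) and 1 (malicious), uses a CRC32 hash to assign each N-gram to exactly one partition, and scores N-grams by the absolute difference in per-class document frequency. The function names, the hash choice, and the scoring criterion are all illustrative assumptions; the paper itself runs on Apache Spark and may use a different partitioner and selection metric.

```python
import zlib
from collections import Counter, defaultdict


def byte_ngrams(data: bytes, n: int):
    """Yield every contiguous n-byte window of `data`."""
    for i in range(len(data) - n + 1):
        yield data[i:i + n]


def partition_id(gram: bytes, num_partitions: int) -> int:
    # Assumption: CRC32 as a deterministic hash, so a given N-gram
    # always lands in the same partition.
    return zlib.crc32(gram) % num_partitions


def build_partitions(samples, n, num_partitions):
    """Group per-class document frequencies by partition.

    `samples` is an iterable of (bytes, label) pairs with label 0 or 1.
    Each N-gram belongs to exactly one partition, so the partitions
    share no keys and can be processed independently.
    """
    partitions = [defaultdict(Counter) for _ in range(num_partitions)]
    for data, label in samples:
        for gram in set(byte_ngrams(data, n)):  # count each file once
            partitions[partition_id(gram, num_partitions)][gram][label] += 1
    return partitions


def top_k_in_partition(partition, k):
    # Hypothetical score: |doc freq in class 1 - doc freq in class 0|.
    scored = [(abs(c[1] - c[0]), gram) for gram, c in partition.items()]
    scored.sort(key=lambda t: (-t[0], t[1]))  # ties broken by byte order
    return scored[:k]


def select_features(samples, n, num_partitions, k):
    """Return the k highest-scoring N-grams over all partitions."""
    candidates = []
    # Each iteration touches only one partition's data, so the loop
    # body could run in parallel, one task per partition.
    for partition in build_partitions(samples, n, num_partitions):
        candidates.extend(top_k_in_partition(partition, k))
    candidates.sort(key=lambda t: (-t[0], t[1]))
    return [gram for _, gram in candidates[:k]]


if __name__ == "__main__":
    toy = [
        (b"\x01\x02\x03\x04", 1),
        (b"\x01\x02\x03\x05", 1),
        (b"\x09\x02\x03\x04", 0),
    ]
    print(select_features(toy, n=2, num_partitions=4, k=3))
```

Because every N-gram lives in exactly one partition, the global top k is always contained in the union of the per-partition top-k lists, so the final merge only has to sort a small candidate set rather than the full N-gram vocabulary.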