Partitioning Based N-Gram Feature Selection for Malware Classification

doi:10.1007/978-3-319-40973-3_18

Title	Partitioning Based N-Gram Feature Selection for Malware Classification
Authors	Hu, Weiwei; Tan, Ying
Year	2016
Keywords	Malware classification; Feature selection; Data partitioning; Apache Spark
Abstract	Byte-level N-Gram extraction is one of the most widely used feature extraction algorithms for malware classification because of its good performance and robustness. However, N-Gram feature selection on a large dataset consumes a great deal of time and space because of the large number of distinct N-Grams. This paper proposes a partitioning-based algorithm for large-scale feature selection that efficiently breaks the original problem into in-memory subproblems without heavy I/O load. The partitioning process uses an efficient implementation to convert the original interdependent dataset into independent data partitions. This data independence makes the in-memory solutions effective and allows different partitions to be processed in parallel. The proposed algorithm was implemented on Apache Spark, and experimental results show that it selects features in a very short time, nearly three times faster than the MapReduce approach used for comparison.
Indexed in	CPCI-S(ISTP)
Email	weiwei.hu@pku.edu.cn; ytan@pku.edu.cn
Pages	187-195
Volume	9714
Language	English
Source	1st International Conference on Data Mining and Big Data (DMBD)
DOI	10.1007/978-3-319-40973-3_18
Content type	Other
Source URL	http://ir.pku.edu.cn/handle/20.500.11897/460183
Collection	信息科学技术学院 (School of Information Science and Technology)
Recommended citation (GB/T 7714)	Hu, Weiwei, Tan, Ying. Partitioning Based N-Gram Feature Selection for Malware Classification. 2016-01-01.
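
The abstract describes the idea only at a high level, so the following is a minimal sketch of how partitioning can make N-Gram feature selection an in-memory problem, not the authors' implementation. It assumes a two-class setting, uses information gain as the selection score, and routes N-Grams to partitions with a CRC32 hash; all three choices, as well as every function name, are illustrative assumptions that do not come from this record. The sketch is plain Python rather than Apache Spark so that it runs without a cluster.

```python
import heapq
import math
import zlib
from collections import defaultdict


def byte_ngrams(data: bytes, n: int) -> set:
    """Return the distinct byte-level n-grams of one file."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def partition_of(gram: bytes, num_partitions: int) -> int:
    # Assumption: a deterministic hash, so a given n-gram always lands
    # in the same partition and partitions share no n-grams.
    return zlib.crc32(gram) % num_partitions


def entropy(pos: int, neg: int) -> float:
    """Binary entropy in bits of a (pos, neg) count pair; 0 if empty."""
    total = pos + neg
    h = 0.0
    for c in (pos, neg):
        if c:
            p = c / total
            h -= p * math.log2(p)
    return h


def information_gain(df_pos, df_neg, n_pos, n_neg):
    """Information gain of the binary feature 'n-gram is present'."""
    total = n_pos + n_neg
    present = df_pos + df_neg
    absent = total - present
    conditional = (present / total) * entropy(df_pos, df_neg)
    conditional += (absent / total) * entropy(n_pos - df_pos, n_neg - df_neg)
    return entropy(n_pos, n_neg) - conditional


def select_features(samples, n=2, num_partitions=4, k=3):
    """samples: list of (bytes, label), label 1 = malware, 0 = benign."""
    n_pos = sum(1 for _, label in samples if label == 1)
    n_neg = len(samples) - n_pos

    # Step 1: route each n-gram's per-class document count to its partition.
    # Each counter is [benign_count, malware_count], indexed by label.
    partitions = [defaultdict(lambda: [0, 0]) for _ in range(num_partitions)]
    for data, label in samples:
        for gram in byte_ngrams(data, n):
            partitions[partition_of(gram, num_partitions)][gram][label] += 1

    # Step 2: score every partition on its own. No partition needs data
    # from another one, so this loop could run in parallel.
    local_best = []
    for part in partitions:
        scored = [(information_gain(c[1], c[0], n_pos, n_neg), gram)
                  for gram, c in part.items()]
        local_best.extend(heapq.nlargest(k, scored))

    # Step 3: the global top k is contained in the union of local top k lists.
    return heapq.nlargest(k, local_best)
```

Because all counts for a given N-Gram end up in exactly one partition, each partition can be scored without consulting the others, and the result does not depend on the number of partitions. In a Spark version, steps 1 and 2 would presumably map onto a keyed shuffle followed by per-partition processing, but that mapping is a guess rather than something the abstract states.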
