Variable Length Concentration based Feature Construction Method for Spam Detection | |
Gao, Yang ; Mi, Guyue ; Tan, Ying | |
2015 | |
Keywords | NETWORKS |
Abstract (English) | In recent years, concentration methods have been proposed for feature construction in spam detection; they convert emails into fixed-length feature vectors. This paper presents a novel method that removes this fixed-length restriction. Specifically, the method uses a fixed-length sliding window to divide each email into several sections, so the number of sections depends on the length of the email. Consequently, the feature vectors differ in length, and this paper names them variable length concentrations (VLC). The method thus produces feature vectors that adapt to the length of each email. However, general classifiers are not suitable for this kind of feature vector, because they can only handle fixed-length inputs. This paper therefore applies recurrent neural networks (RNNs), whose inputs are not restricted in length, to perform spam detection. Recall, precision, accuracy and F1 measure are used to evaluate the method's performance. Experimental results on the classic corpora PU1, PU2, PU3 and PUA show that VLC performs significantly better than previously proposed methods, which supports the effectiveness of the method. |
Indexed in | EI; CPCI-S (ISTP) |
Author emails | gaoyang0115@pku.edu.cn; miguyue@pku.edu.cn; ytan@pku.edu.cn |
Date | September 2015 |
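The abstract describes a two-stage pipeline: a fixed-length window cuts each email into a length-dependent number of sections, each section becomes one feature vector, and an RNN consumes the resulting variable-length sequence. The sketch below illustrates that flow under assumptions this record does not specify: windows are non-overlapping and a trailing partial window is kept; each section's "concentration" is a hypothetical 2-d vector (fraction of tokens found in a spam-term set and in a ham-term set), standing in for the paper's actual concentration definition; and the classifier is a minimal, untrained Elman RNN with placeholder weights. The function names, term sets, and weights are all illustrative, not the authors' implementation.

```python
import math
from typing import List, Sequence, Set

Vector = List[float]


def split_into_sections(tokens: Sequence[str], window: int) -> List[List[str]]:
    """Cut an email's token sequence into consecutive fixed-length sections.

    Assumption: windows do not overlap and a trailing partial window is kept,
    so an email of n tokens yields ceil(n / window) sections.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    return [list(tokens[i:i + window]) for i in range(0, len(tokens), window)]


def section_concentration(section: Sequence[str], spam_terms: Set[str],
                          ham_terms: Set[str]) -> Vector:
    """Hypothetical 2-d concentration of one section: the fraction of its
    tokens found in a spam-term set and in a ham-term set."""
    n = len(section)  # never 0 here: split_into_sections emits no empty sections
    spam_hits = sum(1 for t in section if t in spam_terms)
    ham_hits = sum(1 for t in section if t in ham_terms)
    return [spam_hits / n, ham_hits / n]


def vlc_features(tokens: Sequence[str], window: int, spam_terms: Set[str],
                 ham_terms: Set[str]) -> List[Vector]:
    """One concentration vector per section; the list length varies per email."""
    return [section_concentration(s, spam_terms, ham_terms)
            for s in split_into_sections(tokens, window)]


def rnn_spam_score(sequence: List[Vector], w_in: List[Vector], w_rec: List[Vector],
                   b_h: Vector, w_out: Vector, b_out: float) -> float:
    """Minimal Elman RNN forward pass over a sequence of any length:
    h_t = tanh(W_in x_t + W_rec h_{t-1} + b_h), score = sigmoid(w_out . h_T + b_out).
    """
    hidden = len(b_h)
    h = [0.0] * hidden
    for x in sequence:
        # The comprehension reads the previous h before h is rebound.
        h = [math.tanh(sum(w_in[j][k] * x[k] for k in range(len(x)))
                       + sum(w_rec[j][k] * h[k] for k in range(hidden))
                       + b_h[j])
             for j in range(hidden)]
    z = sum(w_out[j] * h[j] for j in range(hidden)) + b_out
    return 1.0 / (1.0 + math.exp(-z))


# Usage with hypothetical term sets and untrained placeholder weights.
SPAM_TERMS = {"free", "winner", "offer"}
HAM_TERMS = {"meeting", "report", "schedule"}
W_IN = [[2.0, -2.0], [-1.0, 1.0]]
W_REC = [[0.5, 0.0], [0.0, 0.5]]
B_H = [0.0, 0.0]
W_OUT = [1.5, -1.5]
B_OUT = 0.0

short_email = "free offer for the winner".split()
long_email = ("please see the meeting report and the schedule "
              "for next week free lunch offer").split()

for email in (short_email, long_email):
    feats = vlc_features(email, 4, SPAM_TERMS, HAM_TERMS)
    score = rnn_spam_score(feats, W_IN, W_REC, B_H, W_OUT, B_OUT)
    print(len(email), "tokens ->", len(feats), "sections, score", round(score, 3))
```

With a window of 4, the 5-token email yields 2 feature vectors and the 14-token email yields 4, yet the same RNN scores both, which is the property the abstract relies on. In practice the weights would be learned, and a framework RNN (for example an LSTM with packed variable-length batches) would replace this hand-written loop.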
Language | Chinese |
Source | 2015 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN) |
DOI | 10.1109/IJCNN.2015.7280346 |
Content type | Other |
Source URL | [http://ir.pku.edu.cn/handle/20.500.11897/436621] |
Collection | School of Electronics Engineering and Computer Science |
Recommended citation (GB/T 7714) | Gao, Yang, Mi, Guyue, Tan, Ying. Variable Length Concentration based Feature Construction Method for Spam Detection. 2015-01-01. |