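Title: Research and Implementation of a Public Opinion Analysis System for BBS
Author: Liu Yu (刘玉)
Degree type: Master's
Defense date: 2011-06-01
Degree-granting institution: Graduate University of Chinese Academy of Sciences (中国科学院研究生院)
Place conferred: Beijing
Advisor: Sun Jing (孙静)
Keywords: Information processing technology::Other disciplines of information processing technology; public opinion analysis; lexical chain algorithm; vector space model; BBS
Degree discipline: Computer Application Technology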
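Chinese abstract (translated): With the emergence of the Internet as an important new medium for spreading information, BBS has become one of the fastest-spreading public platforms. In this complex network environment, all kinds of social phenomena, problems, and viewpoints intertwine; alongside positive information there are also latent danger signals. Grasping public opinion and analyzing public opinion information is therefore an urgent problem. However, most existing systems analyze news web pages, and public opinion analysis systems aimed at BBS have yet to be developed. To address these problems, this thesis draws on the strengths of existing techniques and, taking the characteristics of BBS into account, improves and designs BBS-oriented methods for data collection, topic classification, and keyword extraction. It also builds and implements a BBS-oriented public opinion analysis system. Centering on the BBS data environment, with the main goal of improving the effectiveness of BBS-oriented public opinion analysis, the thesis studies the key techniques behind the modules for BBS information crawling, sensitive topic monitoring, and hot topic discovery. The main research content covers the following aspects:
1. BBS-oriented data collection methods. The thesis studies and summarizes the data characteristics of BBS and uses them to improve the data collection and extraction methods of the public opinion analysis system. It proposes a BBS-oriented web crawler that performs customized crawling according to BBS URL characteristics. For information extraction, it exploits the structured nature of BBS data and extracts the useful information with a template-based method.
2. BBS-oriented topic classification. Making full use of BBS data characteristics, the thesis improves the text feature extraction method and proposes a BBS-oriented feature re-adjustment method. It focuses on the vector space model (VSM) and improves it, proposing a VSM-based text classification algorithm for BBS. It then uses the improved algorithm to design a BBS-oriented sensitive topic monitoring mechanism.
3. Hot topic discovery based on lexical chains. The thesis studies and improves the methods for constructing and extracting lexical chains, proposes a BBS-oriented similarity computation method, and constructs lexical chains according to BBS data characteristics. It designs a keyword extraction method for BBS data and builds a hot topic discovery mechanism based on the lexical chain method.
4. A BBS-oriented public opinion analysis system. The thesis designs and implements a BBS-oriented public opinion analysis system that provides sensitive topic monitoring, hot topic analysis, and public opinion tracking for BBS, improving the accuracy and comprehensiveness of BBS public opinion analysis results.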
English abstract: With the emergence of the Internet as an important new medium for information dissemination, BBS has become one of the fastest public platforms for spreading information. In this complex network environment, social phenomena, problems, and viewpoints of all kinds intersect. There is not only positive information but also latent danger. Understanding public opinion and analyzing public opinion information is therefore a problem that urgently needs to be solved. However, most existing systems analyze news pages, and a public opinion analysis system for BBS has yet to be developed. To address these problems, this thesis draws on the strengths of existing techniques. Taking the characteristics of BBS into account, it designs and improves methods for data collection, topic classification, and keyword extraction for BBS, and it builds and implements a BBS public opinion analysis system. The thesis centers on the BBS data environment and aims to improve the effectiveness of public opinion analysis for BBS. The main research concerns the key techniques of BBS information crawling, sensitive topic monitoring, and hot topic discovery, and covers the following aspects:
1. Data Collection Methods for BBS. The characteristics of BBS data are studied and summarized, and the data collection and extraction methods of the public opinion analysis system are improved accordingly. An improved web crawler for BBS is proposed that performs customized crawling according to BBS URL characteristics. For information extraction, a template-based approach is proposed that exploits the structural features of BBS data.
2. Classification of BBS Topics. Making full use of BBS data characteristics, the thesis improves the text feature extraction method and proposes a feature re-adjustment method for BBS. Building on existing techniques, it then focuses on the vector space model (VSM) and improves it, proposing a VSM-based text classification algorithm for BBS. Finally, the improved algorithm is used to design a sensitive topic monitoring mechanism for BBS.
3. Hot Topic Discovery Based on Lexical Chains. The thesis studies the construction and extraction of lexical chains and improves both. Based on the characteristics of BBS data, it proposes a similarity measure for BBS and constructs lexical chains for BBS data. Finally, it designs a keyword extraction method and builds a hot topic discovery method based on lexical chains.
4. BBS Public Opinion Analysis System. The thesis designs and implements a BBS public opinion analysis system that includes sensitive topic monitoring, hot topic discovery, and topic tracking for BBS, and that improves the accuracy and comprehensiveness of public opinion analysis for BBS.
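Date made public: 2011-06-07
Content type: Thesis
Source URL: [http://124.16.136.157/handle/311060/10205]
Collection: Institute of Software_Laboratory of Human-Computer Interaction Technology and Intelligent Information Processing_Theses
Recommended citation
GB/T 7714
Liu Yu (刘玉). Research and Implementation of a Public Opinion Analysis System for BBS (面向BBS的舆情分析系统的研究与实现)[D]. Beijing. Graduate University of Chinese Academy of Sciences. 2011.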