Traditional public opinion analysis methods have two defects. First, because they lack the necessary semantic processing of public opinion texts, traditional network public opinion analysis methods based on keywords or bag-of-words often produce inaccurate results, i.e., their false negative and false positive rates are relatively high. Second, because data are sparse in the early stage of public opinion development, these methods generally cannot catch the "signs" of emerging public opinion in time. To solve these problems, this paper presents a domain-specific grammar-based method for analysing the grammar of microblog texts, and puts forward a set of general design principles and an analysis method for domain-specific grammars. Compared with statistical methods, the main advantages and innovations of the domain-specific grammar-based method are: the domain-specific grammar still works well when data are sparse; it does not need to compute statistics over the information, and so is not affected by the distance between words; and it extracts the genuinely useful information without being affected by word collocation, as statistical methods are. To demonstrate the utility of the method, we choose corruption-related public opinion as the verification application. Experiments show that the grammar for the corruption domain recognises and extracts the content of corruption-related microblog public opinion texts well, and thereby achieves the goal of monitoring public opinion on corruption.