International Symposium on Computational Intelligence and Design

The Tibetan Microblog Text Representation Method Based on Shallow Parsing


Abstract

Text representation is the groundwork of Tibetan text mining and strongly influences Tibetan text categorization and clustering. Microblogs are among the most popular Tibetan online media, and research on Tibetan microblogs is increasing. However, because of the particular characteristics of microblog text and of the Tibetan language, traditional Tibetan text representation methods do not meet the need. This paper proposes a Tibetan microblog text representation method based on shallow parsing and evaluates it in a Tibetan microblog sentiment analysis experiment. First, the syntactic structure of the Tibetan microblog text is generated using a syntactic tree. Second, a semantic feature space is built from the semantic features of the syntactic structures. Then, semantic cluster centroids are formed in the feature space with the K-means method. Last, cluster-based TF-IDF values are calculated. In the experiment, the proposed method is compared with the SVM+TF-IDF and Naive Bayes + Maximum Entropy methods, and its F-measure is as high as 91.4%.
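The last two steps of the pipeline (K-means clustering in a feature space, then TF-IDF computed over clusters rather than over individual words) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no details of the feature construction, so the two-dimensional `features` vectors, the English placeholder tokens, the deterministic centroid initialisation, and the helper names `kmeans` and `cluster_tfidf` are all assumptions made for illustration. The shallow-parsing step that would produce the semantic features is skipped entirely.

```python
import math
from collections import Counter


def kmeans(points, k, iters=20):
    """Plain Lloyd's K-means; assumes k <= len(points).

    Initial centroids are taken at evenly spaced indices so the
    result is deterministic (an assumption made for this sketch).
    """
    step = len(points) // k
    centroids = [list(points[i * step]) for i in range(k)]

    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        assign = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return assign


def cluster_tfidf(docs, features, k):
    """Represent each document as a k-dimensional cluster-level TF-IDF vector."""
    vocab = sorted(features)
    assign = kmeans([features[w] for w in vocab], k)
    word2cluster = dict(zip(vocab, assign))

    # Replace every known word by the id of its cluster.
    cluster_docs = [[word2cluster[w] for w in doc if w in word2cluster]
                    for doc in docs]

    n = len(docs)
    df = Counter(c for doc in cluster_docs for c in set(doc))
    idf = [math.log(n / df[c]) if df[c] else 0.0 for c in range(k)]

    vectors = []
    for doc in cluster_docs:
        tf = Counter(doc)
        vectors.append([(tf[c] / len(doc)) * idf[c] if doc else 0.0
                        for c in range(k)])
    return vectors, word2cluster


# Hypothetical toy data: placeholder tokens with made-up 2-D semantic features.
features = {
    "good": (1.0, 0.0), "great": (0.9, 0.1),
    "bad": (0.0, 1.0), "awful": (0.1, 0.9),
}
docs = [["good", "great", "good"], ["bad", "awful"], ["good", "bad"]]

vectors, word2cluster = cluster_tfidf(docs, features, k=2)
for doc, vec in zip(docs, vectors):
    print(doc, [round(x, 3) for x in vec])
```

Because words that fall in the same cluster share one dimension, the resulting vectors are much shorter than word-level TF-IDF vectors, which may help with the short, sparse text typical of microblogs. The vectors could then be passed to any classifier for a sentiment analysis experiment.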