
A distributional and syntactic approach to fine-grained opinion mining.

Abstract

This thesis contributes to a larger social science research program that analyzes the diffusion of IT innovations. We show how to automatically distinguish the portions of text that deal with opinions about innovations by finding {source, target, opinion} triples in text. In this context, we can derive a list of innovations from the domain itself and treat them as targets. We can then use this list as an anchor for finding the other two members of each triple at a "fine-grained" level: paragraph contexts or smaller.

We first demonstrate a vector space model for finding opinionated contexts in which the innovation targets are mentioned. We find paragraph-level contexts by searching for an "expresses-an-opinion-about" relation between sources and targets, using a supervised SVM model whose features are derived from a general-purpose subjectivity lexicon and a corpus indexing tool. We show that our algorithm correctly filters the domain-relevant subset of subjectivity terms so that they are valued more highly.

We then turn to identifying the opinion. In opinion mining, opinions are typically taken to be positive or negative. We discuss a crowdsourcing technique we developed to create the seed data our supervised learning algorithm needs, which describes how humans perceive opinion-bearing language. Our user interface successfully limited the meta-subjectivity inherent in the task ("What is an opinion?") while reliably retrieving relevant opinionated words from annotators who were not domain experts.

Finally, we developed a new data structure and modeling technique for connecting targets with the correct opinionated language within the same sentence. Syntactic relatedness tries (SRTs) contain all the paths in a sentence's dependency graph that connect a target expression to a candidate opinionated word. We use factor graphs to model how far a path through the SRT must be followed in order to connect the right targets to the right words.
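
The SRT idea above can be illustrated with a small sketch. It is not the thesis's implementation: it assumes a dependency tree given as head indices and relation labels, treats edges as undirected so that a path can climb to a governor and descend again, and labels each step with a hypothetical (direction, relation, POS) tuple. Because every path is stored starting from the target, paths to different candidate words share prefixes in the trie. The helper names (`Token`, `TrieNode`, `build_srt`, `paths`) and the hand-parsed example sentence are illustrative only.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Token:
    form: str
    pos: str
    head: int     # index of the governing token; -1 for the sentence root
    deprel: str   # dependency relation to the head


@dataclass
class TrieNode:
    children: dict = field(default_factory=dict)  # step -> TrieNode
    words: list = field(default_factory=list)     # candidate words whose path ends here


def neighbors(tokens, i):
    """Yield (j, step) for tokens adjacent to token i, treating the tree as undirected."""
    tok = tokens[i]
    if tok.head >= 0:
        # Moving up to the governor: label with our own relation and the governor's POS.
        yield tok.head, ("up", tok.deprel, tokens[tok.head].pos)
    for j, other in enumerate(tokens):
        if other.head == i:
            # Moving down to a dependent: label with its relation and POS.
            yield j, ("down", other.deprel, other.pos)


def path_steps(tokens, src, dst):
    """Breadth-first search for the (unique, in a tree) path of steps from src to dst."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        i = queue.popleft()
        if i == dst:
            break
        for j, step in neighbors(tokens, i):
            if j not in prev:
                prev[j] = (i, step)
                queue.append(j)
    if dst not in prev:
        return None
    steps, node = [], dst
    while prev[node] is not None:
        node, step = prev[node]
        steps.append(step)
    return steps[::-1]


def build_srt(tokens, target, candidates):
    """Insert the path from the target to each candidate word into one trie rooted at the target."""
    root = TrieNode()
    for c in candidates:
        steps = path_steps(tokens, target, c)
        if steps is None:
            continue
        node = root
        for step in steps:
            node = node.children.setdefault(step, TrieNode())
        node.words.append(tokens[c].form)
    return root


def paths(node, prefix=()):
    """Flatten the trie into (path, words) pairs for inspection."""
    out = [(prefix, list(node.words))] if node.words else []
    for step, child in node.children.items():
        out.extend(paths(child, prefix + (step,)))
    return out


# Hand-parsed example: "Users really praised the new wiki", target = "wiki".
tokens = [
    Token("Users", "NNS", 2, "nsubj"),
    Token("really", "RB", 2, "advmod"),
    Token("praised", "VBD", -1, "root"),
    Token("the", "DT", 5, "det"),
    Token("new", "JJ", 5, "amod"),
    Token("wiki", "NN", 2, "dobj"),
]
srt = build_srt(tokens, target=5, candidates=[2, 1, 4])
for path, words in paths(srt):
    print(path, words)
```

In this example "praised" and "really" both hang off the `("up", "dobj", "VBD")` branch, while "new" sits on a separate `("down", "amod", "JJ")` branch.
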

It turns out that we can correctly label significant portions of these tries using very rudimentary features, such as part-of-speech tags and dependency labels, with minimal processing. This technique uses the data from our crowdsourcing technique as training data.

We conclude by placing our work in the context of a larger sentiment classification pipeline and by describing a model for learning from the data structures our work produces. This work contributes to computational linguistics by proposing and verifying new data-gathering techniques and by applying recent developments in machine learning to inference over grammatical structures for highly subjective purposes. It applies a suffix tree-based data structure to modeling opinion in a specific domain by restricting the order in which data is stored in the structure.
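
The trie-labeling step (deciding how far along each SRT path to go, using only part-of-speech tags and dependency labels) can be approximated in a simplified sketch. This is not the thesis's factor-graph model: it swaps in exact max-sum dynamic programming over the trie, with one score per node and a hard constraint that a node can be followed only if its parent is. The feature weights in `WEIGHTS`, the `Node` type and the small hand-built trie are hypothetical, chosen for illustration rather than learned from crowdsourced data.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical hand-set weights over rudimentary features (dependency label, POS tag).
# A real model would learn these from training data; these values are illustrative only.
WEIGHTS = {
    ("rel", "amod"): 1.5,
    ("rel", "dobj"): 1.0,
    ("rel", "advmod"): 0.5,
    ("rel", "nsubj"): -0.5,
    ("pos", "JJ"): 1.0,
    ("pos", "VBD"): 0.5,
    ("pos", "RB"): 0.5,
}


@dataclass
class Node:
    step: Optional[tuple] = None  # (direction, relation, POS) on the edge into this node
    words: list = field(default_factory=list)
    children: list = field(default_factory=list)


def unary_score(node):
    """Local score for following the edge into this node, from its relation and POS."""
    _, rel, pos = node.step
    return WEIGHTS.get(("rel", rel), 0.0) + WEIGHTS.get(("pos", pos), 0.0)


def best_follow(node):
    """Return (score, followed nodes) for the subtree at node, given that node is followed.

    A child branch is kept only if its best score is positive; because a child can only
    be followed through its parent, this max-sum recursion is exact on the tree.
    """
    total = unary_score(node) if node.step is not None else 0.0
    followed = [node]
    for child in node.children:
        child_score, child_nodes = best_follow(child)
        if child_score > 0:
            total += child_score
            followed.extend(child_nodes)
    return total, followed


def linked_words(root):
    """Candidate words at followed nodes are the ones linked to the target at the root."""
    _, followed = best_follow(root)
    return [w for n in followed for w in n.words]


# Hand-built SRT for the target "wiki" in "Critics really praised the new wiki".
trie = Node(children=[
    Node(("up", "dobj", "VBD"), ["praised"], [
        Node(("down", "advmod", "RB"), ["really"]),
        Node(("down", "nsubj", "NNS"), ["Critics"]),
    ]),
    Node(("down", "amod", "JJ"), ["new"]),
])
print(linked_words(trie))
```

With these weights, the sketch links "praised", "really" and "new" to the target and leaves "Critics" (the opinion's source, not its content) unlinked, because its branch scores below zero.
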

Bibliographic details

  • Author: Sayeed, Asad Basheer
  • Author affiliation: University of Maryland, College Park
  • Degree-granting institution: University of Maryland, College Park
  • Subjects: Language Linguistics; Artificial Intelligence; Computer Science
  • Degree: Ph.D.
  • Year: 2011
  • Pages: 124
  • Original format: PDF
  • Language: English
