
Mining Paraphrases from Self-anchored Web Sentence Fragments



Abstract

Near-synonyms or paraphrases are beneficial in a variety of natural language and information retrieval applications, but so far their acquisition has been confined to clean, trustworthy collections of documents with explicit external attributes. When such attributes are available, such as similar time stamps associated with a pair of news articles, previous approaches rely on them as signals of potentially high content overlap between the articles, often embodied in sentences that are only slight, paraphrase-based variations of each other. This paper introduces a new unsupervised method for extracting paraphrases from an information source of a completely different nature and scale, namely unstructured text across arbitrary Web textual documents. In this case, no useful external attributes are consistently available for all documents. Instead, the paper introduces linguistically motivated text anchors, which are identified automatically within the documents. The anchors are instrumental in the derivation of paraphrases through lightweight pairwise alignment of Web sentence fragments. A large set of categorized names, acquired separately from Web documents, serves as a filtering mechanism for improving the quality of the paraphrases. A set of paraphrases extracted from about a billion Web documents is evaluated both manually and through its impact on a natural-language Web search application.
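
The idea of aligning sentence fragments around shared anchors can be illustrated with a minimal sketch. The abstract does not specify how the linguistically motivated anchors are identified, so this example makes a simplifying assumption: an anchor is a fixed-length run of words at each end of a fragment. The function name `candidate_paraphrases` and the parameters `anchor_len` and `max_middle` are hypothetical, and the filtering step based on categorized names is omitted.

```python
from collections import defaultdict
from itertools import combinations


def candidate_paraphrases(sentences, anchor_len=2, max_middle=3):
    """Pair up differing middles of fragments that share both anchors.

    Assumption (not from the paper): an anchor is simply the first or last
    `anchor_len` words of a fragment; the middle is what lies between them.
    """
    middles_by_anchor = defaultdict(set)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for start in range(len(tokens)):
            for mid_len in range(1, max_middle + 1):
                mid_start = start + anchor_len
                mid_end = mid_start + mid_len
                end = mid_end + anchor_len
                if end > len(tokens):
                    break  # longer middles will not fit either
                left = tuple(tokens[start:mid_start])
                middle = tuple(tokens[mid_start:mid_end])
                right = tuple(tokens[mid_end:end])
                middles_by_anchor[(left, right)].add(middle)

    # Count, for each pair of distinct middles, how many anchor pairs they share.
    pair_counts = defaultdict(int)
    for middles in middles_by_anchor.values():
        for a, b in combinations(sorted(middles), 2):
            pair_counts[(" ".join(a), " ".join(b))] += 1
    return dict(pair_counts)


if __name__ == "__main__":
    examples = [
        "the city was ravaged by the storm last year",
        "the city was hit by the storm last week",
    ]
    for pair, count in sorted(candidate_paraphrases(examples).items()):
        print(pair, count)
```

In this toy input, "hit" and "ravaged" are paired because both occur between the anchors "city was" and "by the". On Web-scale data, the number of distinct anchor pairs shared by a candidate pair could serve as a rough confidence signal, although the abstract does not state how the paper ranks its candidates.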
