Conference: Fifth Conference on Applied Natural Language Processing

Fast Statistical Parsing of Noun Phrases for Document Indexing

Abstract

Information Retrieval (IR) is an important application area of Natural Language Processing (NLP), one that poses the genuine challenge of processing large quantities of unrestricted natural language text. While much effort has gone into applying NLP techniques to IR, very few of these techniques have been evaluated on a document collection larger than several megabytes. Many NLP techniques are simply neither efficient enough nor robust enough to handle a large amount of text. This paper proposes a new probabilistic model for noun phrase parsing and reports on the application of this parsing technique to enhance document indexing. The effectiveness of using syntactic phrases provided by the parser to supplement single words for indexing is evaluated on a 250-megabyte document collection. The experimental results show that supplementing single words with syntactic phrases for indexing consistently and significantly improves retrieval performance.
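The idea of supplementing single-word index terms with phrase terms can be illustrated with a minimal sketch. It does not reproduce the paper's probabilistic parsing model: the noun phrases are supplied by hand as a stand-in for parser output, and the names `index_terms` and `build_index`, the underscore-joined phrase terms, and the sample documents are all assumptions made for illustration.

```python
from collections import defaultdict


def index_terms(text, noun_phrases):
    """Return single-word terms plus one compound term per supplied phrase.

    `noun_phrases` is assumed to come from some noun phrase parser; here it
    is passed in by hand, so no parsing happens in this sketch.
    """
    words = [w.lower() for w in text.split()]
    # Join the words of each phrase so it is indexed as one distinct term.
    phrases = ["_".join(p.lower().split()) for p in noun_phrases]
    return words + phrases


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, (text, noun_phrases) in docs.items():
        for term in index_terms(text, noun_phrases):
            index[term].add(doc_id)
    return index


# Hypothetical documents: (text, hand-supplied noun phrases).
docs = {
    "d1": ("fast statistical parsing of noun phrases",
           ["noun phrases", "statistical parsing"]),
    "d2": ("the noun was parsed into phrases", []),
}
index = build_index(docs)
```

In this toy collection the single word `noun` matches both documents, while the phrase term `noun_phrases` matches only `d1`, which is the kind of added precision that phrase terms are meant to provide alongside single words.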