Journal: Information Processing & Management

Extraction of complex index terms in non-English IR: A shallow parsing based approach

Abstract

The performance of information retrieval systems is limited by the linguistic variation present in natural language texts. Word-level natural language processing techniques have been shown to be useful in reducing this variation. In this article, we summarize our work on extending these techniques to deal with phrase-level variation in European languages, taking Spanish as a case in point. We propose the use of syntactic dependencies as complex index terms in an attempt to solve the problems deriving from both syntactic and morpho-syntactic variation and, in this way, to obtain more precise index terms. These dependencies are obtained through a shallow parser based on cascades of finite-state transducers, in order to keep the overhead of the parsing process as low as possible. We have also studied the use of different sources of syntactic information (queries or documents), as well as the restriction of the dependencies applied to those obtained from noun phrases. Our approaches have been tested on the CLEF corpus, obtaining consistent improvements over classical word-level non-linguistic techniques. The results show, on the one hand, that syntactic information extracted from documents is more useful than that extracted from queries. On the other hand, restricting dependencies to those corresponding to noun phrases yields substantial reductions in storage and management costs, albeit at the expense of a slight reduction in performance.
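To make the idea of "syntactic dependencies as complex index terms" concrete, the following is a minimal sketch, not the authors' parser. It assumes the text has already been tagged and lemmatized upstream, and it uses a hypothetical simplified tagset (`N` noun, `A` adjective, `P` preposition, `D` determiner). A short cascade of passes drops determiners, then extracts head-modifier pairs from noun-adjective and noun-preposition-noun patterns. Because the pairs are built from lemmas, a plural, determiner-bearing phrase in a document and a bare phrase in a query can produce the same index term. The function names (`drop_determiners`, `noun_adjective`, `noun_prep_noun`, `complex_index_terms`) are illustrative only.

```python
from typing import List, Tuple

# Hypothetical input format: (lemma, tag) pairs from an upstream tagger/lemmatizer.
# Tags use an assumed simplified tagset: "N" noun, "A" adjective,
# "P" preposition, "D" determiner (other tags are simply ignored).
Token = Tuple[str, str]
Dependency = Tuple[str, str]  # (head lemma, modifier lemma)


def drop_determiners(tokens: List[Token]) -> List[Token]:
    """Pass 1: delete determiners so 'de la información' matches 'de información'."""
    return [tok for tok in tokens if tok[1] != "D"]


def noun_adjective(tokens: List[Token]) -> List[Dependency]:
    """Pass 2: each post-nominal adjective is taken to modify the preceding noun."""
    deps: List[Dependency] = []
    for i, (lemma, tag) in enumerate(tokens):
        if tag != "N":
            continue
        j = i + 1
        while j < len(tokens) and tokens[j][1] == "A":
            deps.append((lemma, tokens[j][0]))
            j += 1
    return deps


def noun_prep_noun(tokens: List[Token]) -> List[Dependency]:
    """Pass 3: noun [adjectives] + preposition + noun; the second noun modifies the first."""
    deps: List[Dependency] = []
    for i, (lemma, tag) in enumerate(tokens):
        if tag != "N":
            continue
        k = i + 1
        while k < len(tokens) and tokens[k][1] == "A":  # skip adjectives after the head
            k += 1
        if k + 1 < len(tokens) and tokens[k][1] == "P" and tokens[k + 1][1] == "N":
            deps.append((lemma, tokens[k + 1][0]))
    return deps


def complex_index_terms(tokens: List[Token]) -> List[Dependency]:
    """Run the cascade and return the distinct head-modifier pairs, sorted."""
    tokens = drop_determiners(tokens)
    return sorted(set(noun_adjective(tokens) + noun_prep_noun(tokens)))


if __name__ == "__main__":
    # "sistemas automáticos de recuperación de la información" (lemmatized)
    doc = [("sistema", "N"), ("automático", "A"), ("de", "P"),
           ("recuperación", "N"), ("de", "P"), ("el", "D"), ("información", "N")]
    # "recuperación de información"
    query = [("recuperación", "N"), ("de", "P"), ("información", "N")]
    print(complex_index_terms(doc))
    print(complex_index_terms(query))
```

Run on the two example phrases, the document and the query share the pair `("recuperación", "información")` even though their surface forms differ. The sketch only covers noun-phrase-internal patterns and plain Python loops rather than real finite-state transducers; a production system would need a proper tagger, a richer pattern set, and handling of other syntactic constructions.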