Journal: Computer Speech and Language

Extending lexical association measures for collocation extraction


Abstract

Collocations are linguistic phenomena in which two or more words appear together more often than would be expected by chance and whose meaning often cannot be inferred from the meanings of their parts. Because collocations have many applications in natural language processing, information retrieval, and text mining, extracting them from large corpora has been the focus of many studies over the past few years. In this paper, we introduce the notion of an extension pattern, a formalization of the idea of extending lexical association measures (AMs) defined for bigrams. An extension pattern provides a measure-independent way of extending AMs to extract collocations of arbitrary length. We define different extension patterns and compare them on the task of extracting collocations from a newspaper corpus. We show that the stopword-sensitive extension patterns we propose outperform other extensions, which indicates that AMs could benefit from taking into account linguistic information about an n-gram's part-of-speech pattern.
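
The abstract does not spell out how any particular extension pattern is defined, so the following is only a rough sketch of the general idea of lifting a bigram AM to longer n-grams. It assumes pointwise mutual information (PMI) as the bigram measure and uses a hypothetical pattern that treats an n-gram as a pseudo-bigram at every possible split point and averages the scores. The function names (`ngram_counts`, `pmi`, `extended_pmi`), the averaging rule, and the use of the token count as a shared normalizer are all illustrative assumptions, not the paper's definitions. The stopword-sensitive patterns are not shown.

```python
import math
from collections import Counter


def ngram_counts(tokens, max_n):
    """Count all n-grams of length 1..max_n in a token list."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def pmi(joint_p, left_p, right_p):
    """Bigram PMI: log2 of observed over expected co-occurrence probability."""
    return math.log2(joint_p / (left_p * right_p))


def extended_pmi(ngram, counts, total):
    """Hypothetical extension pattern (an assumption, not the paper's).

    Splits the n-gram into a left and a right part at each position,
    scores each split with the bigram measure, and averages the scores.
    Assumes the n-gram and all of its parts occur in the corpus.
    `total` is the token count, used as a rough normalizer for all lengths.
    """
    ngram = tuple(ngram)

    def prob(gram):
        return counts[gram] / total

    scores = [
        pmi(prob(ngram), prob(ngram[:k]), prob(ngram[k:]))
        for k in range(1, len(ngram))
    ]
    return sum(scores) / len(scores)


tokens = ["a", "b", "a", "b", "c", "d"]
counts = ngram_counts(tokens, max_n=3)
bigram_score = extended_pmi(("a", "b"), counts, len(tokens))
trigram_score = extended_pmi(("a", "b", "c"), counts, len(tokens))
```

For a bigram there is only one split, so the sketch reduces to ordinary PMI. Because the pattern only calls `pmi` through its three probability arguments, any other bigram measure with the same inputs could be swapped in, which is the sense in which such a scheme is measure-independent.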
