Journal: Applied Artificial Intelligence

An Algorithmic Scheme for Statistical Thesaurus Construction in a Morphologically Rich Language

Abstract

Corpus-based automatic thesaurus construction relies on linguistic tools, such as part-of-speech taggers and parsers, which often perform poorly on morphologically rich languages (MRLs). In this paper, we therefore focused on the complex task of adapting corpus-based thesaurus construction methods to MRLs. We investigated two statistical approaches to thesaurus construction: (a) a first-order, co-occurrence-based approach and (b) a second-order, distributional approach. We explored alternative levels of morphological term representation, complemented by grouping of the morphological variants. We then introduced and adopted a generic algorithmic scheme for thesaurus construction in MRLs that covers both the first-order and the second-order approach. Our scheme investigated alternative representation levels and offered alternative configurations. We demonstrated the empirical benefits of our methodology for diachronic Hebrew thesaurus construction. We used morphological analysis tools, defined and applied a new annotation scheme, and demonstrated its optimal configuration, which outperforms the baseline for both first-order and second-order corpus-based thesaurus construction approaches.
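
The difference between the two approaches named in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' method: the toy English corpus, the hypothetical `LEMMAS` table standing in for morphological variant grouping, the Dice coefficient as the first-order score, and cosine similarity as the second-order score. The abstract does not state which measures or representation levels the paper actually uses.

```python
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical variant table: maps surface forms to one representative form.
# A real system for an MRL would use a morphological analyzer instead.
LEMMAS = {"books": "book", "novels": "novel", "reads": "read", "reading": "read"}

# Toy corpus (assumption): three tokenized sentences.
SENTENCES = [
    ["she", "reads", "books"],
    ["he", "reads", "novels"],
    ["she", "reading", "novel"],
]


def normalize(token):
    """Group morphological variants by mapping a token to its representative."""
    return LEMMAS.get(token, token)


def cooccurrence(sentences, window=2):
    """Count how often two normalized terms occur within `window` tokens."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        terms = [normalize(t) for t in sentence]
        for i, term in enumerate(terms):
            context = terms[max(0, i - window):i] + terms[i + 1:i + 1 + window]
            for other in context:
                if other != term:
                    counts[term][other] += 1
    return counts


def first_order(counts, a, b):
    """Dice coefficient: do a and b occur near each other directly?"""
    total = sum(counts[a].values()) + sum(counts[b].values())
    return 2 * counts[a][b] / total if total else 0.0


def second_order(counts, a, b):
    """Cosine similarity: do a and b occur in similar contexts?"""
    va, vb = counts[a], counts[b]
    dot = sum(va[k] * vb[k] for k in va)
    norm_a = sqrt(sum(v * v for v in va.values()))
    norm_b = sqrt(sum(v * v for v in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


counts = cooccurrence(SENTENCES)
print("first-order  book/novel:", first_order(counts, "book", "novel"))
print("second-order book/novel:", round(second_order(counts, "book", "novel"), 3))
```

On this toy corpus, "book" and "novel" never appear in the same sentence, so the first-order score is zero, while the second-order score is high because both terms share the contexts "she" and "read". Grouping "reads" and "reading" under one form is what lets those contexts overlap.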
