Conference proceedings: Advances in Information Systems

Automatic Stemming for Indexing of an Agglutinative Language


Abstract

Stemming is an essential process in information retrieval. Although extremely simple stemming algorithms exist for inflectional languages, the situation is quite different for agglutinative languages, and stemming becomes even more difficult when a significant portion of the vocabulary is new or unknown. This paper explores the possibility of stemming an agglutinative language, specifically Korean, through unsupervised morphology learning. We use only a raw corpus and no dictionary. Unlike heuristic algorithms that lack theoretical grounding, our method is based on widely accepted statistical methods. Although it is currently applied only to Korean, the method can be adapted to other agglutinative languages with similar characteristics, because it uses no language-specific knowledge.
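
The abstract does not describe the learning procedure itself. Purely as an illustration of how a stem boundary can be inferred from a raw word list with no dictionary, the sketch below uses successor variety, a classic statistical segmentation technique that is swapped in here for clarity and is not claimed to be the paper's method. The function names, the end-of-word marker, the threshold `min_variety`, and the toy word list are all assumptions made for this example.

```python
from collections import defaultdict

END = "\0"  # assumed end-of-word marker, not part of any real word


def build_successors(vocabulary):
    """Map every prefix seen in the vocabulary to the set of symbols that follow it."""
    successors = defaultdict(set)
    for word in set(vocabulary):
        for i in range(1, len(word) + 1):
            next_symbol = word[i] if i < len(word) else END
            successors[word[:i]].add(next_symbol)
    return successors


def stem(word, successors, min_variety=2):
    """Cut the word after the proper prefix with the highest successor variety.

    Ties go to the longer prefix. If no prefix reaches min_variety,
    the word is returned unchanged.
    """
    best, best_variety = word, 0
    for i in range(1, len(word)):
        variety = len(successors.get(word[:i], ()))
        if variety >= min_variety and variety >= best_variety:
            best, best_variety = word[:i], variety
    return best


# Toy raw "corpus": Korean nouns followed by particles (illustrative data only).
corpus = ["학교에", "학교는", "학교를", "학교에서",
          "친구는", "친구를", "친구가", "학생이", "학생은"]
table = build_successors(corpus)

print(stem("학교에서", table))  # 학교
print(stem("학교가", table))    # 학교 (word form not present in the corpus)
print(stem("나무", table))      # 나무 (no evidence, left unchanged)
```

Because a Hangul syllable block is a single character in a Python string, the sketch segments at syllable boundaries only. Endings that change the inside of a syllable, as many verb conjugations do, would need the syllables decomposed first, which this example does not attempt.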
