Conference proceedings: Advances in Natural Language Processing

A Mixed Method Lemmatization Algorithm Using a Hierarchy of Linguistic Identities (HOLI)


Abstract

We present a new mixed method lemmatizer for Icelandic, Lemmald, which achieves good performance by relying on IceTagger for tagging and The Icelandic Frequency Dictionary corpus for training. We combine the advantages of data-driven machine learning with linguistic insights to maximize performance. To achieve this, we make use of a novel approach: Hierarchy of Linguistic Identities (HOLI), which involves organizing features and feature structures for the machine learning based on linguistic knowledge. Accuracy of the lemmatization is further improved using an add-on which connects to the Database of Modern Icelandic Inflections. Given correct tagging, our system lemmatizes Icelandic text with an accuracy of 99.55%. We believe our method can be fruitfully adapted to other morphologically rich languages.
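As a rough illustration of what organizing features into a hierarchy could look like, the sketch below learns suffix-replacement rules from (form, tag, lemma) triples and looks them up from the most specific identity (full form plus tag) down to the least specific (word class only), with an optional lexicon consulted first, loosely analogous to the inflection database add-on. This is not Lemmald's actual implementation: the class name, the key levels, the suffix length limit, and the tag strings are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def suffix_rule(form, lemma):
    """Return (strip, add): drop `strip` from the end of form, then append `add`."""
    i = 0
    while i < min(len(form), len(lemma)) and form[i] == lemma[i]:
        i += 1
    return form[i:], lemma[i:]


def identities(form, tag, max_suffix=3):
    """Yield lookup keys from most to least specific (a hypothetical hierarchy)."""
    yield ("form+tag", form, tag)
    longest = min(max_suffix, len(form))
    for n in range(longest, 0, -1):
        yield ("suffix+tag", form[-n:], tag)
    for n in range(longest, 0, -1):
        # Assumption: the first tag character encodes the word class.
        yield ("suffix+class", form[-n:], tag[:1])
    yield ("class", "", tag[:1])


class ToyHierarchyLemmatizer:
    """Toy lemmatizer with hierarchical backoff; illustrative only."""

    def __init__(self, lexicon=None):
        self.rules = defaultdict(Counter)  # key -> counts of (strip, add) rules
        self.lexicon = lexicon or {}       # (form, tag) -> lemma overrides

    def train(self, triples):
        for form, tag, lemma in triples:
            rule = suffix_rule(form, lemma)
            for key in identities(form, tag):
                self.rules[key][rule] += 1

    def lemmatize(self, form, tag):
        # The lexicon wins over learned rules.
        if (form, tag) in self.lexicon:
            return self.lexicon[(form, tag)]
        # Otherwise back off through the hierarchy until a rule applies.
        for key in identities(form, tag):
            if key in self.rules:
                for (strip, add), _count in self.rules[key].most_common():
                    if form.endswith(strip):
                        return form[: len(form) - len(strip)] + add
        return form  # no applicable rule: return the form unchanged


# Tag strings here are placeholders for illustration.
lem = ToyHierarchyLemmatizer(lexicon={("menn", "nkfn"): "maður"})
lem.train([("hestar", "nkfn", "hestur"), ("bátar", "nkfn", "bátur")])
print(lem.lemmatize("fiskar", "nkfn"))
print(lem.lemmatize("menn", "nkfn"))
```

In this sketch an unseen form such as "fiskar" matches no full-form key, so the lookup falls back to the learned rule for the suffix "ar" under the same tag, while the irregular "menn" is resolved by the lexicon entry rather than by any learned rule.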
