Journal: ACM Transactions on Asian Language Information Processing

NOVA: A Feasible and Flexible Annotation System for Joint Tokenization and Part-of-Speech Tagging

Abstract

A feasible and flexible annotation system is designed for joint tokenization and part-of-speech (POS) tagging of languages that have no natural definition of words. The design is motivated by the fact that word separators are not used in many highly analytic East and Southeast Asian languages. Although several of these languages are well studied, e.g., Chinese and Japanese, many are understudied and low-resource, e.g., Burmese (Myanmar) and Khmer. In the first part of the article, the proposed annotation system, named nova, is introduced. nova contains only four basic tags (n, v, a, and o); these tags can be further modified and combined to adapt to complex linguistic phenomena in tokenization and POS tagging. In the second part of the article, the feasibility and flexibility of nova are illustrated through annotation practice on Burmese and Khmer. The relation between nova and two universal POS tagsets is discussed in the final part of the article.
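
As a rough illustration of what a joint tokenization and POS annotation might look like in code, the sketch below stores each token together with one of the four basic tags and checks that concatenating the tokens recovers the unsegmented text. The `text/tag` line format, the `Token` class, and the helpers `parse_annotation` and `surface` are hypothetical and are not taken from the article; the abstract does not specify how tags are modified or combined, so only the four basic tags are validated here, and the sample strings are Latin placeholders rather than real Burmese or Khmer data.

```python
from dataclasses import dataclass

# The four basic nova tags named in the abstract. Modified and combined
# tags are not modeled, since the abstract does not define them.
BASIC_TAGS = {"n", "v", "a", "o"}


@dataclass(frozen=True)
class Token:
    """One annotated unit: a span of text plus its basic tag (hypothetical type)."""
    text: str
    tag: str


def parse_annotation(line: str, sep: str = "/") -> list[Token]:
    """Parse a line in an assumed 'text/tag text/tag' format into tokens.

    The format is an illustration only, not the article's notation.
    """
    tokens = []
    for item in line.split():
        # Split on the last separator so the text itself may contain `sep`.
        text, _, tag = item.rpartition(sep)
        if not text or tag not in BASIC_TAGS:
            raise ValueError(f"malformed annotation item: {item!r}")
        tokens.append(Token(text, tag))
    return tokens


def surface(tokens: list[Token]) -> str:
    """Rebuild the unsegmented text: no word separators between tokens."""
    return "".join(t.text for t in tokens)


# Placeholder example: the annotation segments "abcd" into two tagged tokens.
annotated = parse_annotation("ab/n cd/v")
assert surface(annotated) == "abcd"
```

Keeping tokenization and tagging in a single structure means one annotation pass fixes both the token boundaries and the tags, which is the sense of "joint" assumed in this sketch.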
