International Conference on Artificial Intelligence

An Efficient Tool for Building a Large Part-Of-Speech Annotated Corpus


Abstract

Large part-of-speech (POS) annotated corpora play an important role in many kinds of natural language processing, so they require very high accuracy and consistency. To build such accurate and consistent corpora, manual tagging is often used. However, manual tagging is labor intensive and expensive, and it is not easy to get consistent results from human experts. The goal of this work is to develop an efficient tool for building an accurate and consistent POS annotated corpus with minimal human labor. The developed tool helps minimize the amount of human labor and makes the results consistent by using lexical rules. The lexical rules are acquired from human experts in much the same way as manual tagging and manual error correction are performed. Each rule is then used to annotate the same word in the same context throughout the whole corpus.
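The abstract does not specify how a lexical rule or its context is represented. As a minimal sketch only, assume a rule maps a (previous word, word, next word) triple to a tag, so that one expert decision propagates to every identical occurrence in the corpus. The names `context_of`, `apply_rules`, and `BOUNDARY`, the tag label `NOUN`, and the one-word context window are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Optional, Tuple

# Assumed rule key: (previous word, target word, next word). Hypothetical format.
Context = Tuple[str, str, str]
# A token is (word, tag); tag is None while the word is still unannotated.
Token = Tuple[str, Optional[str]]

BOUNDARY = "<s>"  # hypothetical marker for a sentence edge


def context_of(words: List[str], i: int) -> Context:
    """Build the assumed one-word-window context for the word at position i."""
    prev_word = words[i - 1] if i > 0 else BOUNDARY
    next_word = words[i + 1] if i + 1 < len(words) else BOUNDARY
    return (prev_word, words[i], next_word)


def apply_rules(
    corpus: List[List[Token]], rules: Dict[Context, str]
) -> List[List[Token]]:
    """Return a copy of the corpus in which every token whose context matches
    a rule carries that rule's tag; all other tokens keep their current tag."""
    tagged_corpus = []
    for sentence in corpus:
        words = [word for word, _ in sentence]
        tagged_sentence = [
            (word, rules.get(context_of(words, i), tag))
            for i, (word, tag) in enumerate(sentence)
        ]
        tagged_corpus.append(tagged_sentence)
    return tagged_corpus


# One expert decision, recorded as a rule, is applied to the whole corpus.
corpus = [
    [("the", None), ("can", None), ("leaks", None)],
    [("we", None), ("can", None), ("go", None)],
    [("the", None), ("can", None), ("leaks", None)],
]
rules = {("the", "can", "leaks"): "NOUN"}
result = apply_rules(corpus, rules)
```

In this sketch both occurrences of "can" between "the" and "leaks" receive the same tag, while "can" in "we can go" stays unannotated until an expert supplies a rule for that context.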
