A Classical Chinese Corpus with Nested Part-of-Speech Tags

Abstract

We introduce a corpus of classical Chinese poems that has been word-segmented and tagged with parts of speech (POS). Because the concept of a 'word' in Chinese is ill-defined, previous Chinese corpora lack a standard for word segmentation, which leads to inconsistent POS tags and hinders interoperability among corpora. We address this problem with nested POS tags, which accommodate different theories of wordhood and facilitate research objectives that require annotation of the 'word' at different levels of granularity.
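
As a rough illustration of the idea of nested POS tags, the sketch below stores each unit of a poem line as a small tree and reads the segmentation off at two levels of granularity. The `Node` class, the `segment` and `segment_line` helpers, the sample line, and the tag labels (`NN`, `JJ`, `VV`) are all hypothetical and chosen for illustration; they are not taken from the corpus or its annotation scheme.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """A tagged span: a single character (leaf) or a unit built from child spans."""
    tag: str
    text: str = ""  # only set on leaves
    children: List["Node"] = field(default_factory=list)

    def surface(self) -> str:
        """Concatenate the characters covered by this span."""
        if not self.children:
            return self.text
        return "".join(child.surface() for child in self.children)


def segment(node: Node, depth: int) -> List[Tuple[str, str]]:
    """Return (surface, tag) pairs, descending at most `depth` levels of nesting."""
    if depth == 0 or not node.children:
        return [(node.surface(), node.tag)]
    pairs: List[Tuple[str, str]] = []
    for child in node.children:
        pairs.extend(segment(child, depth - 1))
    return pairs


def segment_line(line: List[Node], depth: int) -> List[Tuple[str, str]]:
    """Segment every top-level unit of a line at the requested depth."""
    return [pair for unit in line for pair in segment(unit, depth)]


# Hypothetical annotation: the first two characters form one nested unit.
line = [
    Node("NN", children=[Node("JJ", "白"), Node("NN", "日")]),
    Node("VV", "依"),
    Node("NN", "山"),
    Node("VV", "盡"),
]

# Coarse view: the outer tag treats 白日 as a single word.
print(segment_line(line, 0))
# [('白日', 'NN'), ('依', 'VV'), ('山', 'NN'), ('盡', 'VV')]

# Fine view: the inner tags treat 白 and 日 as separate words.
print(segment_line(line, 1))
# [('白', 'JJ'), ('日', 'NN'), ('依', 'VV'), ('山', 'NN'), ('盡', 'VV')]
```

Keeping both levels in one structure means that a consumer expecting longer words and a consumer expecting character-level words can read the same annotation without re-segmenting the text.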
