Conference: Language Technology for Cultural Heritage, Social Sciences, and Humanities 2012

A Classical Chinese Corpus with Nested Part-of-Speech Tags



Abstract

We introduce a corpus of classical Chinese poems that has been word-segmented and tagged with parts of speech (POS). Because the concept of a 'word' in Chinese is ill-defined, previous Chinese corpora lack a standard for word segmentation; this leads to inconsistent POS tags and hinders interoperability among corpora. We address this problem with nested POS tags, which accommodate different theories of wordhood and facilitate research objectives that require annotation of the 'word' at different levels of granularity.
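
To make the idea of nested POS tags concrete, here is a minimal Python sketch of one way such an annotation could be represented and read at two levels of granularity. It is an illustration under stated assumptions, not the corpus's actual format: the `Node` class, the `segment` and `flatten` helpers, the example fragment, and the tag labels (borrowed from Penn Chinese Treebank-style conventions: NR, JJ, NN, VV) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """One annotated unit: a leaf holds text, an inner node holds nested units."""
    tag: str                      # POS tag (hypothetical labels, for illustration)
    text: str = ""                # surface string, set only on leaves
    children: List["Node"] = field(default_factory=list)

    def surface(self) -> str:
        # A leaf returns its own text; an inner node concatenates its children.
        if not self.children:
            return self.text
        return "".join(child.surface() for child in self.children)


def segment(node: Node, max_depth: int, depth: int = 0) -> List[Tuple[str, str]]:
    """Return (surface, tag) pairs, descending at most max_depth nesting levels."""
    if not node.children or depth == max_depth:
        return [(node.surface(), node.tag)]
    pairs: List[Tuple[str, str]] = []
    for child in node.children:
        pairs.extend(segment(child, max_depth, depth + 1))
    return pairs


def flatten(nodes: List[Node], max_depth: int) -> List[Tuple[str, str]]:
    """Read a sequence of nested units at a chosen granularity."""
    pairs: List[Tuple[str, str]] = []
    for node in nodes:
        pairs.extend(segment(node, max_depth))
    return pairs


# Hypothetical fragment: a two-character proper noun followed by a verb.
phrase = [
    Node("NR", children=[Node("JJ", "黄"), Node("NN", "河")]),
    Node("VV", "流"),
]

coarse = flatten(phrase, 0)  # treats the nested unit as a single word
fine = flatten(phrase, 1)    # exposes the inner segmentation and its tags
print(coarse)
print(fine)
```

The same annotation serves both readings: a theory that counts the two-character unit as one word reads the outer tag, while one that splits it reads the inner tags, so neither segmentation has to be discarded.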
