Conference: 8th Workshop on Asian Language Resources

A comparison of unsupervised methods for Part-of-Speech Tagging in Chinese

Abstract

We conduct a series of Part-of-Speech (POS) tagging experiments using Expectation Maximization (EM), Variational Bayes (VB) and Gibbs Sampling (GS) on the Chinese Penn Treebank. Our first aim is to establish a baseline for unsupervised POS tagging in Chinese, which will facilitate future research in this area. Second, by comparing and analyzing the results for Chinese and English, we highlight some of the strengths and weaknesses of each algorithm in the POS tagging task and attempt to explain the differences based on some preliminary linguistic analysis. Compared with English, we find that all algorithms perform rather poorly on Chinese in 1-to-1 accuracy but are more competitive in many-to-1 accuracy. One possible explanation is the algorithms' inability to produce tags that match the desired tag count distribution.
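The two metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not say how the 1-to-1 mapping is computed, so the greedy assignment below is an assumption, and the function names and toy data are hypothetical, not taken from the paper. Many-to-1 lets several induced clusters share one gold tag, while 1-to-1 allows each gold tag to be claimed by only one cluster. An induced tagging whose cluster sizes do not match the gold tag counts is therefore penalized much more under 1-to-1.

```python
from collections import Counter


def many_to_one_accuracy(gold, induced):
    """Map every induced cluster to its most frequent gold tag, then score."""
    pairs = Counter(zip(induced, gold))
    best = {}  # cluster -> (gold tag, co-occurrence count)
    for (cluster, tag), n in pairs.items():
        if cluster not in best or n > best[cluster][1]:
            best[cluster] = (tag, n)
    return sum(n for _, n in best.values()) / len(gold)


def one_to_one_accuracy(gold, induced):
    """Greedy 1-to-1 mapping (an assumption): each gold tag is used at most once."""
    pairs = Counter(zip(induced, gold))
    used_clusters, used_tags, correct = set(), set(), 0
    # Visit (cluster, tag) pairs from most to least frequent; ties broken by key.
    for (cluster, tag), n in sorted(pairs.items(), key=lambda kv: (-kv[1], kv[0])):
        if cluster not in used_clusters and tag not in used_tags:
            used_clusters.add(cluster)
            used_tags.add(tag)
            correct += n
    return correct / len(gold)


# Toy data (invented for illustration): nouns are split across clusters 0 and 1,
# while verbs and adverbs are merged into cluster 2.
gold = ["NN", "NN", "NN", "NN", "VV", "VV", "AD", "AD"]
induced = [0, 1, 0, 1, 2, 2, 2, 2]

print(many_to_one_accuracy(gold, induced))  # 0.75
print(one_to_one_accuracy(gold, induced))   # 0.5
```

In this toy case the induced cluster sizes (2, 2, 4) differ from the gold tag counts (4, 2, 2). Many-to-1 forgives the split noun cluster, whereas 1-to-1 can credit only one of the two noun clusters.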
