A comparison of unsupervised methods for Part-of-Speech Tagging in Chinese

Abstract

We conduct a series of Part-of-Speech (POS) tagging experiments using Expectation Maximization (EM), Variational Bayes (VB) and Gibbs Sampling (GS) against the Chinese Penn Treebank. First, we want to establish a baseline for unsupervised POS tagging in Chinese, which will facilitate future research in this area. Second, by comparing and analyzing the results for Chinese and English, we highlight some of the strengths and weaknesses of each algorithm in the POS tagging task and attempt to explain the differences based on some preliminary linguistic analysis. Compared to English, we find that all algorithms perform rather poorly in Chinese in 1-to-1 accuracy but are more competitive in many-to-1 accuracy. One possible explanation is the algorithms' inability to produce tags that match the desired tag count distribution.
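
The two evaluation measures named in the abstract can be illustrated with a small sketch. The abstract does not say how the mappings are computed, so the greedy 1-to-1 assignment below is an assumption, and the function names and toy data are hypothetical. Many-to-1 lets several induced clusters map to the same gold tag, while 1-to-1 lets each gold tag be claimed by only one cluster.

```python
from collections import Counter


def many_to_one_accuracy(induced, gold):
    """Map every induced cluster to its most frequent gold tag.

    Several clusters may share the same gold tag.
    """
    pair_counts = Counter(zip(induced, gold))
    best = {}
    for (cluster, _tag), n in pair_counts.items():
        if n > best.get(cluster, 0):
            best[cluster] = n
    return sum(best.values()) / len(gold)


def one_to_one_accuracy(induced, gold):
    """Greedy 1-to-1 mapping (an assumption, not taken from the paper).

    Cluster/tag pairs are visited from most to least frequent; a pair is
    accepted only if neither its cluster nor its tag is already mapped.
    """
    pair_counts = Counter(zip(induced, gold))
    used_clusters, used_tags = set(), set()
    correct = 0
    ordered = sorted(pair_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    for (cluster, tag), n in ordered:
        if cluster in used_clusters or tag in used_tags:
            continue
        used_clusters.add(cluster)
        used_tags.add(tag)
        correct += n
    return correct / len(gold)


# Toy data: three induced clusters, two gold tags.
induced = [0, 0, 0, 1, 1, 2]
gold = ["NN", "NN", "VV", "NN", "NN", "VV"]

m1 = many_to_one_accuracy(induced, gold)
o1 = one_to_one_accuracy(induced, gold)
```

In this toy case clusters 0 and 1 both consist mostly of `NN` tokens. Many-to-1 credits both clusters, while 1-to-1 allows only one of them to claim `NN`, so the 1-to-1 score is lower. This is the kind of gap that can arise when the induced tag counts do not match the gold tag count distribution.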

Bibliographic record

Source: 8th Workshop on Asian Language Resources, 2010, pp. 135-143 (9 pages)
Conference location: Beijing (CN)
Authors: Alex Cheng; Fei Xia; Jianfeng Gao
Affiliations: Microsoft Corporation; Univ. of Washington; Microsoft Research
Original format: PDF
Language of text: eng
Chinese Library Classification: Program design, software engineering
