

Improving Chinese Word Segmentation on Micro-blog Using Rich Punctuations



Abstract

Micro-blog is a new kind of medium whose texts are short and informal. No segmented micro-blog corpus is available for training a Chinese word segmentation model, and existing Chinese word segmentation tools do not perform as well on micro-blogs as they do on ordinary news text. In this paper we present a simple yet effective approach to Chinese word segmentation of micro-blogs. Our approach incorporates punctuation information from unlabeled micro-blog data by introducing the characters that appear immediately after or before punctuation marks, since these characters indicate the beginning or end of words. We also use a self-training framework that incorporates confident instances, which proves helpful. Experiments on micro-blog data show that our approach improves performance, especially in out-of-vocabulary (OOV) recall.
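The abstract's key observation, that a character right after a punctuation mark must begin a word and a character right before one must end a word, can be expressed as tag constraints on unlabeled text. The sketch below is illustrative only and is not the authors' implementation: it assumes a BMES character-tagging scheme (B = begin, M = middle, E = end, S = single-character word), which the abstract does not specify, and the punctuation inventory and the function name `punctuation_tag_constraints` are hypothetical. It also omits the self-training step.

```python
# Minimal sketch (hypothetical helper, not the paper's code): derive BMES tag
# constraints for characters adjacent to punctuation in unlabeled text.

# Assumed punctuation inventory; the paper's actual set is not given here.
PUNCTUATION = frozenset("，。！？；：、“”‘’（）《》…,.!?;:()")

ALL_TAGS = frozenset({"B", "M", "E", "S"})


def punctuation_tag_constraints(text: str) -> dict[int, tuple[str, set[str]]]:
    """Map character index -> (character, allowed BMES tags).

    Only characters whose tag set is narrowed by a neighbouring
    punctuation mark are returned.
    """
    constraints = {}
    for i, ch in enumerate(text):
        if ch in PUNCTUATION:
            continue  # punctuation marks themselves are not constrained here
        tags = set(ALL_TAGS)
        # A character right after punctuation must start a word: B or S.
        if i > 0 and text[i - 1] in PUNCTUATION:
            tags &= {"B", "S"}
        # A character right before punctuation must end a word: E or S.
        if i + 1 < len(text) and text[i + 1] in PUNCTUATION:
            tags &= {"E", "S"}
        if tags != ALL_TAGS:
            constraints[i] = (ch, tags)
    return constraints


if __name__ == "__main__":
    for idx, (char, tags) in punctuation_tag_constraints("今天，天气好！").items():
        print(idx, char, sorted(tags))
```

For 今天，天气好！ this restricts the first 天 (before ，) and 好 (before ！) to {E, S} and the second 天 (after ，) to {B, S}; a character sandwiched between two punctuation marks is forced to S. Such partial labels could then serve as extra training signal from unlabeled micro-blogs, under the assumptions stated above.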


Bibliographic Record

Source
Annual Meeting of the Association for Computational Linguistics | 2013 | pp. 177-182 | 6 pages
Authors
Longkai Zhang; Li Li; Zhengyan He; Houfeng Wang; Ni Sun
Original format: PDF

Similar Documents

1. The Method for Extracting New Login Sentiment Words From Chinese Micro-Blog Based on Improved Mutual Information [J]. Zhu Guangli, Liu Wenting, Zhang Shunxiang. International Journal of Computer Systems Science & Engineering, 2020, no. 3.
2. A Chinese unknown word recognition method for micro-blog short text based on improved FP-growth [J]. Pattern Analysis and Applications, 2020, no. 2.
3. An improved neural network for domain adaptive Chinese word segmentation [J]. Jiang Ming, Huang Tao, Zhang Min. Journal of Computational Methods in Sciences and Engineering, 2020, no. 4.
4. Improving Chinese Word Segmentation on Micro-blog Using Rich Punctuations [C]. Longkai Zhang, Li Li, Zhengyan He. Annual Meeting of the Association for Computational Linguistics, 2013.
5. Experimental comparison of discriminative learning approaches for Chinese word segmentation [D]. Song, Dong. 2008.
6. The Trade-Off Between Format Familiarity and Word-Segmentation Facilitation in Chinese Reading [O]. Mingjing Chen, Yongsheng Wang, Bingjie Zhao. 2021.
7. Overview of the NLPCC 2015 Shared Task: Chinese Word Segmentation and POS Tagging for Micro-blog Texts [O]. Qiu, Xipeng, Qian, Peng, Yin, Liusong. 2015.