A Study of Chinese Word Segmentation Based on the Characteristics of Chinese

Abstract

This paper presents research on Chinese word segmentation (CWS). Segmenting Chinese text is difficult because written Chinese does not mark word boundaries and because several kinds of ambiguity can lead to different segmentations. Conventional research usually emphasizes the algorithms employed and the workflow designed, and contributes less to the discussion of the fundamental problems of CWS. In contrast, this paper first analyzes the characteristics of Chinese and several categories of ambiguity in Chinese in order to explore potential solutions. Conditional random field models are then trained with a quasi-Newton algorithm to perform the sequence labeling. To capture as much contextual information as possible, an augmented and optimized feature set is developed. The experiments show promising evaluation scores compared with some related work.
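The abstract frames segmentation as sequence labeling over characters, with features drawn from the surrounding context. The sketch below illustrates that framing only. The B/M/E/S tag set, the window size, the `<PAD>` token, and the function names `words_to_tags`, `tags_to_words`, and `char_features` are assumptions made for illustration; the abstract does not state which tag set or feature templates the authors used, and no CRF training is shown.

```python
def words_to_tags(words):
    """Map segmented words to per-character tags (assumed B/M/E/S scheme).

    B = begin, M = middle, E = end of a multi-character word; S = single-character word.
    """
    tags = []
    for word in words:
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def tags_to_words(chars, tags):
    """Rebuild words from characters and their predicted tags."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a word closes on E or S
            words.append(current)
            current = ""
    if current:  # tolerate a sequence that ends mid-word
        words.append(current)
    return words


def char_features(chars, i, window=2):
    """Context features for position i: unigrams and adjacent bigrams in a window.

    The window size and padding token are illustrative choices.
    """
    def at(j):
        return chars[j] if 0 <= j < len(chars) else "<PAD>"

    feats = {}
    for off in range(-window, window + 1):
        feats[f"c[{off}]"] = at(i + off)
    for off in range(-window, window):
        feats[f"c[{off}]c[{off + 1}]"] = at(i + off) + at(i + off + 1)
    return feats


words = ["我", "喜欢", "自然语言"]
chars = list("".join(words))
tags = words_to_tags(words)
features = [char_features(chars, i) for i in range(len(chars))]
```

A labeler such as a conditional random field would be trained on pairs of `features` and `tags`, and `tags_to_words` would turn its predictions back into a segmentation.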

Bibliographic details

Source: International Conference of the German Society for Computational Linguistics and Language Technology, 2013, 8 pages
Authors: Aaron Li-Feng Han; Derek F. Wong; Lidia S. Chao; Liangye He; Ling Zhu; Shuo Li
Original format: PDF
Chinese Library Classification: TP18-53
Keywords: Natural language processing; Chinese word segmentation; Characteristics of Chinese; Optimized features

Similar literature

1. Automatic Extraction of New Words Based on Google News Corpora for Supporting Lexicon-based Chinese Word Segmentation Systems [J]. Chin-Ming Hong, Chih-Ming Chen, Chao-Yang Chiu. Expert Systems with Applications. 2009, no. 2p2
2. A Chinese word segmentation based on language situation in processing ambiguous words [J]. Zhang MY, Lu ZD, Zou CY. Information Sciences: An International Journal. 2004, no. 3a4
3. Recognizing handwritten Chinese day and month words by combining a holistic method and a segmentation-based method [J]. Chongyang Zhang, Wei Li. Neural Computing and Applications. 2013, no. 6
4. A Study of Chinese Word Segmentation Based on the Characteristics of Chinese [C]. Aaron Li-Feng Han, Derek F. Wong, Lidia S. Chao, et al. International Conference of the German Society for Computational Linguistics and Language Technology. 2013
5. Wh-existential words: A comparative study of English-Chinese and Korean-Chinese interlanguages [D]. Chu, Wei. 2014
6. Does "a picture is worth 1000 words" apply to iconic Chinese words? Relationship of Chinese words and pictures [O]. Shih-Yu Lo, Su-Ling Yeh
7. Introduction to CKIP Chinese word segmentation system for the first international Chinese Word Segmentation Bakeoff [O]. Wei-yun Ma. 2003
