Conference: International Conference of the German Society for Computational Linguistics and Language Technology

A Study of Chinese Word Segmentation Based on the Characteristics of Chinese

Abstract

This paper presents research on Chinese word segmentation (CWS). Segmenting Chinese text is difficult because written Chinese marks no word boundaries and because several kinds of ambiguity can lead to different segmentations. Conventional research tends to emphasize the algorithms employed and the workflow designed, and contributes less to the discussion of the fundamental problems of CWS. In contrast, this paper first analyzes the characteristics of Chinese and several categories of ambiguity in Chinese in order to explore potential solutions. The selected conditional random field models are trained with a quasi-Newton algorithm to perform the sequence labeling. To take as much contextual information as possible into account, an augmented and optimized feature set is developed. The experiments show promising evaluation scores compared with some related work.
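
To illustrate how segmentation can be cast as sequence labeling, here is a minimal Python sketch. It rests on assumptions that the abstract does not confirm: a B/M/E/S tag set (begin, middle, end, single-character word) and a window of character unigram and bigram features. The function names `words_to_tags`, `tags_to_words`, and `char_features` are hypothetical, and the paper's actual tag set and feature templates may differ. The example sentence is a standard textbook case of segmentation ambiguity, not data from the paper.

```python
from typing import Dict, List


def words_to_tags(words: List[str]) -> List[str]:
    """Map a segmented sentence to one B/M/E/S tag per character (assumed tag set)."""
    tags: List[str] = []
    for word in words:
        if len(word) == 1:
            tags.append("S")  # single-character word
        else:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def tags_to_words(chars: str, tags: List[str]) -> List[str]:
    """Recover words from characters and their (predicted) tags."""
    words: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a word ends at this character
            words.append(current)
            current = ""
    if current:  # tolerate a tag sequence that stops mid-word
        words.append(current)
    return words


def char_features(chars: str, i: int, window: int = 2) -> Dict[str, str]:
    """Character unigrams and adjacent bigrams around position i (hypothetical templates)."""
    def at(j: int) -> str:
        return chars[j] if 0 <= j < len(chars) else "<PAD>"

    feats = {f"c[{k}]": at(i + k) for k in range(-window, window + 1)}
    for k in range(-window, window):
        feats[f"c[{k}]c[{k + 1}]"] = at(i + k) + at(i + k + 1)
    return feats


sentence = "研究生命起源"
# Two competing segmentations of the same character string:
reading_a = ["研究", "生命", "起源"]   # "study / life / origin"
reading_b = ["研究生", "命", "起源"]   # "graduate student / fate / origin"
print(words_to_tags(reading_a))
print(words_to_tags(reading_b))
print(char_features(sentence, 2))
```

Under this formulation, a conditional random field would score entire tag sequences given per-character features such as these, and a quasi-Newton optimizer (L-BFGS is one common choice, though the abstract does not name the variant) would fit the feature weights. The two readings yield different tag sequences for the same characters, which is the ambiguity the model has to resolve from context.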