Conference: International Conference of the German Society for Computational Linguistics and Language Technology

A Study of Chinese Word Segmentation Based on the Characteristics of Chinese

Abstract

This paper studies Chinese word segmentation (CWS). Segmenting Chinese text is difficult because written Chinese does not mark word boundaries and because several kinds of ambiguity can yield different segmentations of the same string. Conventional research tends to emphasize the algorithms employed and the workflow designed, and contributes less to the discussion of the fundamental problems of CWS. This paper instead begins by analyzing the characteristics of Chinese and several categories of ambiguity in Chinese in order to explore potential solutions. Conditional random field models are then trained with a quasi-Newton algorithm to perform the sequence labeling. To take as much contextual information as possible into account, an augmented and optimized feature set is developed. The experiments show promising evaluation scores compared with some related work.
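
To make the sequence-labeling formulation concrete, here is a minimal Python sketch of how segmentation can be cast as per-character tagging. The abstract does not state the paper's tag set or feature templates, so the B/M/E/S scheme (begin, middle, end, single), the helper names `words_to_tags`, `tags_to_words`, and `window_features`, and the example sentence are all assumptions chosen for illustration. The sentence is a commonly cited ambiguous string that is not taken from the paper.

```python
from typing import Dict, List


def words_to_tags(words: List[str]) -> List[str]:
    """Map a segmented sentence to one B/M/E/S tag per character (assumed scheme)."""
    tags: List[str] = []
    for word in words:
        if len(word) == 1:
            tags.append("S")  # single-character word
        else:
            # first character, any middle characters, last character
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def tags_to_words(chars: str, tags: List[str]) -> List[str]:
    """Rebuild words from characters and their tags; a word closes at E or S."""
    words: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:  # keep trailing characters if the tag sequence ends mid-word
        words.append(current)
    return words


def window_features(chars: str, i: int, size: int = 2) -> Dict[str, str]:
    """Hypothetical context features: the characters in a +/- size window around i."""
    feats: Dict[str, str] = {}
    for offset in range(-size, size + 1):
        j = i + offset
        feats[f"c[{offset}]"] = chars[j] if 0 <= j < len(chars) else "<PAD>"
    return feats


sentence = "研究生命起源"
reading_a = ["研究", "生命", "起源"]    # "study the origin of life"
reading_b = ["研究生", "命", "起源"]    # a competing, unintended segmentation

print(words_to_tags(reading_a))
print(words_to_tags(reading_b))
print(window_features(sentence, 2, size=1))
```

The two readings of the same string produce different tag sequences, which is the kind of ambiguity a labeling model has to resolve from context. In a setup like the one the abstract describes, features of this sort would feed a conditional random field whose weights are fitted with a quasi-Newton optimizer. The actual feature set used in the paper is not given here.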