Conference: Natural Language Processing Pacific Rim Symposium

Unsupervised Chinese Word Segmentation and Unknown Word Identification

Abstract

In this paper, we present an unsupervised model for Chinese word segmentation based on the word-formation power of character strings (the word form model, WFM) and the affinity of character junctures (the character juncture model, CJM). We also propose a formula to measure the size of the segmentation space, and our system adopts a two-way segmentation algorithm. Finally, we devise a modified version of Chinese word-formation patterns to identify unknown words. Since all parameters can be estimated directly from unsegmented text, the proposed approaches are highly adaptable, and our preliminary experiments show them to be efficient.
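The abstract does not give the WFM or CJM formulas, so the following is only a minimal sketch of the general idea under stated assumptions. It shows how a juncture-affinity score can be estimated purely from unsegmented text and how the size of a segmentation space can be counted. Pointwise mutual information (PMI) between adjacent characters stands in for the paper's juncture affinity, and the cut threshold of 0.0 is a hypothetical choice. `count_segmentations` is the standard combinatorial count (2^(n-1) splits of an n-character string when word length is unbounded), not necessarily the authors' formula. All function names are illustrative.

```python
import math
from collections import Counter


def train_juncture_model(corpus):
    """Estimate adjacent-character PMI from raw, unsegmented sentences.

    Assumption: PMI is used here as a stand-in for the paper's CJM affinity.
    """
    unigrams = Counter()
    bigrams = Counter()
    for sent in corpus:
        unigrams.update(sent)  # count single characters
        bigrams.update(sent[i:i + 2] for i in range(len(sent) - 1))  # adjacent pairs
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    def pmi(a, b):
        joint = bigrams[a + b]
        if joint == 0:
            return float("-inf")  # pair never seen adjacent: weakest affinity
        p_ab = joint / n_bi
        p_a = unigrams[a] / n_uni
        p_b = unigrams[b] / n_uni
        return math.log2(p_ab / (p_a * p_b))

    return pmi


def segment(sentence, pmi, threshold=0.0):
    """Cut at every juncture whose affinity falls below a (hypothetical) threshold."""
    if not sentence:
        return []
    words = [sentence[0]]
    for prev, cur in zip(sentence, sentence[1:]):
        if pmi(prev, cur) < threshold:
            words.append(cur)  # weak juncture: start a new word
        else:
            words[-1] += cur  # strong juncture: extend the current word
    return words


def count_segmentations(sentence, max_len=None):
    """Number of ways to split `sentence` into pieces of length <= max_len."""
    n = len(sentence)
    max_len = max_len or n
    ways = [1] + [0] * n  # ways[i] = number of segmentations of sentence[:i]
    for i in range(1, n + 1):
        for k in range(1, min(max_len, i) + 1):
            ways[i] += ways[i - k]
    return ways[n]


if __name__ == "__main__":
    pmi = train_juncture_model(["中文", "中文", "分词"])
    print(segment("中文分词", pmi))
    print(count_segmentations("中文分词"), count_segmentations("中文分词", max_len=2))
```

With the toy corpus above, "中文分词" is cut only at the unseen juncture 文|分. The unbounded segmentation space of a four-character string has 2^3 = 8 members, which shrinks to 5 when words are limited to two characters. This illustrates why bounding the space matters before searching it.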