Models and Algorithm of Chinese Word Segmentation

Abstract

Chinese word segmentation is of great significance in Chinese Natural Language Processing (NLP). This paper proposes a statistical segmentation model that integrates the character juncture model (CJM) with a word bi-gram language model, and then designs a strategy for estimating this model accurately and at low cost. The advantage of the proposed model is that it can simultaneously exploit the affinity of characters inside and outside a word together with word co-occurrence information to handle ambiguity. After investigating the difference between the real and theoretical sizes of the segmentation space, we apply the A* algorithm to perform segmentation without exhaustively searching all potential segmentations. Experiments show that the proposed methods are efficient: in our preliminary tests they achieve over 92% correct disambiguation and over 84% correct identification of unknown words.
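The abstract describes the approach only at a high level. The Python sketch below is a rough illustration of the search component alone: A*-style best-first search over candidate segmentations scored by a word bi-gram model with a simple admissible heuristic. The toy lexicon, counts, smoothing, and heuristic are hypothetical placeholders; the paper's character juncture model (CJM) and its estimation strategy are not reproduced here.

import heapq
import math
from itertools import count

# Toy lexicon with hypothetical unigram/bigram counts (placeholders, not data
# from the paper); the authors' CJM-based scores would take their place.
UNIGRAM = {"研究": 50, "研究生": 20, "生命": 40, "命": 10,
           "的": 200, "起源": 30, "生": 15}
BIGRAM = {("研究", "生命"): 8, ("生命", "的"): 12, ("的", "起源"): 9}
TOTAL = sum(UNIGRAM.values())
MAX_WORD_LEN = max(len(w) for w in UNIGRAM)

def word_cost(word, prev):
    # Negative log probability of `word` given the previous word, backing off
    # to a crudely smoothed unigram estimate when the bigram is unseen.
    if prev is not None and (prev, word) in BIGRAM:
        p = BIGRAM[(prev, word)] / UNIGRAM[prev]
    else:
        p = UNIGRAM.get(word, 0.5) / TOTAL
    return -math.log(p)

# Admissible per-character lower bound: no word reachable in the search costs
# less per character than this, so h(pos) = (n - pos) * MIN_CHAR_COST never
# overestimates the remaining cost.
MIN_CHAR_COST = min(
    min(word_cost(w, None) / len(w) for w in UNIGRAM),
    min(word_cost(w, p) / len(w) for (p, w) in BIGRAM),
)

def segment(sentence):
    # A* search; a state is (position, previous word), each edge appends one word.
    n = len(sentence)
    tie = count()  # tie-breaker so the heap never has to compare states
    frontier = [(n * MIN_CHAR_COST, 0.0, next(tie), 0, None, [])]
    expanded = {}
    while frontier:
        _, g, _, pos, prev, path = heapq.heappop(frontier)
        if pos == n:
            return path, g  # first complete segmentation popped is optimal
        if expanded.get((pos, prev), float("inf")) <= g:
            continue
        expanded[(pos, prev)] = g
        for end in range(pos + 1, min(pos + MAX_WORD_LEN, n) + 1):
            word = sentence[pos:end]
            if word in UNIGRAM or end == pos + 1:  # single-char fallback for unknowns
                g2 = g + word_cost(word, prev)
                h2 = (n - end) * MIN_CHAR_COST
                heapq.heappush(frontier, (g2 + h2, g2, next(tie), end, word, path + [word]))
    return None, float("inf")

print(segment("研究生命的起源"))  # -> (['研究', '生命', '的', '起源'], total cost)

Because the heuristic never overestimates the remaining cost, the first complete segmentation taken from the priority queue is optimal under these toy scores, which mirrors the abstract's point that A* search avoids exhaustively enumerating the segmentation space.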
