Unsupervised Chinese Word Segmentation and Unknown Word Identification


Abstract

In this paper, we present an unsupervised model for Chinese word segmentation based on the word-formation power of character strings (the word form model, WFM) and the affinity of character junctures (the character juncture model, CJM). We also propose a formula to measure the size of the segmentation space and adopt a two-way segmentation algorithm in our system. Finally, we devise a modified version of Chinese word-formation patterns to identify unknown words. Since all the parameters can be estimated directly from unsegmented texts, the proposed approaches are highly adaptable and have proved efficient in our preliminary experiments.
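The abstract does not give the CJM formula, so the following is only an illustrative sketch of the general idea of scoring character junctures from unsegmented text. It swaps in pointwise mutual information (PMI) between adjacent characters as a stand-in affinity measure; the function names `juncture_scores` and `segment`, the `threshold` parameter, and the toy corpus are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def juncture_scores(corpus):
    """Estimate PMI for every adjacent character pair seen in unsegmented lines.

    Assumption: PMI stands in for the paper's juncture affinity, which the
    abstract does not define.
    """
    unigrams = Counter()
    bigrams = Counter()
    for line in corpus:
        unigrams.update(line)
        bigrams.update(line[i:i + 2] for i in range(len(line) - 1))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    def pmi(pair):
        p_xy = bigrams[pair] / n_bi
        p_x = unigrams[pair[0]] / n_uni
        p_y = unigrams[pair[1]] / n_uni
        return math.log(p_xy / (p_x * p_y))

    return {pair: pmi(pair) for pair in bigrams}


def segment(text, scores, threshold=0.0):
    """Cut at every juncture whose score is unseen or below the threshold."""
    if not text:
        return []
    words = [text[0]]
    for prev, cur in zip(text, text[1:]):
        if scores.get(prev + cur, float("-inf")) >= threshold:
            words[-1] += cur  # strong affinity: keep the characters together
        else:
            words.append(cur)  # weak or unseen juncture: start a new word
    return words


if __name__ == "__main__":
    # Hypothetical toy corpus of unsegmented text.
    corpus = ["中文分词", "中文信息", "分词系统"]
    scores = juncture_scores(corpus)
    print(segment("中文分词系统", scores, threshold=1.0))
```

All counts come from raw, unsegmented lines, which mirrors the abstract's point that the parameters need no annotated data. The threshold is a free choice here; the paper's two-way segmentation algorithm and word form model are not reproduced.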


Bibliographic record

Source
Natural Language Processing Pacific Rim Symposium, 1999, 6 pages
Authors
Fu Guohong; Wang Xiaolong
Original format: PDF
Chinese Library Classification: machine translation

