Computational Linguistics

A stochastic finite-state word-segmentation algorithm for Chinese



Abstract

The initial stage of text analysis for any NLP task usually involves the tokenization of the input into words. For languages like English one can assume, to a first approximation, that word boundaries are given by whitespace or punctuation. In various Asian languages, including Chinese, on the other hand, whitespace is never used to delimit words, so one must resort to lexical information to "reconstruct" the word-boundary information. In this paper we present a stochastic finite-state model wherein the basic workhorse is the weighted finite-state transducer. The model segments Chinese text into dictionary entries and words derived by various productive lexical processes, and--since the primary intended application of this model is to text-to-speech synthesis--provides pronunciations for these words. We evaluate the system's performance by comparing its segmentation "judgments" with the judgments of a pool of human segmenters, and the system is shown to perform quite well.
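The core idea the abstract describes--scoring candidate segmentations with lexical weights and picking the best path through a segmentation lattice--can be sketched in simplified form as dictionary lookup plus a minimum-cost dynamic program. The toy lexicon, its frequencies, and the `segment` function below are invented for illustration and are not the paper's actual transducer implementation:

```python
import math

# Hypothetical toy lexicon: word -> corpus frequency (illustrative values only).
LEXICON = {"中国": 100, "人民": 80, "中": 30, "国": 20, "人": 50, "民": 10}
TOTAL = sum(LEXICON.values())

def cost(word):
    """Weight of a lexical arc: negative log probability, as in a weighted FST."""
    return -math.log(LEXICON[word] / TOTAL)

def segment(text):
    """Return the minimum-cost segmentation of `text` by dynamic programming,
    equivalent to the cheapest path through the segmentation lattice."""
    n = len(text)
    best = [math.inf] * (n + 1)   # best[i] = minimum cost to segment text[:i]
    back = [0] * (n + 1)          # backpointer: start index of the last word
    best[0] = 0.0
    for i in range(1, n + 1):
        for j in range(i):
            word = text[j:i]
            if word in LEXICON and best[j] + cost(word) < best[i]:
                best[i] = best[j] + cost(word)
                back[i] = j
    # Recover the word sequence by following backpointers from the end.
    words, i = [], n
    while i > 0:
        words.append(text[back[i]:i])
        i = back[i]
    return list(reversed(words))

print(segment("中国人民"))  # -> ['中国', '人民']
```

Here the two-word reading wins because its summed negative log probability (about 2.35) is lower than that of any character-by-character path; the paper's model additionally handles words produced by productive lexical processes and attaches pronunciations, which this sketch omits.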
