A Chinese Word Segmentation Method Suited to Machine Translation (一种适用于机器翻译的汉语分词方法)

Xi Ning (奚宁); Li Bo-yuan (李博渊); Huang Shujian (黄书剑); Chen Jiajun (陈家骏)

Abstract

Chinese word segmentation is the first step in building a statistical machine translation (SMT) system from Chinese into other languages. However, conventional segmenters trained on monolingual corpora are not necessarily well suited to SMT [1]. This paper proposes a segmentation method adapted to SMT that draws on two kinds of knowledge: Chinese-character-based bilingual alignment and conventional monolingual word segmentation. First, using the notion of alignment confidence, a set of confident alignments is extracted from a bilingual corpus aligned at the character level, and the Chinese side of the bilingual corpus is re-segmented according to that set. The re-segmented output is then merged with the output of a monolingual segmenter, and the merged segmentation serves as training data for a Conditional Random Fields (CRF) model, yielding a segmenter that combines monolingual and bilingual knowledge. In the experiments, the Chinese portions of the training, development, and test sets are segmented with the proposed segmenter and used to build a phrase-based SMT system. The results show improved translation quality over the baselines.
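The re-segmentation and CRF training-data steps described in the abstract can be illustrated with a minimal sketch. The abstract does not define alignment confidence, the merging rule, or the CRF tag set, so everything here is an assumption: the sketch takes an already-filtered set of character-to-target-word links as input, skips the merge with the monolingual segmenter, and uses the common B/M/E/S character tagging scheme. The function names `resegment_by_alignment` and `to_bmes` are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Optional, Set, Tuple


def resegment_by_alignment(
    chars: List[str],
    confident_links: Set[Tuple[int, int]],
) -> List[str]:
    """Group adjacent characters that link to the same target word.

    confident_links holds (char_index, target_word_index) pairs that are
    assumed to have already passed an alignment-confidence filter.
    Assumes at most one confident link per character. Characters without
    a confident link become single-character words.
    """
    target_of: Dict[int, int] = {c: t for c, t in confident_links}
    words: List[str] = []
    prev_target: Optional[int] = None
    for i, ch in enumerate(chars):
        tgt = target_of.get(i)
        if tgt is not None and tgt == prev_target:
            # Same target word as the previous character: extend the word.
            words[-1] += ch
        else:
            # New target word, or no confident link: start a new word.
            words.append(ch)
        prev_target = tgt
    return words


def to_bmes(words: List[str]) -> List[Tuple[str, str]]:
    """Convert a segmented sentence into (character, tag) pairs.

    Tags: B = word begin, M = word middle, E = word end, S = single-character
    word. This is the per-character labelling a CRF segmenter is typically
    trained on; the paper's actual tag set may differ.
    """
    tagged: List[Tuple[str, str]] = []
    for w in words:
        if len(w) == 1:
            tagged.append((w, "S"))
        else:
            tagged.append((w[0], "B"))
            tagged.extend((c, "M") for c in w[1:-1])
            tagged.append((w[-1], "E"))
    return tagged
```

For example, with `chars = list("我喜欢机器翻译")` and links `{(0, 0), (1, 1), (2, 1), (3, 2), (4, 2), (5, 3), (6, 3)}`, `resegment_by_alignment` returns `["我", "喜欢", "机器", "翻译"]`, and `to_bmes` turns that into character-tag pairs suitable as rows of a CRF training file.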

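Bibliographic record

Source
Journal of Chinese Information Processing (中文信息学报) | 2012, No. 3 | pp. 54-58, 78 | 6 pages
Authors
Xi Ning; Li Bo-yuan; Huang Shujian; Chen Jiajun
Affiliations

State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, Jiangsu 210093

Department of Computer Science and Technology, Nanjing University, Nanjing, Jiangsu 210093

Original format: PDF
Language of text: Chinese (chi)
CLC classification: Information processing (information handling)
Keywords
Chinese word segmentation; statistical machine translation; alignment confidence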

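Similar literature

1. A Chinese Word Segmentation Method for Domain-Specific Machine Translation [J]. Su Chen, Zhang Yujie, Guo Zhen. Journal of Chinese Information Processing. 2013, No. 5
2. Implications of a Sociolinguistic Survey for Word Segmentation in Chinese-English Machine Translation: Studies on Chinese Word Segmentation and Chinese-English Machine Translation, Part 2 [J]. Wu Zhijie. Foreign Languages Research. 2009, No. 5
3. The Current State of Chinese Word Segmentation in Machine Translation: Studies on Chinese Word Segmentation and Chinese-English Machine Translation, Part 1 [J]. Wu Zhijie. Foreign Languages Research. 2009, No. 1
4. A Chinese Word Segmentation Method Based on Joint Tagging of Characters and Substrings [J]. Yu Jiangde, Gu Chuan, Ge Wenying. Journal of Shanxi University (Natural Science Edition). 2011, No. 3
5. A Chinese Word Segmentation Method Combining Rules and Statistics [J]. Zhao Wei, Dai Xinyu, Yin Cunyan. Application Research of Computers. 2004, No. 3
6. A Chinese Word Segmentation Method Suited to Machine Translation [C]. Li Bo-yuan, Xi Ning. 11th National Conference on Computational Linguistics. 2011
7. An Exploration of Morphological Segmentation Methods for Chinese-Mongolian Statistical Machine Translation [A]. Liu Hui. 2013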
