Conference on Empirical Methods in Natural Language Processing; Conference on Computational Natural Language Learning

A Unified Approach to Transliteration-based Text Input with Online Spelling Correction

Abstract

This paper presents an integrated, end-to-end approach to online spelling correction for text input. Online spelling correction means correcting spelling as the user types, as opposed to post-editing. The online scenario is particularly important for languages that routinely use transliteration-based text input methods, such as Chinese and Japanese, because the desired target characters cannot be input at all unless they are in the list of candidates provided by the input method, and spelling errors prevent them from appearing in that list. For example, a user might mistakenly type suesheng when intending xuesheng 学生 'student' in Chinese; existing input methods fail to convert this misspelled input to the desired target Chinese characters. In this paper, we propose a unified approach to spelling correction and transliteration-based character conversion, inspired by the phrase-based statistical machine translation framework. At the phrase (substring) level, the k most probable pinyin (Romanized Chinese) corrections are generated using a monotone decoder; at the sentence level, input pinyin strings are directly transliterated into target Chinese characters by a decoder using a log-linear model that refers to the features of both levels. A new method of automatically deriving parallel training data from user keystroke logs is also presented. Experiments on Chinese pinyin conversion show that our integrated method reduces the character error rate by a relative 20% (from 8.9% to 7.12%) over the previous state of the art, which is based on a noisy channel model.
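
The reported figures can be checked with a small sketch. The abstract does not spell out how character error rate is computed, so the code below assumes the common definition (Levenshtein edit distance between reference and output characters, divided by the reference length). The function names `edit_distance`, `character_error_rate`, and `relative_reduction` are illustrative and do not come from the paper.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two character sequences."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # drop a reference character
                            curr[j - 1] + 1,      # add a hypothesis character
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """Edit distance normalized by reference length (assumed CER definition)."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline: float, new: float) -> float:
    """Fraction of the baseline error removed by the new system."""
    return (baseline - new) / baseline


if __name__ == "__main__":
    # The misspelling from the abstract is one substitution away from the target.
    print(edit_distance("xuesheng", "suesheng"))   # 1
    # 8.9% -> 7.12% is a 20% relative (1.78 point absolute) reduction.
    print(round(relative_reduction(8.9, 7.12), 4))  # 0.2
```

The second call shows that the 20% figure is a relative reduction: the absolute drop is 1.78 percentage points.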