LANGUAGE INPUT ARCHITECTURE FOR CONVERTING ONE TEXT FORM TO ANOTHER TEXT FORM WITH MODELESS ENTRY

Abstract

A language input architecture converts input strings of phonetic text (e.g., Chinese Pinyin) to an output string of language text (e.g., Chinese Hanzi) in a manner that minimizes typographical errors and conversion errors that occur during conversion from the phonetic text to the language text. The language input architecture has a search engine, one or more typing models, a language model, and one or more lexicons for different languages. Each typing model is trained on real data, and learns probabilities of typing errors. The typing model is configured to generate a list of probable typing candidates that may be substituted for the input string based on probabilities of how likely each of the candidate strings was incorrectly entered as the input string. The probable typing candidates may be stored in a database. The language model provides probable conversion strings for each of the typing candidates based on probabilities of how likely a probable conversion output string represents the candidate string. The search engine combines the probabilities of the typing and language models to find the most probable conversion string that represents a converted form of the input string. By generating typing candidates and then using the associated conversion strings to replace the input string, the architecture eliminates many common typographical errors. When multiple typing models are employed, the architecture can automatically distinguish among multiple languages without requiring mode switching for entry of the different languages.
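The way the search engine combines the two models can be illustrated with a minimal sketch. It is not the patented implementation: the tables `TYPING_CANDIDATES` and `CONVERSIONS`, the function `best_conversion`, and every probability value are hypothetical, chosen only to show a typing-model score and a language-model score being multiplied (added in log space) to rank conversion strings.

```python
import math

# Hypothetical toy typing model (assumption, not from the patent):
# typed string -> {intended Pinyin candidate: P(typed | intended)}.
TYPING_CANDIDATES = {
    "nihap": {"nihao": 0.05, "nihap": 0.90},
}

# Hypothetical toy language model (assumption, not from the patent):
# Pinyin candidate -> {Hanzi conversion string: probability}.
CONVERSIONS = {
    "nihao": {"你好": 0.6, "拟好": 0.01},
    "nihap": {},  # not valid Pinyin in this toy lexicon, so no conversions
}


def best_conversion(typed):
    """Return the (candidate, conversion) pair with the highest combined score.

    The combined score is log P_typing + log P_language. Returns None when
    no typing candidate has any conversion string.
    """
    best = None
    best_score = -math.inf
    # If the typed string has no entry, treat it as its own only candidate.
    candidates = TYPING_CANDIDATES.get(typed, {typed: 1.0})
    for candidate, p_typing in candidates.items():
        for text, p_lang in CONVERSIONS.get(candidate, {}).items():
            score = math.log(p_typing) + math.log(p_lang)
            if score > best_score:
                best, best_score = (candidate, text), score
    return best


print(best_conversion("nihap"))
```

In this toy setup the mistyped input "nihap" is itself the most likely typing candidate, but it has no conversion string, so the lower-probability candidate "nihao" wins once the language-model score is taken into account. That mirrors the abstract's point that typing errors are corrected as a side effect of conversion.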

Bibliographic data

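  • Publication number: WO0135249A3

  • Publication date: 2001-12-20

  • Original format: PDF

  • Applicant/patentee: MICROSOFT CORPORATION

  • Application number: WO2000US28418

  • Inventors: LEE KAI-FU; CHEN ZHENG; HAN JIAN

  • Filing date: 2000-10-13

  • Classification: G06F17/27; G06F17/28

  • Country: WO

  • Database entry time: 2022-08-22 00:38:36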
