Journal: IEEE Intelligent Systems

Automatic Word Spacing Using Probabilistic Models Based on Character n-grams


Abstract

On the Internet, information is largely in text form, which often includes such errors as spelling mistakes. These errors complicate natural language processing because most NLP applications aren't robust and assume that the input data is noise free. Preprocessing is necessary to deal with these errors and meet the growing need for automatic text processing. One kind of such preprocessing is automatic word spacing. This process decides correct boundaries between words in a sentence containing spacing errors, which are a type of spelling error. Except for some Asian languages such as Chinese and Japanese, most languages have explicit word spacing. In these languages, word spacing is crucial to increase readability and to accurately communicate a text's meaning. Automatic word spacing plays an important role not only as a spell-checker module but also as a preprocessor for a morphological analyzer, which is a fundamental tool for NLP applications. Furthermore, automatic word spacing can serve as a postprocessor for optical-character-recognition systems and speech recognition systems.
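
The abstract does not spell out the model itself, so the following is only a minimal illustrative sketch of the general idea named in the title, not the authors' method. It assumes the simplest possible character n-gram setup (bigrams): from correctly spaced training text, estimate how often a space separates each pair of adjacent characters, then insert a space wherever that estimate exceeds a threshold. The function names `train`, `space_probability`, and `respace`, the 0.5 threshold, and the toy training sentences are all hypothetical.

```python
from collections import Counter


def train(spaced_sentences):
    """Count, for each adjacent character pair, how often a space separated it."""
    space_counts = Counter()  # (left, right) -> times a space sat between them
    pair_counts = Counter()   # (left, right) -> times the pair was adjacent at all
    for sentence in spaced_sentences:
        chars = []     # characters with all whitespace removed
        boundary = []  # boundary[i] is True if a word ends at chars[i]
        for token in sentence.split():
            for i, ch in enumerate(token):
                chars.append(ch)
                boundary.append(i == len(token) - 1)
        for i in range(len(chars) - 1):
            pair = (chars[i], chars[i + 1])
            pair_counts[pair] += 1
            if boundary[i]:
                space_counts[pair] += 1
    return space_counts, pair_counts


def space_probability(model, left, right):
    """Relative-frequency estimate of P(space | left, right)."""
    space_counts, pair_counts = model
    total = pair_counts[(left, right)]
    if total == 0:
        return 0.0  # unseen pair: assume no space (an arbitrary choice for this sketch)
    return space_counts[(left, right)] / total


def respace(model, text, threshold=0.5):
    """Drop existing whitespace, then re-insert spaces where the model favors them."""
    chars = [ch for ch in text if not ch.isspace()]
    out = []
    for i, ch in enumerate(chars):
        out.append(ch)
        if i + 1 < len(chars) and space_probability(model, ch, chars[i + 1]) > threshold:
            out.append(" ")
    return "".join(out)


# Toy usage with made-up training data.
model = train(["the cat sat", "the cat ran"])
result = respace(model, "thecatsat")
```

A practical system would need longer n-gram contexts, smoothing for unseen character sequences, and a search over the whole sentence rather than independent local decisions; this sketch omits all of that for brevity.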
