International Journal of Artificial Intelligence Tools: Architectures, Languages, Algorithms

A CORPUS BASED TECHNIQUE FOR REPAIRING ILL-FORMED SENTENCES WITH WORD ORDER ERRORS USING CO-OCCURRENCES OF N-GRAMS



Abstract

There are several reasons to expect that recognising word order errors in a text will be a difficult problem, and the recognition rates reported in the literature are in fact low. Although grammatical rules constructed by computational linguists improve the performance of a grammar checker in diagnosing word order errors, the repair task remains very difficult. A good indicator of whether a person really knows a language is the ability to use the appropriate words in a sentence in the correct order: most languages have a fairly fixed word order, and scrambling the words of a sentence produces a meaningless one. This paper introduces a language-independent method for repairing any sentence with wrong word order using a statistical language model (LM). The method uses the probabilities of the most typical trigrams and bigrams extracted from a large text corpus such as the British National Corpus (BNC).
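The abstract does not say how candidate word orders are generated or scored, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a tiny hand-written corpus in place of the BNC, add-one smoothing, and an exhaustive search over all permutations of the input words (feasible only for short sentences). The names `TOY_CORPUS`, `train`, `score`, and `repair` are hypothetical and chosen for illustration.

```python
import math
from collections import Counter
from itertools import permutations

# Assumption: a toy stand-in for a large corpus such as the BNC.
TOY_CORPUS = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat chased the dog",
    "a dog sat on the mat",
]


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def train(sentences):
    """Count unigrams, bigrams and trigrams over boundary-padded sentences."""
    unigrams, bigrams, trigrams = Counter(), Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(ngrams(tokens, 2))
        trigrams.update(ngrams(tokens, 3))
    return unigrams, bigrams, trigrams


def score(words, model):
    """Sum add-one-smoothed log-probabilities of all bigrams and trigrams."""
    unigrams, bigrams, trigrams = model
    vocab = len(unigrams)
    padded = ["<s>"] + list(words) + ["</s>"]
    total = 0.0
    for a, b in ngrams(padded, 2):
        total += math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab))
    for a, b, c in ngrams(padded, 3):
        total += math.log((trigrams[(a, b, c)] + 1) / (bigrams[(a, b)] + vocab))
    return total


def repair(sentence, model):
    """Return the permutation of the input words with the highest score."""
    words = sentence.split()
    best = max(permutations(words), key=lambda p: score(p, model))
    return " ".join(best)


model = train(TOY_CORPUS)
print(repair("sat cat the on mat the", model))  # the cat sat on the mat
```

Exhaustive permutation grows factorially with sentence length, so a practical system would need to prune the search, for example by keeping only candidates whose n-grams are attested in the corpus. Because the scoring relies only on n-gram counts, the same sketch applies to any language for which a tokenised corpus is available.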
