
Intrasentential Grammatical Correction with Weighted Finite State Transducers.

Abstract

Natural language processing (NLP) has considerable potential to enrich the communicative capabilities of a broad range of learning technologies. For example, both adaptive writing support environments and computer-assisted learning environments could benefit from robust NLP. However, texts written by novice writers pose serious challenges for core NLP components such as syntactic and semantic parsers, so a robust grammatical pre-processing stage must be introduced upstream in the NLP pipeline. These challenges are compounded by the fact that current methods for detecting and correcting ungrammatical text either focus on identifying and repairing specific types of errors or rely heavily on contextual clues, which may be unreliable in highly disfluent text.

To address these problems, we propose a noisy channel model implemented with weighted finite state transducers (wFSTs), in which weights represent the probability of transitioning between states, or in this case, between words in a sentence. To construct our language model, we use a corpus of children's stories from Project Gutenberg. For the noise model, we use a corpus of passages written by middle school students and collected in corpus acquisition experiments. The EM algorithm is used to estimate optimal a priori probabilities of encountering an erroneous form of a word. Preliminary results are encouraging and suggest that wFSTs hold considerable promise for detecting and correcting highly disfluent text.
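As a rough illustration of the noisy channel idea described above: given an observed (possibly erroneous) sentence o, the corrector searches for the intended sentence c that maximizes P(c) · P(o | c), where P(c) comes from the language model and P(o | c) from the noise model. If weights are stored as negative log probabilities (the tropical semiring), composing an acceptor for the observed sentence with the noise transducer and the language model, then taking the shortest path, reduces to a Viterbi search when errors are word-for-word substitutions and the language model is a bigram model. The following is a minimal plain-Python sketch under exactly those assumptions, not the dissertation's implementation: the abstract does not give the n-gram order, the error types, or any probabilities, so the bigram model, the substitution-only channel, the smoothing floor, and all names and numbers (`BIGRAM`, `CHANNEL`, `FLOOR`, `correct`) are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical toy probabilities, for illustration only.
# Bigram language model P(word | previous word); <s> and </s> mark sentence boundaries.
BIGRAM: Dict[Tuple[str, str], float] = {
    ("<s>", "they"): 0.4, ("<s>", "there"): 0.1,
    ("they", "were"): 0.5, ("they", "where"): 0.01,
    ("there", "were"): 0.1, ("there", "where"): 0.01,
    ("were", "happy"): 0.3, ("where", "happy"): 0.01,
    ("happy", "</s>"): 0.8,
}
FLOOR = 1e-6  # hypothetical stand-in for proper smoothing of unseen bigrams

# Hypothetical noise (channel) model: observed word -> {intended word: P(observed | intended)}.
# Only word-for-word substitutions are modeled here.
CHANNEL: Dict[str, Dict[str, float]] = {
    "there": {"there": 0.9, "they": 0.1},
    "where": {"where": 0.9, "were": 0.2},
}


def cost(p: float) -> float:
    """Tropical-semiring weight: negative log probability (lower is better)."""
    return -math.log(p)


def lm_cost(prev: str, word: str) -> float:
    """Weight of the language-model arc prev -> word."""
    return cost(BIGRAM.get((prev, word), FLOOR))


def correct(observed: List[str]) -> Tuple[List[str], float]:
    """Return the intended word sequence minimizing -log P(c) - log P(o | c)."""
    # best[w] = (cost of the cheapest path ending in intended word w, that path)
    best: Dict[str, Tuple[float, List[str]]] = {"<s>": (0.0, [])}
    for obs in observed:
        # Words missing from CHANNEL pass through unchanged with probability 1.
        candidates = CHANNEL.get(obs, {obs: 1.0})
        step: Dict[str, Tuple[float, List[str]]] = {}
        for intended, p_obs in candidates.items():
            # Take the min over incoming arcs (the semiring "plus"),
            # summing LM and channel weights along each arc (the semiring "times").
            step[intended] = min(
                ((c + lm_cost(prev, intended) + cost(p_obs), path + [intended])
                 for prev, (c, path) in best.items()),
                key=lambda t: t[0],
            )
        best = step
    # Close every surviving path with the end-of-sentence transition.
    total, path = min(
        ((c + lm_cost(prev, "</s>"), path) for prev, (c, path) in best.items()),
        key=lambda t: t[0],
    )
    return path, total


if __name__ == "__main__":
    words, weight = correct(["there", "where", "happy"])
    print(" ".join(words))  # -> they were happy
```

Running the script prints `they were happy`. On its own, the channel model favors leaving "there where" unchanged (0.81 versus 0.02 for the two words), but the bigram language model outweighs it, so the corrected reading ends up about 150 times more probable overall. A real wFST toolkit would express the same search as transducer composition followed by a shortest-path operation.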

Bibliographic details

  • Author: Goth, Julius, III
  • Author affiliation: North Carolina State University
  • Degree-granting institution: North Carolina State University
  • Subject: Computer Science
  • Degree: Ph.D.
  • Year: 2013
  • Pages: 115
  • Original format: PDF
  • Language: English