
Automatic readability assessment.


Abstract

We describe the development of an automatic tool to assess the readability of text documents. Our readability assessment tool predicts elementary school grade levels of texts with high accuracy. The tool is developed using supervised machine learning techniques on text corpora annotated with grade levels and other indicators of reading difficulty. Various independent variables or features are extracted from texts and used for automatic classification. We systematically explore different feature inventories and evaluate the grade-level prediction of the resulting classifiers. Our evaluation comprises well-known features at various linguistic levels from the existing literature, such as those based on language modeling, part-of-speech, syntactic parse trees, and shallow text properties, including classic readability formulas like the Flesch-Kincaid Grade Level formula. We focus in particular on discourse features, including three novel feature sets based on the density of entities, lexical chains, and coreferential inference, as well as features derived from entity grids. We evaluate and compare these different feature sets in terms of accuracy and mean squared error by cross-validation. Generalization to different corpora or domains is assessed in two ways. First, using two corpora of texts and their manually simplified versions, we evaluate how well our readability assessment tool can discriminate between original and simplified texts. Second, we measure the correlation between grade levels predicted by our tool, expert ratings of text difficulty, and estimated latent difficulty derived from experiments involving adult participants with mild intellectual disabilities. The applications of this work include selection of reading material tailored to varying proficiency levels, ranking of documents by reading difficulty, and automatic document summarization and text simplification.
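The shallow text properties evaluated in the abstract include the classic Flesch-Kincaid Grade Level formula, FKGL = 0.39 × (words/sentences) + 11.8 × (syllables/words) − 15.59. The sketch below is a minimal illustration of that formula only, not the dissertation's implementation; the regex tokenizer and the vowel-group syllable counter are rough assumptions standing in for a proper linguistic preprocessing pipeline.

```python
import re

def count_syllables(word: str) -> int:
    """Approximate syllables by counting vowel groups (a crude heuristic)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    n = len(groups)
    # Drop a trailing silent 'e' when the word has more than one vowel group.
    if word.lower().endswith("e") and n > 1:
        n -= 1
    return max(1, n)

def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level:
    0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words))
            - 15.59)
```

Short, monosyllabic sentences score at or below first-grade level; longer sentences with polysyllabic words push the predicted grade up. Formulas like this serve as baseline features alongside the discourse-level feature sets the dissertation introduces.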

Bibliographic record

  • Author: Feng, Lijun
  • Affiliation: City University of New York
  • Degree grantor: City University of New York
  • Subject: Computer Science
  • Degree: Ph.D.
  • Year: 2010
  • Pages: 204 p.
  • Total pages: 204
  • Format: PDF
  • Language: English
