Conference: International conference on computational linguistics

A Computer Readability Formula of Japanese Texts for Machine Scoring

Abstract

A readability formula is obtained that computer programs can use for style checking of Japanese texts and that requires no syntactic or semantic information. The formula is derived as a linear combination of surface characteristics of the text that are related to its readability: (1) the average number of characters per sentence; (2) for each type of character (Roman letters, kanzis, hiraganas, katakanas), the relative frequency of runs (maximal strings) that consist only of that type of character; (3) the average number of characters per run for each type of run; and (4) the tooten (comma) to kuten (period) ratio. To find the proper weighting, principal component analysis (PCA) was applied to these characteristics taken from 77 sample texts. We found a component that is related to readability. Its scores agree with empirical knowledge of reading ease. We also confirmed experimentally that the component is an adequate measure of stylistic ease of reading, using the cloze procedure and an examination of the average time taken to fill in one blank of the cloze texts.
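The four kinds of surface characteristics listed in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the authors' program: the function names `char_type` and `surface_features`, the Unicode ranges used to classify characters, the choice of sentence delimiters, and the decision to count commas as characters within a sentence. The PCA-derived weights are not given in the abstract, so the sketch stops at feature extraction and does not compute a readability score.

```python
import re
from itertools import groupby

# The four character types named in the abstract.
TYPES = ("roman", "kanji", "hiragana", "katakana")


def char_type(ch):
    """Classify one character; the Unicode ranges are illustrative assumptions."""
    if "\u3041" <= ch <= "\u309f":
        return "hiragana"
    if "\u30a0" <= ch <= "\u30ff":
        return "katakana"
    if "\u4e00" <= ch <= "\u9fff":
        return "kanji"
    if ch.isascii() and ch.isalpha():
        return "roman"
    return None  # punctuation, digits, whitespace, and anything else


def surface_features(text):
    """Return the surface characteristics (1)-(4) as a dict of floats."""
    # (1) average number of characters per sentence, splitting on kuten.
    sentences = [s.strip() for s in re.split(r"[。．]", text) if s.strip()]
    chars_per_sentence = (
        sum(len(s) for s in sentences) / len(sentences) if sentences else 0.0
    )

    # Runs: maximal strings made of a single character type.
    runs = [
        (kind, len(list(group)))
        for kind, group in groupby(text, key=char_type)
        if kind is not None
    ]

    # (4) tooten (comma) to kuten (period) ratio.
    n_tooten = text.count("、") + text.count("，")
    n_kuten = text.count("。") + text.count("．")

    features = {
        "chars_per_sentence": chars_per_sentence,
        "tooten_per_kuten": n_tooten / n_kuten if n_kuten else 0.0,
    }
    for kind in TYPES:
        lengths = [n for k, n in runs if k == kind]
        # (2) relative frequency of runs of this type among all runs.
        features[kind + "_run_freq"] = len(lengths) / len(runs) if runs else 0.0
        # (3) average number of characters per run of this type.
        features[kind + "_run_len"] = sum(lengths) / len(lengths) if lengths else 0.0
    return features


if __name__ == "__main__":
    sample = "私はカタカナとABCを使う。今日は、晴れです。"
    for name, value in surface_features(sample).items():
        print(name, round(value, 3))
```

A readability score in the sense of the abstract would then be a weighted sum of these values, with the weights taken from the principal component that the authors identified across their 77 sample texts.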
