
Classifying easy-to-read texts without parsing


Abstract

Document classification using automated linguistic analysis and machine learning (ML) has been shown to be a viable way forward for readability assessment. The best models can be trained to decide whether a text is easy to read with very high accuracy; for example, a model using 117 parameters from shallow, lexical, morphological and syntactic analyses achieves 98.9% accuracy. In this paper we compare models created by parameter optimization over subsets of that full model, to find out to what extent different high-performing models tend to consist of the same parameters and whether it is possible to find models that use only features that do not require parsing. We used a genetic algorithm to systematically optimize parameter sets of fixed sizes, using the accuracy of a Support Vector Machine classifier as the fitness function. Our results show that it is possible to find models almost as good as the currently best models while omitting parsing-based features.
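The search procedure described in the abstract can be illustrated with a minimal sketch of a genetic algorithm over fixed-size feature subsets. This is not the authors' implementation: the function `evolve_subset`, its parameters, the selection and crossover scheme, and the toy fitness function are all assumptions made for illustration. In the setting of the paper, `fitness` would instead return the accuracy of a Support Vector Machine classifier trained on the selected features, and `candidates` would list only the features that do not require parsing.

```python
import random


def evolve_subset(candidates, subset_size, fitness,
                  pop_size=30, generations=40, mutation_rate=0.2, seed=0):
    """Search for a high-fitness feature subset of a fixed size.

    candidates  -- feature indices the search may use (hypothetical:
                   e.g. only the features that need no parsing)
    subset_size -- number of features in every individual
    fitness     -- callable mapping a frozenset of indices to a score
    """
    rng = random.Random(seed)
    candidates = sorted(candidates)
    # Start from random subsets of the requested size.
    population = [frozenset(rng.sample(candidates, subset_size))
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: the better half survives unchanged (elitism).
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Crossover: draw the child from the union of both parents,
            # which keeps the subset size fixed.
            child = set(rng.sample(sorted(a | b), subset_size))
            # Mutation: swap one selected feature for an unselected one.
            if rng.random() < mutation_rate:
                outside = [i for i in candidates if i not in child]
                if outside:
                    child.remove(rng.choice(sorted(child)))
                    child.add(rng.choice(outside))
            children.append(frozenset(child))
        population = parents + children
    return max(population, key=fitness)


# Toy stand-in for classifier accuracy (an assumption, not real data):
# pretend the even-numbered features below 40 are the informative ones.
INFORMATIVE = frozenset(range(0, 40, 2))


def toy_fitness(subset):
    """Fraction of the selected features that are 'informative'."""
    return len(subset & INFORMATIVE) / len(subset)


# Hypothetical split: of 117 features, indices 0..79 need no parsing.
NON_PARSING = list(range(80))

best = evolve_subset(NON_PARSING, subset_size=10, fitness=toy_fitness)
print(sorted(best), toy_fitness(best))
```

Keeping every individual at the same size mirrors the fixed-size parameter sets mentioned in the abstract, so subsets of different sizes can be compared by rerunning the search with a different `subset_size`. Because a real fitness call means training a classifier, caching the scores of subsets already evaluated would be a sensible addition.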
