Automated Summary Scoring with ReaderBench

Abstract

Text summarization is an effective reading comprehension strategy. However, summary evaluation is complex and must account for multiple factors, including both the summary and the reference text. This study examines a corpus of approximately 3,000 summaries of 87 reference texts, each manually scored on a 4-point Likert scale. Machine learning models leveraging Natural Language Processing (NLP) techniques were trained to predict the extent to which a summary captures the main idea of the target text. The NLP models combine domain- and language-independent textual complexity indices from the ReaderBench framework with state-of-the-art language models and deep learning architectures that provide semantic contextualization. The models achieve low errors (normalized MAE ranging from 0.13 to 0.17), with corresponding R² values of up to 0.46. Our approach consistently outperforms baselines that use TF-IDF vectors and linear models, as well as Transformer-based regression using BERT. These results indicate that NLP algorithms combining linguistic and semantic indices are accurate and robust, while generalizing to a wide array of topics.
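The two evaluation metrics reported above can be stated concretely: normalized MAE divides the mean absolute error by the range of the scoring scale (here, a 4-point Likert scale), and R² measures the fraction of score variance explained by the predictions. The sketch below, in plain Python, is illustrative only; the function names and example scores are ours, not the study's data.

```python
def normalized_mae(y_true, y_pred, lo=1.0, hi=4.0):
    """Mean absolute error divided by the score range (hi - lo)."""
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
    return mae / (hi - lo)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical human scores (1-4 Likert) and model predictions:
human = [1, 2, 3, 4, 2, 3]
model = [1.5, 2.0, 2.5, 3.5, 2.5, 3.0]
print(normalized_mae(human, model))  # MAE of 1/3 over a 3-point range -> ~0.111
print(r_squared(human, model))       # -> ~0.818
```

A normalized MAE of 0.13-0.17 on a 4-point scale thus corresponds to an average prediction error of roughly 0.4-0.5 Likert points.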
机译:文本摘要是一种有效的阅读理解策略。然而,总结评估是复杂的,必须考虑各种因素,包括总结和参考文本。本研究以87篇参考文献为基础,对约3000篇摘要进行了语料库分析,每一篇摘要都在利克特4分量表上手工打分。机器学习模型利用自然语言处理(NLP)技术进行训练,以预测摘要在多大程度上捕获了目标文本的主要思想。NLP模型结合了ReaderBench框架中与领域和语言无关的文本复杂性指数,以及最先进的语言模型和深度学习架构,以提供语义语境化。该模型实现了低误差——归一化MAE范围为0.13-0.17,相应的R~2值高达0.46。我们的方法始终优于使用TF-IDF向量和线性模型的基线,以及使用BERT的基于变换器的回归。这些结果表明,结合了语言和语义索引的NLP算法是准确和健壮的,同时确保了广泛主题的可推广性。
