Conference: International Conference on Computational Linguistics

Using Eye-tracking Data to Predict the Readability of Brazilian Portuguese Sentences in Single-task, Multi-task and Sequential Transfer Learning Approaches




Sentence complexity assessment is a relatively new task in Natural Language Processing. One of its aims is to identify the more complex sentences in a text, in support of content simplification for a target audience (e.g., children, cognitively impaired users, non-native speakers and low-literacy readers; Scarton and Specia, 2018). The task is evaluated on datasets of aligned sentence pairs, each containing a complex and a simple version of the same sentence. For Brazilian Portuguese, the task was addressed by Leal et al. (2018), who built the first dataset for evaluating it in this language and reached 87.8% accuracy with linguistic features. The present work advances these results using models inspired by González-Garduño and Søgaard (2018), which hold the state of the art for English, combining multi-task learning with eye-tracking measures. First-Pass Duration, Total Regression Duration and Total Fixation Duration were used at two stages: first to select a subset of linguistic features, and then as auxiliary tasks in the multi-task and sequential transfer learning models. The best model proposed here sets a new state of the art for Portuguese with 97.5% accuracy, an increase of almost 10 points over the best previous result. We also propose improvements to the public dataset, based on an analysis of the errors made by our best model.
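The pairwise evaluation described in the abstract can be illustrated with a minimal sketch. It assumes that a prediction counts as correct when the model assigns a higher complexity score to the complex member of a pair; the abstract does not spell out the exact protocol, so this is one plausible reading rather than the authors' procedure. The scorer `toy_complexity`, the function `pairwise_accuracy` and the example sentences are all hypothetical and invented for illustration; they are not the linguistic features, models or data used in the paper.

```python
from typing import Callable, Iterable, Tuple

Pair = Tuple[str, str]  # (complex_sentence, simple_sentence)


def toy_complexity(sentence: str) -> float:
    """Hypothetical stand-in scorer: token count plus mean token length.

    This is NOT the feature set or model from the paper; it only gives
    the evaluation loop something to call.
    """
    tokens = sentence.split()
    if not tokens:
        return 0.0
    mean_len = sum(len(t) for t in tokens) / len(tokens)
    return len(tokens) + mean_len


def pairwise_accuracy(
    pairs: Iterable[Pair],
    score: Callable[[str], float] = toy_complexity,
) -> float:
    """Fraction of pairs where the complex version scores strictly higher."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no pairs to evaluate")
    correct = sum(1 for complex_s, simple_s in pairs
                  if score(complex_s) > score(simple_s))
    return correct / len(pairs)


# Invented sentence pairs, for illustration only (not from the dataset).
EXAMPLE_PAIRS = [
    ("O relatório, que havia sido elaborado pela comissão, foi rejeitado.",
     "A comissão fez o relatório. Ele foi rejeitado."),
    ("Embora estivesse chovendo intensamente, decidimos prosseguir com a caminhada.",
     "Chovia muito. Mesmo assim, fomos caminhar."),
]

if __name__ == "__main__":
    print(f"pairwise accuracy: {pairwise_accuracy(EXAMPLE_PAIRS):.3f}")
```

Any scoring function can be passed through the `score` parameter, so a trained single-task or multi-task model could be dropped in place of the toy scorer without changing the evaluation loop.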




