Conference: International Conference on Text, Speech and Dialogue

A Lightweight Regression Method to Infer Psycholinguistic Properties for Brazilian Portuguese

Abstract

Psycholinguistic properties of words have been used in various approaches to Natural Language Processing tasks, such as text simplification and readability assessment. Most of these properties are subjective, and gathering them requires costly and time-consuming surveys. Recent approaches take the limited datasets of psycholinguistic properties and extend them automatically to large lexicons. However, some of the resources used by such approaches are not available for most languages. This study presents a method to infer psycholinguistic properties for Brazilian Portuguese (BP) using regressors built with a light set of features that are usually available for less-resourced languages: word length, frequency lists, lexical databases composed of school dictionaries, and word embedding models. The correlations between the inferred properties are close to those obtained in related work. The resulting resource contains 26,874 BP words annotated with concreteness, age of acquisition, imageability, and subjective frequency.
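
The general idea of fitting a regressor on a small rated seed list and applying it to unrated words can be sketched as follows. This is an illustration only, not the authors' implementation: the abstract does not name the regressor or the feature encoding, so the sketch assumes ridge regression, a two-dimensional stand-in for a word embedding, and log-scaled frequency. Every frequency, embedding value, and rating below is invented for the example, and the helper names (`features`, `fit_ridge`, `predict`, `pearson`) are hypothetical.

```python
import math

def features(word, freq, embedding):
    """Light feature vector: bias, word length, log frequency, embedding dims."""
    return [1.0, float(len(word)), math.log(freq + 1.0)] + list(embedding)

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_ridge(rows, targets, lam=1e-3):
    """Ridge regression via the normal equations (X^T X + lam I) w = X^T y.

    For simplicity the penalty is applied to every weight, including the bias.
    """
    d = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) + (lam if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(d)]
    return solve(xtx, xty)

def predict(weights, row):
    """Dot product of the weight vector and one feature row."""
    return sum(w * x for w, x in zip(weights, row))

def pearson(xs, ys):
    """Pearson correlation between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Invented seed list: (word, corpus frequency, toy embedding, concreteness rating).
# None of these numbers come from the paper or from any real norm.
seed = [
    ("casa", 900, [0.8, 0.1], 6.5),
    ("mesa", 400, [0.9, 0.2], 6.7),
    ("cachorro", 300, [0.7, 0.3], 6.6),
    ("ideia", 700, [0.2, 0.8], 2.9),
    ("liberdade", 350, [0.1, 0.9], 2.3),
    ("saudade", 200, [0.3, 0.7], 2.6),
]

rows = [features(w, f, e) for w, f, e, _ in seed]
targets = [rating for _, _, _, rating in seed]
weights = fit_ridge(rows, targets)

# Infer a rating for a word that has no human rating (inputs invented as well).
estimate = predict(weights, features("janela", 250, [0.75, 0.2]))
print("estimated concreteness for 'janela':", round(estimate, 2))

# In-sample correlation, shown only to illustrate the call; a real evaluation
# would correlate predictions with held-out human ratings.
fitted = [predict(weights, r) for r in rows]
print("in-sample Pearson r:", round(pearson(fitted, targets), 3))
```

In practice one such model would be fitted per property (concreteness, age of acquisition, imageability, subjective frequency), with real frequency lists and embedding vectors in place of the toy inputs.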
