TALN at SemEval-2016 Task 11: Modelling Complex Words by Contextual, Lexical and Semantic Features

Abstract

This paper presents the participation of the TALN team in the Complex Word Identification Task of SemEval-2016 (Task 11). The purpose of the task was to determine whether a word in a given sentence would be judged as complex by a certain target audience. To support experiments with word complexity identification approaches, the task organizers provided a training set of 2,237 words judged as complex or not by 20 human evaluators, together with the sentence in which each word occurs. In our contribution, we modelled each word to be evaluated as a numeric vector populated with a set of lexical, semantic and contextual features that may help assess the complexity of a word. We trained a Random Forest classifier to automatically decide whether each word is complex. We submitted two runs, trained respectively on unweighted and weighted instances of complex words, where the weight of each instance is proportional to the number of evaluators who judged the word as complex. Our system ranked third among the participating systems.
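The difference between the two submitted runs can be illustrated with a minimal sketch of how per-instance training weights might be derived from the evaluators' judgements. The abstract does not give the exact scheme, so everything here is an assumption made for illustration: the `Instance` record, the rule that a word is labelled complex if at least one evaluator flagged it, the weight of 1.0 for non-complex words, and a proportionality constant of 1.

```python
from dataclasses import dataclass

N_EVALUATORS = 20  # from the abstract: each word was judged by 20 human evaluators


@dataclass
class Instance:
    """Hypothetical training record; the field names are not from the paper."""
    word: str
    sentence: str
    complex_votes: int  # number of evaluators who judged the word complex


def label(inst: Instance) -> int:
    # Assumption: a word is labelled complex (1) if any evaluator flagged it.
    return 1 if inst.complex_votes > 0 else 0


def weight(inst: Instance, weighted: bool) -> float:
    # Unweighted run: every instance counts equally.
    # Assumption: non-complex instances keep weight 1.0 in both runs.
    if not weighted or inst.complex_votes == 0:
        return 1.0
    # Weighted run: weight proportional to the number of "complex" votes
    # (proportionality constant of 1 chosen here for simplicity).
    return float(inst.complex_votes)


train = [
    Instance("cat", "The cat sat on the mat.", 0),
    Instance("ubiquitous", "Phones are now ubiquitous.", 5),
    Instance("perfunctory", "He gave a perfunctory nod.", 12),
]

labels = [label(i) for i in train]
unweighted = [weight(i, weighted=False) for i in train]
weighted = [weight(i, weighted=True) for i in train]
```

Weights of this kind would then be handed to any classifier that accepts per-instance weights during training; in scikit-learn, for example, `RandomForestClassifier.fit` takes a `sample_weight` argument. Whether the authors used that library is not stated in the abstract.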
