ECNU: Using Traditional Similarity Measurements and Word Embedding for Semantic Textual Similarity Estimation

Abstract

This paper reports our submissions to the semantic textual similarity task, i.e., Task 2 in Semantic Evaluation 2015 (SemEval-2015). We built our systems using various traditional features, such as string-based, corpus-based and syntactic similarity metrics, as well as novel similarity measures based on distributed word representations trained using deep learning paradigms. Since the training and test datasets consist of instances collected from various domains, we explored three strategies for using the training datasets: (1) use all available training datasets to build a single unified supervised model for all test datasets; (2) select the most similar training dataset for each test set and build an individual model for it; (3) adopt a multi-task learning framework to make full use of the available training sets. Results on the test datasets show that using all datasets as the training set achieves the best average performance, and our best system ranks 15th out of 73.
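The two feature families named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the choice of Jaccard overlap as the string-based feature, the averaged-vector cosine as the embedding-based feature, and the toy two-dimensional embedding table are all assumptions made for illustration.

```python
import math


def jaccard(a, b):
    """String-based feature (assumed): token-set overlap of two sentences."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def mean_vector(tokens, embeddings, dim):
    """Average the vectors of the tokens present in the embedding table."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)


def features(a, b, embeddings, dim):
    """Feature vector for a sentence pair: [string-based, embedding-based]."""
    va = mean_vector(a.lower().split(), embeddings, dim)
    vb = mean_vector(b.lower().split(), embeddings, dim)
    return [jaccard(a, b), cosine(va, vb)]


# Toy embeddings invented for this example; real systems would load
# pretrained distributed word representations instead.
TOY_EMBEDDINGS = {
    "cat": [1.0, 0.0],
    "dog": [0.9, 0.1],
    "sits": [0.0, 1.0],
    "sleeps": [0.1, 0.9],
}

if __name__ == "__main__":
    print(features("cat sits", "dog sleeps", TOY_EMBEDDINGS, 2))
```

Under strategy (1), feature vectors like these from every training dataset would be pooled to fit one regressor; under strategy (2), a separate regressor would be fit on the training dataset judged closest to each test set. The sentence pair above shares no tokens, so its string-based feature is zero while the embedding-based feature is high, which is the kind of complementary signal the combination is meant to capture.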
