Venue: International Conference on Computational Linguistics (COLING)

Bringing replication and reproduction together with generalisability in NLP: Three reproduction studies for Target Dependent Sentiment Analysis


Abstract

Lack of repeatability and generalisability are two significant threats to continuing scientific development in Natural Language Processing. Language models and learning methods are so complex that scientific conference papers no longer contain enough space for the technical depth required for replication or reproduction. Taking Target Dependent Sentiment Analysis as a case study, we show how recent work in the field has not consistently released code or described settings for learning methods in enough detail, and lacks comparability and generalisability in train, test or validation data. To investigate generalisability and to enable state-of-the-art comparative evaluations, we carry out the first reproduction studies of three groups of complementary methods and perform the first large-scale mass evaluation on six different English datasets. Reflecting on our experiences, we recommend that future replication or reproduction experiments should always consider a variety of datasets alongside documenting and releasing their methods and published code, in order to minimise the barriers to both repeatability and generalisability. We have released our code with a model zoo on GitHub, with Jupyter Notebooks to aid understanding and full documentation, and we recommend that others do the same with their papers at submission time through an anonymised GitHub account.
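The abstract recommends scoring each method on a variety of datasets rather than a single one. The sketch below is a minimal illustration of that idea, not the authors' released code: it assumes a hypothetical `predict(sentence, target)` function, toy datasets invented for illustration, and accuracy plus macro-averaged F1 as the reported metrics (a common choice for three-class sentiment, assumed here). It collects one score row per dataset so a method's generalisability can be read off side by side.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

# Hypothetical example format: (sentence, target, gold sentiment label).
Example = Tuple[str, str, str]


def macro_f1(gold: List[str], pred: List[str]) -> float:
    """Unweighted mean of per-class F1 over every label seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return mean(scores)


def evaluate_across_datasets(
    predict: Callable[[str, str], str],
    datasets: Dict[str, List[Example]],
) -> Dict[str, Dict[str, float]]:
    """Score one method on every dataset so results can be compared side by side."""
    results = {}
    for name, examples in datasets.items():
        gold = [label for _, _, label in examples]
        pred = [predict(sentence, target) for sentence, target, _ in examples]
        accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
        results[name] = {"accuracy": accuracy, "macro_f1": macro_f1(gold, pred)}
    return results


# Usage with a trivial baseline and two tiny, made-up datasets.
def always_positive(sentence: str, target: str) -> str:
    return "positive"


toy_datasets = {
    "toy_reviews": [("The food was great", "food", "positive"),
                    ("The service was slow", "service", "negative")],
    "toy_tweets": [("Loving the new phone", "phone", "positive"),
                   ("This update is fine", "update", "neutral")],
}
results = evaluate_across_datasets(always_positive, toy_datasets)
print(results)
```

Reporting macro-F1 alongside accuracy matters in this setting because a majority-class baseline like `always_positive` can reach a respectable accuracy on a skewed dataset while its macro-F1 stays low.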