Reporting Score Distributions Makes a Difference: Performance Study of LSTM-networks for Sequence Tagging

Abstract

In this paper we show that reporting a single performance score is insufficient to compare non-deterministic approaches. We demonstrate for common sequence tagging tasks that the seed value for the random number generator can result in statistically significant (p < 10^-4) differences for state-of-the-art systems. For two recent NER systems, we observe an absolute difference of one percentage point in F1-score depending on the selected seed value, so that these systems are perceived as either state-of-the-art or mediocre. Instead of publishing and reporting single performance scores, we propose to compare score distributions based on multiple executions. Based on the evaluation of 50,000 LSTM networks for five sequence tagging tasks, we present network architectures that both produce superior performance and are more stable with respect to the remaining hyperparameters. The full experimental results are published in (Reimers and Gurevych, 2017). The implementation of our network is publicly available.
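The proposal to compare score distributions rather than single scores can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `fake_train_and_eval` is a hypothetical stand-in for training a tagger with a given seed, the simulated scores are synthetic rather than results from the paper, and the permutation test on the difference of means is one possible significance test, not necessarily the one the authors used.

```python
import random
import statistics


def fake_train_and_eval(base_f1, seed):
    """Hypothetical stand-in for one training run: returns a noisy F1 score.

    A real experiment would train the model with this seed and evaluate it.
    """
    rng = random.Random(seed)
    return base_f1 + rng.gauss(0.0, 0.3)


def summarize(scores):
    """Return (mean, sample standard deviation, min, max) of the scores."""
    return (
        statistics.mean(scores),
        statistics.stdev(scores),
        min(scores),
        max(scores),
    )


def permutation_test(a, b, n_resamples=5000, seed=0):
    """Two-sided permutation test on the absolute difference of means.

    Returns an estimated p-value with the usual +1 correction, so the
    result is never exactly zero.
    """
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        left, right = pooled[:len(a)], pooled[len(a):]
        diff = abs(sum(left) / len(left) - sum(right) / len(right))
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_resamples + 1)


# Synthetic scores for two hypothetical systems, 20 seeds each.
scores_a = [fake_train_and_eval(90.0, s) for s in range(20)]
scores_b = [fake_train_and_eval(90.8, s + 1000) for s in range(20)]

p_value = permutation_test(scores_a, scores_b)
print("system A (mean, stdev, min, max):", summarize(scores_a))
print("system B (mean, stdev, min, max):", summarize(scores_b))
print("permutation-test p-value:", p_value)
```

Reporting the spread (standard deviation, minimum, maximum) next to the mean shows how much a single run could mislead, and the test compares the two sets of runs as a whole instead of one lucky or unlucky seed.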

Bibliographic details

Source: Conference on Empirical Methods in Natural Language Processing, 2017, pp. 338-348 (11 pages)
Authors: Nils Reimers; Iryna Gurevych
Original format: PDF
Indexed: 2022-08-26 14:32:41
