Best-Worst Scaling More Reliable than Rating Scales: A Case Study on Sentiment Intensity Annotation

Abstract

Rating scales are a widely used method for data annotation; however, they present several challenges, such as difficulty in maintaining inter- and intra-annotator consistency. Best-worst scaling (BWS) is an alternative method of annotation that is claimed to produce high-quality annotations while keeping the required number of annotations similar to that of rating scales. However, the veracity of this claim has never been systematically established. Here for the first time, we set up an experiment that directly compares the rating scale method with BWS. We show that with the same total number of annotations, BWS produces significantly more reliable results than the rating scale.
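
The abstract does not say how BWS annotations are turned into scores. One common approach is simple counting: annotators see a small tuple of items and mark the best and the worst, and each item's score is the fraction of times it was chosen best minus the fraction of times it was chosen worst. The following is a minimal sketch under that assumption; the function name `bws_scores` and the toy annotations are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def bws_scores(annotations):
    """Counting-based BWS scores (illustrative sketch, not the authors' code).

    annotations: iterable of (items, best, worst), where `items` is the
    tuple shown to the annotator and `best`/`worst` are the chosen items.
    Returns {item: (#best - #worst) / #appearances}, a value in [-1, 1].
    """
    best = defaultdict(int)
    worst = defaultdict(int)
    seen = defaultdict(int)
    for items, b, w in annotations:
        for item in items:
            seen[item] += 1
        best[b] += 1
        worst[w] += 1
    return {item: (best[item] - worst[item]) / seen[item] for item in seen}


# Hypothetical toy data: three 4-tuples judged for positive sentiment.
annotations = [
    (("great", "good", "okay", "awful"), "great", "awful"),
    (("good", "okay", "bad", "awful"), "good", "awful"),
    (("great", "okay", "bad", "good"), "great", "bad"),
]

scores = bws_scores(annotations)
# Items ordered from most to least positive according to the toy judgments.
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Because each judgment yields a best and a worst choice over several items at once, a single annotation carries comparative information about multiple item pairs, which is the intuition behind keeping the total number of annotations comparable to rating scales.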

Bibliographic details

Source: Annual Meeting of the Association for Computational Linguistics, 2017, pp. 465-470 (6 pages)
Authors: Svetlana Kiritchenko; Saif M. Mohammad
Original format: PDF
Indexed: 2022-08-26 13:49:42
