Automatic Coding of Text Answers to Open-Ended Questions: Should You Double Code the Training Data?

He Zhoushanyue; Schonlau Matthias

Abstract

Open-ended questions in surveys are often manually coded into one of several classes (or categories). When the data are too large to code all texts manually, a statistical (or machine) learning model must be trained on a manually coded subset of texts. Uncoded texts are then coded automatically using the trained model. The quality of automatic coding depends on the trained statistical model, and the model in turn relies on the manually coded data on which it is trained. While survey scientists are acutely aware that manual coding is not always accurate, it is not clear how double coding affects the classification errors of the statistical learning model. We investigate several budget allocation strategies when there is a limited budget for manual classification: single coding versus various options for double coding, where the number of training texts is reduced to maintain the fixed budget. Under a fixed budget, double coding improved the prediction of the learning algorithm when the coding error was greater than about 20-35%, depending on the data. Among double-coding strategies, paying for an expert to resolve differences performed best. When no expert is available, removing differences from the training data outperformed other double-coding strategies. When there is no budget constraint and the texts have already been double coded, all double-coding strategies generally outperformed single coding. As under a fixed budget, having an expert resolve disagreements in the training texts improved accuracy most, followed by removing differences.
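The trade-off described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the assumption that every manual coding costs one budget unit, and the omission of the expert's fee from the budget count are all hypothetical choices made for illustration only.

```python
from typing import Callable, Hashable, List, Sequence, Tuple

Label = Hashable
Coded = List[Tuple[str, Label]]


def texts_affordable(budget: int, codings_per_text: int) -> int:
    """Number of training texts a budget covers.

    Assumption: each manual coding of one text costs one budget unit,
    so double coding (codings_per_text=2) halves the training set.
    """
    return budget // codings_per_text


def remove_disagreements(
    texts: Sequence[str], codes_a: Sequence[Label], codes_b: Sequence[Label]
) -> Coded:
    """Double-coding strategy: keep only texts on which both coders agree."""
    return [(t, a) for t, a, b in zip(texts, codes_a, codes_b) if a == b]


def expert_resolves(
    texts: Sequence[str],
    codes_a: Sequence[Label],
    codes_b: Sequence[Label],
    expert: Callable[[str], Label],
) -> Coded:
    """Double-coding strategy: keep every text; an expert codes the disputed ones.

    Assumption: `expert` is a hypothetical callable standing in for a human
    expert; its cost is not deducted from the budget in this sketch.
    """
    return [
        (t, a if a == b else expert(t))
        for t, a, b in zip(texts, codes_a, codes_b)
    ]
```

With a hypothetical budget of 1,000 codings, `texts_affordable(1000, 1)` gives 1,000 single-coded training texts, while `texts_affordable(1000, 2)` gives 500 double-coded ones. Either resulting list of (text, label) pairs could then be passed to any text classifier for training.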


Bibliographic details

Source: Social Science Computer Review, 2020, Issue 6, pp. 754-765 (12 pages)
Authors: He Zhoushanyue; Schonlau Matthias
Affiliations:
Univ Waterloo Dept Stat & Actuarial Sci Waterloo ON Canada;
Univ Waterloo Dept Stat & Actuarial Sci Stat Waterloo ON Canada
Original format: PDF
Language: English
Keywords: double coding; statistical learning; machine learning; open-ended questions; manual coding; text classification; human coder
Date added to database: 2022-08-18 21:23:21
