IS AI GROUND TRUTH REALLY TRUE? THE DANGERS OF TRAINING AND EVALUATING AI TOOLS BASED ON EXPERTS’ KNOW-WHAT

Lebovitz Sarah; Levine Natalia; Lifshitz-Assaf Hila

Abstract

Organizational decision-makers need to evaluate AI tools in light of increasing claims that such tools outperform human experts. Yet, measuring the quality of knowledge work is challenging, raising the question of how to evaluate AI performance in such contexts. We investigate this question through a field study of a major U.S. hospital, observing how managers evaluated five different machine-learning (ML) based AI tools. Each tool reported high performance according to standard AI accuracy measures, which were based on ground truth labels provided by qualified experts. Trying these tools out in practice, however, revealed that none of them met expectations. Searching for explanations, managers began confronting the high uncertainty of experts' know-what knowledge captured in ground truth labels used to train and validate ML models. In practice, experts address this uncertainty by drawing on rich know-how practices, which were not incorporated into these ML-based tools. Discovering the disconnect between AI's know-what and experts' know-how enabled managers to better understand the risks and benefits of each tool. This study shows dangers of treating ground truth labels used in ML models objectively when the underlying knowledge is uncertain. We outline implications of our study for developing, training, and evaluating AI for knowledge work.

Bibliographic details

Source
MIS Quarterly | 2021, Issue 3 | pp. 1501-1525 | 26 pages
Authors
Lebovitz Sarah; Levine Natalia; Lifshitz-Assaf Hila
Author affiliations

Univ Virginia McIntire Sch Commerce Charlottesville VA 22904 USA;

NYU Stern Sch Business New York NY 10003 USA;

NYU Stern Sch Business New York NY 10003 USA;

Indexed in: Science Citation Index (SCI), USA; Engineering Index (EI), USA
Original format: PDF
Language: English
Keywords
Artificial intelligence; evaluation; uncertainty; new technology; professional knowledge work; innovation; know-how; medical diagnosis; ground truth

