Journal of Medical Internet Research

Tweet Classification Toward Twitter-Based Disease Surveillance: New Data, Methods, and Evaluations



Abstract

Background: The amount of medical and clinical-related information on the Web is increasing. Among the different types of information available, social media–based data obtained directly from people are particularly valuable and are attracting significant attention. To encourage medical natural language processing (NLP) research exploiting social media data, the 13th NII Testbeds and Community for Information access Research (NTCIR-13) Medical natural language processing for Web document (MedWeb) task provides pseudo-Twitter messages in a cross-language, multi-label corpus covering 3 languages (Japanese, English, and Chinese) and annotated with 8 symptom labels (such as cold, fever, and flu). Participants classify each tweet into 1 of 2 categories: those containing a patient's symptom and those that do not.

Objective: This study aimed to present the results of the groups participating in the Japanese, English, and Chinese subtasks, along with discussions, to clarify the issues that need to be resolved in the field of medical NLP.

Methods: 8 groups (19 systems) participated in the Japanese subtask, 4 groups (12 systems) participated in the English subtask, and 2 groups (6 systems) participated in the Chinese subtask. In total, 2 baseline systems were constructed for each subtask. The performance of the participant and baseline systems was assessed using exact match accuracy, F-measure based on precision and recall, and Hamming loss.

Results: The best system achieved 0.880 exact match accuracy, 0.920 F-measure, and 0.019 Hamming loss. The averages of exact match accuracy, F-measure, and Hamming loss for the Japanese subtask were 0.720, 0.820, and 0.051, respectively; those for the English subtask were 0.770, 0.850, and 0.037; and those for the Chinese subtask were 0.810, 0.880, and 0.032.

Conclusions: This paper presented and discussed the performance of the systems participating in the NTCIR-13 MedWeb task. As the MedWeb task setting can be formalized as the factualization of text, the achievements of this task could be directly applied to practical clinical applications.
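The three evaluation measures named in the abstract can be sketched concretely. Below is a minimal illustration of how exact match accuracy, F-measure (micro-averaged here, which is an assumption about the averaging scheme), and Hamming loss behave on 8-dimensional binary label vectors; the label ordering and toy data are invented for illustration and are not from the task.

```python
# Sketch of the three multi-label evaluation measures mentioned in the
# abstract, applied to 8-dimensional binary symptom-label vectors.
# Micro-averaging for the F-measure is an assumption; the toy data are
# illustrative, not real MedWeb annotations.

def exact_match_accuracy(gold, pred):
    """Fraction of tweets whose entire 8-label vector is predicted exactly."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def micro_f1(gold, pred):
    """Micro-averaged F-measure over all individual (tweet, label) decisions."""
    pairs = [(g, p) for gv, pv in zip(gold, pred) for g, p in zip(gv, pv)]
    tp = sum(1 for g, p in pairs if g and p)
    fp = sum(1 for g, p in pairs if not g and p)
    fn = sum(1 for g, p in pairs if g and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0

def hamming_loss(gold, pred):
    """Fraction of individual label decisions that are wrong."""
    n_labels = len(gold[0])
    wrong = sum(1 for gv, pv in zip(gold, pred)
                for g, p in zip(gv, pv) if g != p)
    return wrong / (len(gold) * n_labels)

# Toy example: 2 tweets, 8 symptom labels each (1 = symptom present).
gold = [(1, 0, 0, 1, 0, 0, 0, 0), (0, 0, 0, 0, 0, 0, 1, 0)]
pred = [(1, 0, 0, 1, 0, 0, 0, 0), (0, 0, 0, 0, 0, 0, 0, 0)]

print(exact_match_accuracy(gold, pred))  # 0.5: one of two tweets matched exactly
print(hamming_loss(gold, pred))          # 0.0625: 1 wrong decision out of 16
print(micro_f1(gold, pred))              # 0.8: precision 1.0, recall 2/3
```

Note the intuition the abstract's numbers reflect: exact match is the strictest measure (one wrong label fails the whole tweet), Hamming loss is the most forgiving (per-decision), which is why reported exact match accuracies are lower than F-measures across all three subtasks.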
