Journal of Medical Internet Research

Tweet Classification Toward Twitter-Based Disease Surveillance: New Data, Methods, and Evaluations



Abstract

Background: The amount of medical and clinical-related information on the Web is increasing. Among the different types of information available, social media–based data obtained directly from people are particularly valuable and are attracting significant attention. To encourage medical natural language processing (NLP) research exploiting social media data, the 13th NII Testbeds and Community for Information access Research (NTCIR-13) Medical natural language processing for Web document (MedWeb) task provides pseudo-Twitter messages in a cross-language, multi-label corpus covering 3 languages (Japanese, English, and Chinese) and annotated with 8 symptom labels (such as cold, fever, and flu). Participants classify each tweet into 1 of 2 categories: those containing a patient's symptom and those that do not.

Objective: This study aimed to present the results of the groups participating in the Japanese, English, and Chinese subtasks, along with discussions, to clarify the issues that need to be resolved in the field of medical NLP.

Methods: 8 groups (19 systems) participated in the Japanese subtask, 4 groups (12 systems) participated in the English subtask, and 2 groups (6 systems) participated in the Chinese subtask. In total, 2 baseline systems were constructed for each subtask. The performance of the participant and baseline systems was assessed using exact match accuracy, F-measure based on precision and recall, and Hamming loss.

Results: The best system achieved 0.880 exact match accuracy, 0.920 F-measure, and 0.019 Hamming loss. The averages of exact match accuracy, F-measure, and Hamming loss for the Japanese subtask were 0.720, 0.820, and 0.051, respectively; those for the English subtask were 0.770, 0.850, and 0.037; and those for the Chinese subtask were 0.810, 0.880, and 0.032.

Conclusions: This paper presented and discussed the performance of the systems participating in the NTCIR-13 MedWeb task. As the MedWeb task setting can be formalized as the factualization of text, the achievements of this task could be directly applied to practical clinical applications.
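The three evaluation measures named in the abstract can be sketched concretely. Below is a minimal illustration of how exact match accuracy, F-measure (micro-averaged here, which is an assumption about the averaging scheme), and Hamming loss behave on 8-dimensional binary label vectors; the label ordering and toy data are invented for illustration and are not from the task.

```python
# Sketch of the three multi-label evaluation measures mentioned in the
# abstract, applied to 8-dimensional binary symptom-label vectors.
# Micro-averaging for the F-measure is an assumption; the toy data are
# illustrative, not real MedWeb annotations.

def exact_match_accuracy(gold, pred):
    """Fraction of tweets whose entire 8-label vector is predicted exactly."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def micro_f1(gold, pred):
    """Micro-averaged F-measure over all individual (tweet, label) decisions."""
    pairs = [(g, p) for gv, pv in zip(gold, pred) for g, p in zip(gv, pv)]
    tp = sum(1 for g, p in pairs if g and p)
    fp = sum(1 for g, p in pairs if not g and p)
    fn = sum(1 for g, p in pairs if g and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0

def hamming_loss(gold, pred):
    """Fraction of individual label decisions that are wrong."""
    n_labels = len(gold[0])
    wrong = sum(1 for gv, pv in zip(gold, pred)
                for g, p in zip(gv, pv) if g != p)
    return wrong / (len(gold) * n_labels)

# Toy example: 2 tweets, 8 symptom labels each (1 = symptom present).
gold = [(1, 0, 0, 1, 0, 0, 0, 0), (0, 0, 0, 0, 0, 0, 1, 0)]
pred = [(1, 0, 0, 1, 0, 0, 0, 0), (0, 0, 0, 0, 0, 0, 0, 0)]

print(exact_match_accuracy(gold, pred))  # 0.5: one of two tweets matched exactly
print(hamming_loss(gold, pred))          # 0.0625: 1 wrong decision out of 16
print(micro_f1(gold, pred))              # 0.8: precision 1.0, recall 2/3
```

Note the intuition the abstract's numbers reflect: exact match is the strictest measure (one wrong label fails the whole tweet), Hamming loss is the most forgiving (per-decision), which is why reported exact match accuracies are lower than F-measures across all three subtasks.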
