Conference: International Conference of the German Society for Computational Linguistics and Language Technology

Linguistic and Statistically Derived Features for Cause of Death Prediction from Verbal Autopsy Text


Abstract

Automatic Text Classification (ATC) is an emerging technology with economic importance given the unprecedented growth of text data. This paper reports on work in progress to develop methods for predicting Cause of Death from Verbal Autopsy (VA) documents, which the World Health Organisation recommends for use in low-income countries. VA documents contain both coded data and open narrative. We formulate the task as a text classification problem and explore various combinations of linguistic and statistical approaches to determine how they may improve on the standard bag-of-words approach, using a dataset of over 6400 VA documents that were manually annotated with cause of death. We demonstrate that a significant improvement in prediction accuracy can be obtained through a novel combination of statistical and linguistic features derived from the VA text. The paper explores the methods by which ATC may lead to improved accuracy in Cause of Death prediction.
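To make the contrast between a bag-of-words baseline and a combined feature set concrete, the following is a minimal sketch in plain Python. It is not the paper's method: the abstract does not specify the features or the classifier, so the bigram counts, the log-length statistic, the nearest-centroid classifier, the function names, and the toy narratives are all assumptions chosen for illustration.

```python
from collections import Counter
import math
import re


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(text):
    """Baseline representation: unigram counts."""
    return Counter(f"w={t}" for t in tokenize(text))


def combined_features(text):
    """Unigrams plus two illustrative extras (assumed, not from the paper):
    bigram counts as a simple linguistic cue and a log-length statistic."""
    tokens = tokenize(text)
    feats = bag_of_words(text)
    feats.update(f"bg={a}_{b}" for a, b in zip(tokens, tokens[1:]))
    feats["log_len"] = math.log(1 + len(tokens))
    return feats


def cosine(a, b):
    """Cosine similarity between two sparse feature dictionaries."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def train_centroids(docs, featurize):
    """Sum the feature vectors of each label (nearest-centroid classifier)."""
    centroids = {}
    for text, label in docs:
        centroids.setdefault(label, Counter()).update(featurize(text))
    return centroids


def predict(text, centroids, featurize):
    """Return the label whose centroid is most similar to the document."""
    feats = featurize(text)
    return max(centroids, key=lambda label: cosine(feats, centroids[label]))


# Invented toy narratives, for illustration only (not real VA data).
TOY_DOCS = [
    ("she had fever and cough for two weeks", "infection"),
    ("high fever with chills and cough", "infection"),
    ("he was hit by a car on the road", "injury"),
    ("fell from a tree and broke his leg", "injury"),
]

if __name__ == "__main__":
    for featurize in (bag_of_words, combined_features):
        centroids = train_centroids(TOY_DOCS, featurize)
        print(featurize.__name__, predict("hit by a car", centroids, featurize))
```

Swapping `bag_of_words` for `combined_features` changes only the representation, so the same classifier can be used to compare feature sets, which is the kind of comparison the abstract describes.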
