Text Summarization and Categorization for Scientific and Health-related Data

Abstract

The increasing amount of unstructured health-related data has created a need for intelligent methods to process, summarize, and categorize these data and extract knowledge from them. My research goal in this dissertation is to develop Natural Language Processing (NLP) and Information Retrieval (IR) methods for better processing and understanding of health-related textual information, in order to promote the health care and well-being of individuals.

First, I focus on scientific literature as an important source of knowledge distribution in health care. It has become a challenge for researchers to keep up with the increasing rate at which scientific findings are published. To address this problem, I propose summarization methods that use citation texts and the discourse structure of papers to provide a concise representation of their important contributions. I also investigate methods to address the problem of citation inaccuracy by linking citations to the related parts of the target paper, capturing their relevant context. In addition, I raise the problem of the inadequacy of current evaluation metrics for scientific document summarization and present a superior method for evaluating summaries based on semantic relevance.

In the second part, I focus on other significant sources of health-related information, including clinical notes and social media. I investigate categorization methods to address the critical problem of medical errors, which are among the leading causes of death worldwide. I demonstrate how significant errors and harmful cases can be identified effectively from medical narratives, which could help prevent similar problems in the future. Mental health is another significant dimension of health and well-being that is sometimes overlooked. Suicide, the most serious challenge in mental health, accounts for approximately 1.4% of all deaths; approximately one person dies by suicide every 40 seconds. I investigate social media as a platform through which mental health problems such as depression and self-harm can be studied. I present both feature-rich and neural network methods for assessing the risk of depression, self-harm, and suicide for individuals based on the general language they use on social media.

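Bibliographic record

  • Author

    Cohan, Arman

  • Author affiliation

    Georgetown University

  • Degree-granting institution: Georgetown University
  • Subject: Computer science; Information science
  • Degree: Ph.D.
  • Year: 2018
  • Pages: 248 p.
  • Total pages: 248
  • Original format: PDF
  • Language of text: English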