
Information extraction to enable faceted search over large text document collections.



Abstract

Over the last two decades, advances in computational and biological methods have remarkably changed the scale of biomedical research, bringing unprecedented growth in both the production of biomedical data and the amount of published literature discussing it. Complete genomes can now be sequenced within months or even weeks, and computational and large-scale experimental methods can expedite the identification of tens of thousands of genes. The data generated by these experiments is highly interconnected: the results from sequence analysis and microarrays depend, for evidence, on functional information and signal transduction pathways cited in peer-reviewed publications.

Imagine a biologist researching a cure for a disease such as leukemia. She currently has to read all the published research that deals with this disease and find all the proteins, genes, and other information, such as drugs and chemicals, that will help her better understand the molecular connections (pathways) between these substances and the disease. Even though many systems aid in accessing and browsing this vast collection of documents, the sheer volume and depth of the information can be overwhelming. An automated extraction system coupled with a cognitive search and navigation service over these document collections would not only save time and effort, but also pave the way to discovering hitherto unknown information implicitly conveyed in the texts.

This dissertation discusses practical information extraction systems that can also populate faceted search and navigation systems, enabling the discovery of important semantic relationships between entities such as genes, diseases, drugs, and cell lines. It presents an automated system to extract biomolecular events from biomedical text. The system first semantically classifies each sentence into the class of the event it mentions and then, using class-specific rules, extracts the participants of that event. An integrative framework that fuses faceted search with information extraction is also proposed, providing a search service that addresses the user's need for "completeness" of query results, not just the top-ranked ones. To demonstrate the utility of this framework, the dissertation also details a prototype enterprise-quality search and discovery service that helps life sciences researchers with guided, step-by-step query refinement by suggesting concepts enriched in intermediate results, thereby facilitating a "discover more as you search" paradigm powered by information extraction.
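The two-stage extraction and the facet-based refinement described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the trigger words, the regular-expression rules, and the function names `classify`, `extract`, `facet_counts`, and `refine` are hypothetical and are not taken from the dissertation, whose actual classifier and rules are not given here.

```python
import re
from collections import Counter

# Hypothetical trigger words per event class (illustrative only).
TRIGGERS = {
    "phosphorylation": ("phosphorylates",),
    "binding": ("binds", "interacts with"),
}

# Hypothetical class-specific rules: one regex per event class.
ENTITY = r"[A-Za-z0-9-]+"
RULES = {
    "phosphorylation": re.compile(
        rf"(?P<agent>{ENTITY}) phosphorylates (?P<theme>{ENTITY})"
    ),
    "binding": re.compile(
        rf"(?P<agent>{ENTITY}) (?:binds|interacts with) (?P<theme>{ENTITY})"
    ),
}


def classify(sentence):
    """Stage 1: assign the sentence to an event class, or None."""
    lowered = sentence.lower()
    for event_class, words in TRIGGERS.items():
        if any(word in lowered for word in words):
            return event_class
    return None


def extract(sentence):
    """Stage 2: apply the rule of the predicted class to find participants."""
    event_class = classify(sentence)
    if event_class is None:
        return None
    match = RULES[event_class].search(sentence)
    if match is None:
        return None
    return {"type": event_class, **match.groupdict()}


def facet_counts(events, facet):
    """Count the values of one facet across the current result set."""
    return Counter(event[facet] for event in events)


def refine(events, facet, value):
    """Narrow the result set to events whose facet equals the chosen value."""
    return [event for event in events if event[facet] == value]


sentences = [
    "MEK1 phosphorylates ERK2 in vitro.",
    "Grb2 binds Sos1.",
    "MEK1 interacts with RAF1.",
    "The cells were cultured overnight.",
]
events = [e for e in map(extract, sentences) if e is not None]
print(facet_counts(events, "agent"))    # facet values to suggest to the user
print(refine(events, "agent", "MEK1"))  # results after one refinement step
```

Counting facet values over the intermediate result set is one simple way to suggest refinements. A real system would presumably replace the keyword classifier and regexes with trained models and curated rules.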

Bibliographic record

  • Author

    Ahmed, Syed Toufeeq

  • Author affiliation

    Arizona State University

  • Degree-granting institution: Arizona State University
  • Discipline: Computer Science
  • Degree: Ph.D.
  • Year: 2010
  • Pages: 179 p.
  • Total pages: 179
  • Original format: PDF
  • Language of text: English
