
Information extraction to enable faceted search over large text document collections.

Abstract

Recent advances in computational and biological methods have dramatically changed the scale of biomedical research, bringing unprecedented growth over the last two decades in both the production of biomedical data and the amount of published literature discussing it. Complete genomes can now be sequenced within months or even weeks; computational methods can expedite the identification of tens of thousands of genes and large-scale experimental methods. The data generated by these experiments are highly interconnected; the results from sequence analysis and microarrays depend on functional information and signal transduction pathways cited in peer-reviewed publications for evidence.

Imagine a biologist researching a cure for a disease such as leukemia. She currently has to read all the published research that deals with this disease and find all the proteins, genes, and other information, such as drugs and chemicals, that will help her better understand the molecular connections (pathways) between these substances and the disease. Even though many systems aid in accessing and browsing this vast collection of documents, the breadth and depth of the information can be overwhelming. An automated extraction system coupled with a cognitive search and navigation service over these document collections would not only save time and effort, but also pave the way to discovering hitherto unknown information implicitly conveyed in the texts.

This dissertation discusses practical information extraction systems that can also populate faceted search and navigation systems to enable discovery of important semantic relationships between entities such as genes, diseases, drugs, and cell lines. It presents an automated system to extract biomolecular events from biomedical text. The system first semantically classifies each sentence by the class of event it mentions and then, using class-specific rules, extracts the participants of that event. An integrative framework that fuses faceted search with information extraction is also proposed, to provide a search service that addresses the user's desideratum of "completeness" of query results, not just the top-ranked ones. To demonstrate the utility of this framework, the dissertation also details a prototype enterprise-quality search and discovery service that helps life sciences researchers with guided, step-by-step query refinement by suggesting concepts enriched in intermediate results, thereby facilitating the "discover more as you search" paradigm powered by information extraction.
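
The two-step extraction and the facet-driven refinement described in the abstract can be illustrated with a small sketch. This is not the dissertation's implementation: the trigger words, regular-expression rules, field names, and example sentences below are all hypothetical, chosen only to show the shape of "classify the sentence, then apply class-specific rules, then count facet values over the current results".

```python
import re
from collections import Counter

# Hypothetical trigger words per event class (illustrative only).
TRIGGERS = {
    "phosphorylation": ("phosphorylates",),
    "binding": ("binds", "interacts with"),
}

# Hypothetical class-specific rules; each captures an agent and a target.
RULES = {
    "phosphorylation": re.compile(r"(?P<agent>\w+) phosphorylates (?P<target>\w+)"),
    "binding": re.compile(r"(?P<agent>\w+) (?:binds|interacts with) (?P<target>\w+)"),
}


def classify(sentence):
    """Step 1: assign the sentence to an event class, or None if no trigger matches."""
    lowered = sentence.lower()
    for event_class, words in TRIGGERS.items():
        if any(word in lowered for word in words):
            return event_class
    return None


def extract(sentence):
    """Step 2: apply the rule of the predicted class to pull out the participants."""
    event_class = classify(sentence)
    if event_class is None:
        return None
    match = RULES[event_class].search(sentence)
    if match is None:
        return None
    return {"event": event_class, "agent": match["agent"], "target": match["target"]}


def facet_counts(events, field):
    """Count the values of one facet over the current result set."""
    return Counter(event[field] for event in events)


def refine(events, field, value):
    """Narrow the result set to events whose facet `field` equals `value`."""
    return [event for event in events if event[field] == value]


# Made-up example sentences.
sentences = [
    "AKT1 phosphorylates GSK3B in leukemia cells.",
    "MDM2 binds TP53.",
    "AKT1 phosphorylates FOXO3.",
    "The samples were stored overnight.",
]
events = [e for e in map(extract, sentences) if e is not None]

# Facet values that could be suggested to the user as refinements.
agents = facet_counts(events, "agent")

# One refinement step: keep only events with AKT1 as agent, then recount targets.
akt1_events = refine(events, "agent", "AKT1")
akt1_targets = facet_counts(akt1_events, "target")
```

In this sketch the facet counts stand in for the "concepts enriched in intermediate results": after each refinement the counts are recomputed over the narrowed set, so the suggestions follow the query. A real system would replace the keyword classifier and regular expressions with whatever sentence classifier and rule set it actually uses.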


Bibliographic record

Author: Ahmed, Syed Toufeeq
Author affiliation: Arizona State University
Degree-granting institution: Arizona State University
Discipline: Computer Science
Degree: Ph.D.
Year: 2010
Pages: 179 p.
Total pages: 179
Original format: PDF
Language of text: English

