Textpresso: An Ontology-Based Information Retrieval and Extraction System for Biological Literature

Hans-Michael Müller; Eimear E Kenny; Paul W Sternberg

Abstract

We have developed Textpresso, a new text-mining system for scientific literature whose capabilities go far beyond those of a simple keyword search engine. Textpresso's two major elements are a collection of the full text of scientific articles split into individual sentences, and the implementation of categories of terms for which a database of articles and individual sentences can be searched. The categories are classes of biological concepts (e.g., gene, allele, cell or cell group, phenotype, etc.) and classes that relate two objects (e.g., association, regulation, etc.) or describe one (e.g., biological process, etc.). Together they form a catalog of types of objects and concepts called an ontology. After this ontology is populated with terms, the whole corpus of articles and abstracts is marked up to identify terms of these categories. The current ontology comprises 33 categories of terms. A search engine enables the user to search for one or a combination of these tags and/or keywords within a sentence or document, and as the ontology allows word meaning to be queried, it is possible to formulate semantic queries. Full text access increases recall of biological data types from 45% to 95%. Extraction of particular biological facts, such as gene-gene interactions, can be accelerated significantly by ontologies, with Textpresso automatically performing nearly as well as expert curators to identify sentences; in searches for two uniquely named genes and an interaction term, the ontology confers a 3-fold increase of search efficiency. Textpresso currently focuses on Caenorhabditis elegans literature, with 3,800 full text articles and 16,000 abstracts. The lexicon of the ontology contains 14,500 entries, each of which includes all versions of a specific word or phrase, and it includes all categories of the Gene Ontology database. 
Textpresso is a useful curation tool, as well as a search engine for researchers, and can readily be extended to other organism-specific corpora of text. Textpresso can be accessed at http://www.textpresso.org or via WormBase at http://www.wormbase.org.
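
The sentence-level category search described in the abstract can be illustrated with a minimal sketch. This is not Textpresso's implementation: the three-category lexicon, the sample text, and the function names (split_sentences, tag_sentence, search) are all assumptions made for illustration, and the sentence splitter is deliberately naive. It only shows the idea of marking up sentences with ontology categories and then querying for a combination of categories and keywords within one sentence.

```python
import re

# Hypothetical mini-lexicon (category -> terms). The abstract reports about
# 14,500 lexicon entries in 33 categories; these few entries are made up.
LEXICON = {
    "gene": ["lin-12", "glp-1", "let-23"],
    "association": ["interacts with", "binds"],
    "regulation": ["represses", "activates"],
}

# Made-up sample text, used only to exercise the sketch.
DEMO_TEXT = (
    "lin-12 activates glp-1 in the gonad. "
    "The let-23 gene was cloned. "
    "Vulval induction represses nothing here."
)


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tag_sentence(sentence):
    """Return {category: [matched terms]} for one sentence."""
    tags = {}
    lowered = sentence.lower()
    for category, terms in LEXICON.items():
        for term in terms:
            # Match the whole term only, not a fragment of a longer token.
            pattern = r"(?<![\w-])" + re.escape(term) + r"(?![\w-])"
            if re.search(pattern, lowered):
                tags.setdefault(category, []).append(term)
    return tags


def search(text, min_counts, keywords=()):
    """Return sentences that contain at least min_counts[c] distinct terms of
    each category c and every keyword (case-insensitive substring match)."""
    hits = []
    for sentence in split_sentences(text):
        tags = tag_sentence(sentence)
        has_categories = all(
            len(tags.get(category, [])) >= n for category, n in min_counts.items()
        )
        has_keywords = all(k.lower() in sentence.lower() for k in keywords)
        if has_categories and has_keywords:
            hits.append(sentence)
    return hits


if __name__ == "__main__":
    # Two genes plus a regulation term in the same sentence.
    print(search(DEMO_TEXT, {"gene": 2, "regulation": 1}))
    # One gene plus a plain keyword.
    print(search(DEMO_TEXT, {"gene": 1}, keywords=["cloned"]))
```

Requiring two gene terms and one regulation term in the same sentence mirrors the kind of gene-gene interaction query the abstract mentions. A plain keyword search for "activates" would also return sentences that mention no gene at all, which is the gap the category markup is meant to close.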

Bibliographic record

Source: PLoS Biology, 2004, Issue 11, 15 pages
Authors: Hans-Michael Müller; Eimear E Kenny; Paul W Sternberg
Original format: PDF
Chinese Library Classification: Molecular biology

