The Plant Cell (U.S. National Institutes of Health literature)

The Potential of Text Mining in Data Integration and Network Biology for Plant Research: A Case Study on Arabidopsis

Abstract

Despite the availability of various data repositories for plant research, a wealth of information currently remains hidden within the biomolecular literature. Text mining provides the necessary means to retrieve these data through automated processing of texts. However, only recently has advanced text mining methodology been implemented with sufficient computational power to process texts at a large scale. In this study, we assess the potential of large-scale text mining for plant biology research in general and for network biology in particular using a state-of-the-art text mining system applied to all PubMed abstracts and PubMed Central full texts. We present an extensive evaluation of the textual data for Arabidopsis thaliana, assessing the overall accuracy of this new resource for use in plant network analyses. Furthermore, we combine text mining information with both protein–protein and regulatory interactions from experimental databases. Clusters of tightly connected genes are delineated from the resulting network, illustrating how such an integrative approach is essential to grasp the current knowledge available for Arabidopsis and to uncover gene information through guilt by association. All large-scale data sets, as well as the manually curated textual data, are made publicly available, thereby stimulating the application of text mining data in future plant biology studies.