Journal: BMC Bioinformatics

Enhancing navigation in biomedical databases by community voting and database-driven text classification

Abstract

Background: The breadth of biological databases and their information content continue to increase exponentially. Unfortunately, our ability to query such sources is still often suboptimal. Here, we introduce and apply community voting, database-driven text classification, and visual aids as a means to incorporate distributed expert knowledge, to classify database entries automatically, and to retrieve them efficiently.

Results: Using a previously developed peptide database as an example, we compared several machine learning algorithms in their ability to classify abstracts of published literature results into categories relevant to peptide research, such as related or not related to cancer, angiogenesis, molecular imaging, etc. Ensembles of bagged decision trees met the requirements of our application best; no other algorithm consistently performed better in comparative testing. Moreover, we show that the algorithm produces meaningful class probability estimates, which can be used to visualize the confidence of automatic classification during the retrieval process. To allow viewing long lists of search results enriched by automatic classifications, we added a dynamic heat map to the web interface. We take advantage of community knowledge by enabling users to cast votes in Web 2.0 style to correct automated classification errors; each vote triggers reclassification of all entries. We used a novel framework in which the database "drives" the entire vote aggregation and reclassification process, which increases speed while conserving computational resources and keeping the method scalable. In our experiments, we simulate community voting by adding various levels of noise to nearly perfectly labelled instances, and show that, under such conditions, classification can be improved significantly.

Conclusion: Using PepBank as a model database, we show how to build a classification-aided retrieval system that gathers training data from the community, is completely controlled by the database, scales well with concurrent change events, and can be adapted to add text classification capability to other biomedical databases. The system can be accessed at http://pepbank.mgh.harvard.edu.
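
The abstract gives no implementation details, so the following is only a minimal sketch of the ideas it names: simulated noisy votes, vote aggregation, a bagged ensemble, and class probability estimates. It rests on several assumptions that are not from the paper: votes are aggregated by simple majority, one-word decision stumps stand in for full decision trees, each abstract is reduced to a set of words, and the toy entries, function names, noise rate, and ensemble size are all invented for illustration.

```python
import random

def flip_labels(labels, noise_rate, rng):
    """Simulate imperfect votes: flip each 0/1 label with probability noise_rate."""
    return [1 - y if rng.random() < noise_rate else y for y in labels]

def majority_vote(votes):
    """Aggregate the 0/1 votes cast on one entry; ties count as 0."""
    return 1 if 2 * sum(votes) > len(votes) else 0

def train_stump(docs, labels):
    """Fit a one-word decision stump: choose the (word, polarity) pair that
    classifies the most training documents correctly."""
    best_score, best_stump = -1, None
    for word in sorted(set().union(*docs)):
        for polarity in (1, 0):  # label predicted when the word is present
            score = sum(
                (polarity if word in doc else 1 - polarity) == y
                for doc, y in zip(docs, labels)
            )
            if score > best_score:
                best_score, best_stump = score, (word, polarity)
    return best_stump

def train_bagged(docs, labels, n_models, rng):
    """Bagging: fit each stump on a bootstrap resample of the training set."""
    n = len(docs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(train_stump([docs[i] for i in idx], [labels[i] for i in idx]))
    return models

def class_probability(models, doc):
    """Estimate P(category | doc) as the fraction of stumps that predict 1."""
    votes = [pol if word in doc else 1 - pol for word, pol in models]
    return sum(votes) / len(votes)

# Toy data (invented for illustration): each entry is the set of words in an
# abstract, labelled 1 if related to cancer and 0 otherwise.
docs = [
    {"tumor", "peptide", "binding"}, {"tumor", "imaging", "probe"},
    {"carcinoma", "tumor", "uptake"}, {"tumor", "cell", "peptide"},
    {"bacterial", "peptide", "membrane"}, {"enzyme", "kinetics", "assay"},
    {"antimicrobial", "membrane", "assay"}, {"insulin", "receptor", "binding"},
]
true_labels = [1, 1, 1, 1, 0, 0, 0, 0]

rng = random.Random(0)
# Five simulated users vote on every entry, each wrong 20% of the time.
ballots = [flip_labels(true_labels, 0.2, rng) for _ in range(5)]
aggregated = [majority_vote(entry_votes) for entry_votes in zip(*ballots)]

# Train on the aggregated (still possibly noisy) labels, then score a new entry.
models = train_bagged(docs, aggregated, n_models=25, rng=rng)
p = class_probability(models, {"tumor", "peptide", "uptake"})
print(f"estimated probability of 'cancer': {p:.2f}")
```

The fraction of ensemble members voting for a category is one simple way to obtain the kind of confidence value that could colour a heat map cell. A production system would more likely use a full decision tree learner and calibrate these estimates.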
