Enhancing navigation in biomedical databases by community voting and database-driven text classification

Timo Duchrow; Timur Shtatland; Daniel Guettler; Misha Pivovarov; Stefan Kramer; Ralph Weissleder

Abstract

Background

The breadth of biological databases and their information content continues to increase exponentially. Unfortunately, our ability to query such sources is still often suboptimal. Here, we introduce and apply community voting, database-driven text classification, and visual aids as a means to incorporate distributed expert knowledge, to automatically classify database entries and to efficiently retrieve them.

Results

Using a previously developed peptide database as an example, we compared several machine learning algorithms in their ability to classify abstracts of published literature results into categories relevant to peptide research, such as related or not related to cancer, angiogenesis, molecular imaging, etc. Ensembles of bagged decision trees met the requirements of our application best. No other algorithm consistently performed better in comparative testing. Moreover, we show that the algorithm produces meaningful class probability estimates, which can be used to visualize the confidence of automatic classification during the retrieval process. To allow viewing long lists of search results enriched by automatic classifications, we added a dynamic heat map to the web interface. We take advantage of community knowledge by enabling users to cast votes in Web 2.0 style in order to correct automated classification errors, which triggers reclassification of all entries. We used a novel framework in which the database "drives" the entire vote aggregation and reclassification process to increase speed while conserving computational resources and keeping the method scalable. In our experiments, we simulate community voting by adding various levels of noise to nearly perfectly labelled instances, and show that, under such conditions, classification can be improved significantly.
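To make the idea of "bagged decision trees with class probability estimates" concrete, here is a minimal pure-Python sketch. It is not the authors' implementation, which the abstract does not describe; the word-presence features, the Gini split criterion, the tree depth, the ensemble size and the toy data are all illustrative assumptions. Each tree is grown on a bootstrap sample of the labelled abstracts, each leaf stores the fraction of positive training examples that reach it, and the ensemble's probability for an entry is the mean of those leaf fractions across trees (averaging per-tree probabilities is also what scikit-learn's `BaggingClassifier` does when its base estimators support `predict_proba`).

```python
import random


def gini(labels):
    """Gini impurity of a non-empty list of 0/1 labels."""
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def build_tree(docs, labels, vocab, depth):
    """Grow a small tree whose internal nodes test 'is word w in the abstract?'.

    Leaves store the fraction of positive training examples reaching them,
    which is what turns the ensemble output into a probability estimate.
    """
    pos_frac = sum(labels) / len(labels)
    if depth == 0 or pos_frac in (0.0, 1.0):
        return pos_frac  # leaf
    best_score, best_word = None, None
    for word in vocab:
        present = [y for d, y in zip(docs, labels) if word in d]
        absent = [y for d, y in zip(docs, labels) if word not in d]
        if not present or not absent:
            continue  # this word does not split the current sample
        score = (len(present) * gini(present) + len(absent) * gini(absent)) / len(labels)
        if best_score is None or score < best_score:
            best_score, best_word = score, word
    if best_word is None:
        return pos_frac
    inside = [i for i, d in enumerate(docs) if best_word in d]
    outside = [i for i, d in enumerate(docs) if best_word not in d]
    return (
        best_word,
        build_tree([docs[i] for i in inside], [labels[i] for i in inside], vocab, depth - 1),
        build_tree([docs[i] for i in outside], [labels[i] for i in outside], vocab, depth - 1),
    )


def tree_proba(node, doc):
    """Walk one tree down to a leaf and return its positive fraction."""
    while isinstance(node, tuple):
        word, present, absent = node
        node = present if word in doc else absent
    return node


def fit_bagged_trees(docs, labels, n_trees=25, depth=3, seed=0):
    """Bagging: train each tree on n draws with replacement from the n examples."""
    rng = random.Random(seed)
    vocab = sorted(set().union(*docs))
    n = len(docs)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(build_tree([docs[i] for i in idx], [labels[i] for i in idx], vocab, depth))
    return trees


def predict_proba(trees, doc):
    """Class probability estimate = mean of the per-tree leaf fractions."""
    return sum(tree_proba(t, doc) for t in trees) / len(trees)


if __name__ == "__main__":
    # Hypothetical toy abstracts as word sets; 1 = "related to cancer".
    docs = [
        {"cancer", "peptide", "tumor"},
        {"angiogenesis", "binding", "cancer"},
        {"antimicrobial", "bacteria", "peptide"},
        {"antimicrobial", "binding", "membrane"},
    ]
    labels = [1, 1, 0, 0]
    trees = fit_bagged_trees(docs, labels)
    for query in ({"cancer", "tumor"}, {"antimicrobial", "membrane"}):
        print(sorted(query), round(predict_proba(trees, query), 2))
```

A value between 0 and 1 per entry and category is the kind of number a heat-map cell in a result list could shade, so that confident and uncertain classifications are distinguishable at a glance; how PepBank maps probabilities to colours is not stated in the abstract.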
Conclusion

Using PepBank as a model database, we show how to build a classification-aided retrieval system that gathers training data from the community, is completely controlled by the database, scales well with concurrent change events, and can be adapted to add text classification capability to other biomedical databases. The system can be accessed at http://pepbank.mgh.harvard.edu .
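The phrase "the database drives the vote aggregation and reclassification process" can be illustrated with a small sketch using Python's built-in `sqlite3` module. Everything here is hypothetical: the table and column names, the majority-vote rule for turning votes into a training label, and the choice to coalesce many queued events into one retraining pass are assumptions for illustration, not PepBank's actual schema or engine. A trigger inside the database aggregates each incoming vote and enqueues a change event; a worker later drains the queue and retrains once, however many votes arrived in the meantime.

```python
import sqlite3

# Hypothetical schema: entries, per-user votes, and a queue of change events.
SCHEMA = """
CREATE TABLE entries (
    id         INTEGER PRIMARY KEY,
    abstract   TEXT NOT NULL,
    label      INTEGER,                      -- NULL until the community has voted
    up_votes   INTEGER NOT NULL DEFAULT 0,
    down_votes INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE votes (
    entry_id  INTEGER NOT NULL REFERENCES entries(id),
    user_name TEXT NOT NULL,
    value     INTEGER NOT NULL CHECK (value IN (-1, 1)),
    PRIMARY KEY (entry_id, user_name)        -- one vote per user and entry
);
CREATE TABLE reclassify_queue (
    id     INTEGER PRIMARY KEY AUTOINCREMENT,
    reason TEXT NOT NULL
);
-- The database itself aggregates each vote and enqueues a change event.
CREATE TRIGGER aggregate_vote AFTER INSERT ON votes
BEGIN
    UPDATE entries
       SET up_votes   = up_votes + (NEW.value = 1),
           down_votes = down_votes + (NEW.value = -1)
     WHERE id = NEW.entry_id;
    UPDATE entries
       SET label = CASE WHEN up_votes > down_votes THEN 1
                        WHEN down_votes > up_votes THEN 0
                        ELSE label END
     WHERE id = NEW.entry_id;
    INSERT INTO reclassify_queue (reason) VALUES ('vote on entry ' || NEW.entry_id);
END;
"""


def drain_queue(conn):
    """Delete the events queued so far (ids up to the current maximum) and
    return how many were removed; events arriving meanwhile stay queued."""
    with conn:
        max_id = conn.execute("SELECT MAX(id) FROM reclassify_queue").fetchone()[0]
        if max_id is None:
            return 0
        cur = conn.execute("DELETE FROM reclassify_queue WHERE id <= ?", (max_id,))
        return cur.rowcount


def run_reclassification(conn, retrain):
    """Coalesce all pending vote events into a single retraining pass."""
    drained = drain_queue(conn)
    if drained == 0:
        return 0  # nothing changed since the last pass
    rows = conn.execute(
        "SELECT id, abstract, label FROM entries WHERE label IS NOT NULL ORDER BY id"
    ).fetchall()
    retrain(rows)  # e.g. refit the classifier, then rescore every entry
    return drained


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    with conn:
        conn.executemany(
            "INSERT INTO entries (id, abstract) VALUES (?, ?)",
            [(1, "tumor-homing peptide abstract"), (2, "antimicrobial peptide abstract")],
        )
        conn.executemany(
            "INSERT INTO votes (entry_id, user_name, value) VALUES (?, ?, ?)",
            [(1, "alice", 1), (1, "bob", 1), (2, "carol", -1)],
        )
    # Three votes were queued, but only one retraining pass runs.
    print(run_reclassification(conn, lambda rows: print("retrain on", rows)))
```

Coalescing is one plausible way to keep cost bounded when many users vote concurrently: the expensive step (retraining and rescoring) runs once per batch of events rather than once per vote. A production version would also need triggers for changed or withdrawn votes.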

Bibliographic details

Source: BMC Bioinformatics | 2009, No. 1
Authors: Timo Duchrow; Timur Shtatland; Daniel Guettler; Misha Pivovarov; Stefan Kramer; Ralph Weissleder
Original format: PDF
Chinese Library Classification: Biological sciences

Similar literature

1. A database-driven neural computing framework for classification of vertical jump patterns of healthy female netballers using 3D kinematics–EMG features [J]. Neural Computing & Applications. 2020, No. 5

2. Aircraft Damage Identification and Classification for Database-Driven Online Flight-Envelope Prediction [J]. Y. Zhang, C. C. de Visser, Q. P. Chu. Journal of Guidance, Control, and Dynamics. 2018, No. 2

3. Biomedical text summarization to support genetic database curation: using Semantic MEDLINE to create a secondary database of genetic information [J]. Workman TE, Fiszman M, Hurdle JF. Journal of the Medical Library Association. 2010, No. 4

4. Using weighted majority voting classifier combination for relation classification in biomedical texts [C]. Remya K R, Ramya J S. International Conference on Control, Instrumentation, Communication and Computational Technologies. 2014

5. Automated biomedical text fragmentation in support of biomedical sentence fragment classification [D]. Salehi, Sara. 2009

6. Enhancing navigation in biomedical databases by community voting and database-driven text classification [O]. Timo Duchrow, Timur Shtatland, Daniel Guettler. 2009

7. Enhancing navigation in biomedical databases by community voting and database-driven text classification [O]. 2009