Journal: Bioinformatics

A quantitative model for linking two disparate sets of articles in MEDLINE


Abstract

Background: Identifying information that implicitly links two disparate sets of articles is a fundamental and intuitive data mining strategy that can help investigators address real scientific questions. The Arrowsmith two-node search finds title words and phrases (so-called B-terms) that are shared across two sets of articles within MEDLINE and displays them in a manner that facilitates human assessment. A serious stumbling block has been the lack of a quantitative model for predicting which of the hundreds, if not thousands, of B-terms computed for a given search are most likely to be relevant to the investigator.

Methodology/Principal Findings: Using a public two-node search interface, field testers devised a set of two-node searches under real-life conditions, and a certain number of B-terms were marked relevant. These were employed as 'gold standards'; each B-term was characterized according to eight complementary features that were strongly correlated with relevance. A logistic regression model was developed that permits one to estimate the probability of relevance for each B-term, to rank B-terms according to their likely relevance, and to estimate the overall number of relevant B-terms inherent in a given two-node search.

Conclusions/Significance: The model greatly simplifies and streamlines the process of carrying out a two-node search, and may be applicable to a number of other literature-based discovery applications, including the so-called one-node search and related gene-centric strategies that incorporate implicit links to predict how genes may be related to each other and to human diseases. This should encourage much wider exploration of text mining for implicit information among the general scientific community.
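The three uses of the logistic regression model described in the abstract (a per-term probability of relevance, a ranking, and an estimate of the number of relevant B-terms) can be illustrated with a minimal Python sketch. The abstract does not name the eight features or report fitted coefficients, so the feature vectors, WEIGHTS, INTERCEPT, and the term names below are hypothetical placeholders. Summing the predicted probabilities to estimate the number of relevant B-terms is also an assumption here, not something the abstract confirms.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def relevance_probability(features, weights, intercept):
    """Estimated probability that one B-term is relevant."""
    z = intercept + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def rank_b_terms(b_terms, weights, intercept):
    """Return (term, probability) pairs, most likely relevant first."""
    scored = [
        (term, relevance_probability(feats, weights, intercept))
        for term, feats in b_terms.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


def expected_relevant_count(scored):
    """Assumed estimator: the sum of per-term relevance probabilities."""
    return sum(p for _, p in scored)


# Hypothetical coefficients for eight unnamed features (not from the paper).
WEIGHTS = [0.5] * 8
INTERCEPT = -2.0

# Hypothetical B-terms, each with eight made-up feature values.
B_TERMS = {
    "term_a": [1, 1, 1, 1, 1, 1, 1, 1],
    "term_b": [0, 0, 0, 0, 0, 0, 0, 0],
    "term_c": [1, 1, 1, 1, 0, 0, 0, 0],
}

ranked = rank_b_terms(B_TERMS, WEIGHTS, INTERCEPT)
expected = expected_relevant_count(ranked)

if __name__ == "__main__":
    for term, prob in ranked:
        print(f"{term}: {prob:.3f}")
    print(f"expected relevant B-terms: {expected:.2f}")
```

In practice the coefficients would be fitted to the gold-standard B-terms marked by the field testers rather than set by hand as they are in this sketch.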
