Foreign-language conference: Data Mining Workshops, 2009. ICDMW '09

A Data Mining Method to Extract and Rank Papers Describing Coexpression Predicates Semantically



Abstract

Information management and extraction have become a requirement in biomedical research, given the rapid increase in the amount of data being published in this area. In this paper, a graphical model, Conditional Random Fields (CRF), is used to extract a particular gene-gene relationship called "coexpression" from the existing literature. First, a CRF-based model is trained and tested on full-length papers downloaded from PubMed to label the predicates that describe coexpression of genes. Local and contextual text features at both the word and sentence levels are proposed and extracted during the pre-processing step. The classification performance of the model trained on the proposed features is compared with that of Support Vector Machines, Nearest Neighbor with generalization, and Neural Networks, and the model is seen to outperform them all. In the second experiment, the proposed ranking scheme, which is based on the classification results, is applied to the ranked lists of papers returned by PubMed and Google, respectively. Comparing our ranked results with those of PubMed and Google shows that the proposed ranking scheme performs better than both in distinguishing a positive paper from a negative paper. In conclusion, this paper describes a specialized classification and ranking framework that can retrieve papers that genuinely discuss coexpression between and among genes, based on mining of semantics and not just lexical search.
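The abstract states that the ranking scheme is based on classification results but does not give the scoring formula. As a purely illustrative sketch, the code below assumes a hypothetical rule: each paper is scored by the fraction of its sentences that a classifier labeled as coexpression predicates (label 1), and papers are re-ranked by that score. The names `score_paper` and `rerank`, the paper identifiers, and the scoring rule itself are assumptions, not the authors' method.

```python
from typing import Dict, List


def score_paper(sentence_labels: List[int]) -> float:
    """Hypothetical score: fraction of sentences labeled 1 (coexpression predicate)."""
    if not sentence_labels:
        return 0.0
    return sum(sentence_labels) / len(sentence_labels)


def rerank(papers: Dict[str, List[int]]) -> List[str]:
    """Return paper ids sorted by descending score.

    sorted() is stable, so papers with equal scores keep the order
    in which the search engine originally returned them.
    """
    return sorted(papers, key=lambda pid: score_paper(papers[pid]), reverse=True)


# Made-up per-sentence labels, listed in the search engine's original order.
papers = {
    "paper_a": [0, 0, 1, 0],
    "paper_b": [1, 1, 0, 1],
    "paper_c": [0, 0, 0, 0],
}
reranked = rerank(papers)
```

In this toy input, `paper_b` has the largest share of positive sentences, so it moves ahead of `paper_a` even though the original list placed it second.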
