

A Data Mining Method to Extract and Rank Papers Describing Coexpression Predicates Semantically



Abstract

Information management and extraction in biomedical research have become a necessity because of the rapid growth in the amount of data published in this area. In this paper, a graphical model, Conditional Random Fields (CRFs), is used to extract a particular gene-gene relationship called "coexpression" from the existing literature. First, a CRF-based model is trained and tested on full-length papers downloaded from PubMed to label the predicates that describe the coexpression of genes. Appropriate local and contextual text features at both the word and sentence levels are proposed and extracted during pre-processing. The classification performance of the model trained on the proposed features is compared with that of Support Vector Machines, Nearest Neighbor with generalization, and Neural Networks, and the CRF model outperforms all three. In a second experiment, the proposed ranking scheme, which is based on the classification results, is applied to the ranked lists of papers returned by PubMed and by Google. Comparing our rankings with those of PubMed and Google shows that the proposed scheme distinguishes positive papers from negative papers better than either. In conclusion, this paper describes a specialized classification and ranking framework that retrieves papers that actually discuss coexpression between and among genes. It does so by mining semantics rather than relying on lexical search alone.
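The abstract does not give the feature set, the CRF training procedure, or the exact ranking formula. The following is therefore only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the model is a two-label linear-chain CRF in which each sentence of a paper is one position in the chain. It also assumes that per-sentence emission scores (weighted word- and sentence-level features) and label-transition scores have already been learned. Finally, it assumes a hypothetical ranking rule that orders papers by the fraction of sentences decoded as coexpression predicates. The names `viterbi`, `paper_score`, `rank_papers`, `NEG`/`POS`, and all numbers are illustrative.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical label indices: 0 = sentence is not a coexpression predicate,
# 1 = sentence is a coexpression predicate.
NEG, POS = 0, 1


def viterbi(emissions: Sequence[Sequence[float]],
            transitions: Sequence[Sequence[float]]) -> List[int]:
    """Highest-scoring label sequence under a linear-chain CRF.

    emissions[t][y]: unnormalized score of label y at sentence t
    (assumed precomputed from weighted word- and sentence-level features).
    transitions[a][b]: score of label a at t-1 followed by label b at t.
    """
    if not emissions:
        return []
    k = len(transitions)
    score = list(emissions[0])          # best path score ending in each label
    backptrs: List[List[int]] = []      # backptrs[t-1][b] = best previous label
    for t in range(1, len(emissions)):
        new_score, ptr = [], []
        for b in range(k):
            best_a = max(range(k), key=lambda a: score[a] + transitions[a][b])
            new_score.append(score[best_a] + transitions[best_a][b] + emissions[t][b])
            ptr.append(best_a)
        score = new_score
        backptrs.append(ptr)
    # Start from the best final label and follow back-pointers to position 0.
    best = max(range(k), key=lambda y: score[y])
    path = [best]
    for ptr in reversed(backptrs):
        best = ptr[best]
        path.append(best)
    path.reverse()
    return path


def paper_score(labels: Sequence[int]) -> float:
    """Hypothetical ranking score: fraction of sentences labeled POS."""
    return sum(1 for y in labels if y == POS) / len(labels) if labels else 0.0


def rank_papers(papers: Dict[str, Sequence[Sequence[float]]],
                transitions: Sequence[Sequence[float]]) -> List[Tuple[str, float]]:
    """Decode each paper, score it, and sort papers from most to least relevant."""
    scored = [(pid, paper_score(viterbi(em, transitions))) for pid, em in papers.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy numbers for illustration only; they are not from the paper.
    transitions = [[0.5, -0.5], [-0.5, 0.5]]  # mildly favors keeping the same label
    papers = {
        "paper_a": [[0.0, 2.0], [0.2, 1.0], [1.5, 0.0]],
        "paper_b": [[2.0, 0.0], [1.5, 0.1], [0.3, 0.6]],
    }
    for pid, s in rank_papers(papers, transitions):
        print(pid, round(s, 3))
```

Running the script prints `paper_a 0.667` and then `paper_b 0.0`. Scoring by the fraction of positive sentences, rather than the raw count, is one possible design choice that keeps long papers from dominating the ranking. The paper's actual ranking rule may differ.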


Bibliographic details

Source
Data Mining Workshops, 2009. ICDMW '09 | 2009 | pp. 483-488 | 6 pages
Conference location: Miami, FL (US)
Authors
Zhang Chengcui; Tiwari Richa; Chen Wei-Bang
Original format: PDF

Similar literature

1. Predication of Parkinson's disease using data mining methods: a comparative analysis of tree, statistical, and support vector machine classifiers [J]. Geeta Yadav, Yugal Kumar, Gadadhar Sahoo. Indian Journal of Medical Sciences, 2011, no. 6.
2. Mining for coexpression across hundreds of datasets using novel rank aggregation and visualization methods [J]. Priit Adler, Raivo Kolde, Meelis Kull, et al. Genome Biology, 2009, no. 12.
3. An Iterative Data Mining Approach for Mining Overlapping Coexpression Patterns in Noisy Gene Expression Data [J]. Ma P. C. H., Chan K. C. C. IEEE Transactions on NanoBioscience, 2009, no. 3.
4. A Data Mining Method to Extract and Rank Papers Describing Coexpression Predicates Semantically [C]. Chengcui Zhang, Richa Tiwari, Wei-Bang Chen. IEEE International Conference on Data Mining, 2009.
5. Methods for integrating and comparing coexpression information over multiple data sets and applications in mice aging [D]. Southworth, Lucinda Kay. 2009.
6. Mining for coexpression across hundreds of datasets using novel rank aggregation and visualization methods [O]. Priit Adler, Raivo Kolde, Meelis Kull, et al. 2009.
7. Mining for coexpression across hundreds of datasets using novel rank aggregation and visualization methods [O]. 2009.
8. Using the Random Nearest Neighbor Data Mining Method to Extract Maximum Information Content from Weather Forecasts from Multiple Predictors of Weather and One Predictand (Low-Level Turbulence) [R]. Keller, D. L. 2014.
