Knowledge discovery and hypothesis generation from biomedical literature using text mining.

Abstract

Automated extraction of knowledge from large document collections is a broad research area. Text mining is a promising, automated approach for extracting knowledge from unstructured data such as text. The objective of this thesis is to mine documents pertaining to Ayurveda, retrieved from PubMed, and to find novel transitive associations among biological objects. The thesis discusses the extraction of biological objects from the collected documents (databank) using an Automated Vocabulary Discovery (AVD) algorithm. An effective co-occurrence-based text mining algorithm for hypothesis generation was designed by combining the AVD algorithm with tf-idf (term frequency-inverse document frequency) weighting. Using transitive text mining, this algorithm extracts novel binary associations and hypergraph-based ternary associations (object1 -- object2 -- object3) among various objects (genes, chemicals, drugs, etc.). The research established relationships between objects from modern medicine and those from Ayurveda, the traditional Indian system of medicine. The generated hypotheses (novel associations) were assigned a co-occurrence-based significance score, and a few highly significant novel associations were validated. Finally, the knowledge thus obtained (ternary associations) was compared and analyzed against the binary associations (object1 -- object2), which form a superset of the ternary associations.
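The abstract does not spell out its scoring details, so the following is only a minimal Python sketch of the general idea (transitive co-occurrence mining weighted by tf-idf), not the thesis's algorithm. It assumes each document has already been reduced to a list of recognized objects (the AVD extraction step is not reproduced), scores a binary association as the sum, over shared documents, of the smaller tf-idf weight of the two objects, and scores a ternary chain A -- B -- C by its weaker link. The function names, the scoring rules, and the toy corpus are all hypothetical and are not the thesis's significance score.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def tfidf(docs):
    """Return one {object: weight} dict per document, using tf * log(N / df)."""
    n = len(docs)
    df = Counter(obj for doc in docs for obj in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({o: (c / len(doc)) * math.log(n / df[o]) for o, c in tf.items()})
    return weights


def cooccurrence(weights):
    """Binary associations (assumed scoring): each object pair that appears together
    in a document gets the sum, over those documents, of the smaller tf-idf weight."""
    scores = defaultdict(float)
    for w in weights:
        for a, b in combinations(sorted(w), 2):
            scores[(a, b)] += min(w[a], w[b])
    return dict(scores)


def transitive_hypotheses(scores):
    """Ternary chains A -- B -- C where A and C never co-occur directly but both
    co-occur with a linking object B; each chain is scored by its weaker link."""
    links = defaultdict(dict)
    for (a, b), s in scores.items():
        links[a][b] = s
        links[b][a] = s
    hypotheses = []
    for b, nbrs in links.items():
        for a, c in combinations(sorted(nbrs), 2):
            if c not in links[a]:  # no direct A -- C co-occurrence, so the link is novel
                hypotheses.append(((a, b, c), min(nbrs[a], nbrs[c])))
    # Strongest first; ties broken alphabetically so the output is deterministic.
    return sorted(hypotheses, key=lambda h: (-h[1], h[0]))


if __name__ == "__main__":
    # Hypothetical toy corpus: each document already reduced to recognized objects.
    docs = [
        ["turmeric", "curcumin"],
        ["curcumin", "nf-kb"],
        ["nf-kb", "arthritis"],
        ["ashwagandha", "stress"],
    ]
    for chain, score in transitive_hypotheses(cooccurrence(tfidf(docs))):
        print(" -- ".join(chain), round(score, 3))
    # arthritis -- nf-kb -- curcumin 0.347
    # nf-kb -- curcumin -- turmeric 0.347
```

Taking the weaker link as the chain score is just one conservative assumption; other combinations, such as the product of the two link scores, would be equally plausible, and any hypotheses ranked this way would still need separate validation.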

Bibliographic details

  • Author: Vaka, Harsha Gopal Goud
  • Affiliation: Purdue University
  • Degree-granting institution: Purdue University
  • Discipline: Computer Science
  • Degree: M.S.
  • Year: 2009
  • Pages: 53
  • Original format: PDF
  • Text language: English
