Methods of Information in Medicine

Developing an NLP and IR-based algorithm for analyzing gene-disease relationships.



Abstract

OBJECTIVES: High-throughput techniques such as cDNA microarrays, oligonucleotide arrays, and serial analysis of gene expression (SAGE) have been developed to screen large volumes of gene expression data automatically. Even with these techniques, however, researchers usually spend considerable time and money discovering gene-disease relationships. We implemented a prototype algorithm that gives biological researchers predicted gene-disease relationships before they proceed with experiments, helping them discover such relationships more efficiently.

METHODS: With the rapid development of computer technology, many information retrieval (IR) techniques have been applied to the large digital biomedical databases available worldwide. We therefore expected that IR techniques could extract useful information on the relationships between specific diseases and genes from MEDLINE articles. We also applied natural language processing (NLP) methods to perform semantic analysis of the relevant articles in order to discover relationships between genes and diseases.

RESULTS: We extracted gene symbols from our literature collection according to disease MeSH classifications. We also built an IR-based retrieval system, the Biomedical Literature Retrieval System (BLRS), and applied an N-gram model to extract relationship features that reveal the relationship between genes and diseases. Finally, we built a relationship network for a specific disease to represent its gene-disease relationships.

CONCLUSIONS: A relationship feature is a functional word that reveals the relationship between a single gene and a disease. BLRS, which incorporates many modern IR techniques, proved to be a powerful information discovery tool for literature searching. A relationship network that combines gene symbols, relationship features, and disease MeSH terms provides an integrated view for discovering gene-disease relationships.
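
The abstract does not say which retrieval model BLRS uses. As a minimal sketch, assuming a classic vector space model with TF-IDF weighting and cosine similarity (a stand-in technique, not the authors' documented method), ranking MEDLINE-style abstracts against a gene/disease query could look like this. The function names (`tokenize`, `tfidf_vectors`, `cosine`, `search`) and the three-sentence sample corpus are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-alphanumerics; keeps symbols such as 'tp53' intact."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def tfidf_vectors(docs: list[str]) -> tuple[list[dict[str, float]], dict[str, float]]:
    """Return one sparse TF-IDF vector per document plus the collection's IDF table."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(docs)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF: terms present in every document still keep a small positive weight.
    idf = {term: math.log((1 + n_docs) / (1 + count)) + 1.0 for term, count in df.items()}
    vectors = [{term: tf * idf[term] for term, tf in Counter(tokens).items()} for tokens in tokenized]
    return vectors, idf


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either vector is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query: str, docs: list[str], top_k: int = 3) -> list[tuple[int, float]]:
    """Rank documents against a free-text query; returns (document index, score) pairs."""
    vectors, idf = tfidf_vectors(docs)
    # Query terms that never occur in the collection have no IDF and are dropped.
    query_vec = {t: tf * idf[t] for t, tf in Counter(tokenize(query)).items() if t in idf}
    ranked = sorted(((i, cosine(query_vec, v)) for i, v in enumerate(vectors)),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]


# Hypothetical sample data, not taken from the paper.
SAMPLE_ABSTRACTS = [
    "BRCA1 mutations increase the risk of breast neoplasms.",
    "TP53 is frequently mutated in lung neoplasms.",
    "Insulin resistance is associated with type 2 diabetes mellitus.",
]

if __name__ == "__main__":
    print(search("BRCA1 breast neoplasms", SAMPLE_ABSTRACTS, top_k=1))
```

On the sample corpus the first abstract ranks highest for the query "BRCA1 breast neoplasms", since it is the only one containing all three query terms. Sparse dictionaries keep the sketch dependency-free; a real literature system would use an inverted index rather than scoring every document.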
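
The abstract also leaves the N-gram step unspecified. One simple interpretation, sketched here under stated assumptions, treats the word n-grams (up to bigrams) lying between a co-occurring gene symbol and disease term in a sentence as candidate relationship features. It keeps those found in a small feature vocabulary and aggregates the resulting (gene symbol, relationship feature, disease term) triples into a weighted relationship network. The gene list, the disease terms (modeled loosely on MeSH headings), the feature vocabulary, the sample sentences, and all function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical inputs for illustration only; none of these lists come from the paper.
GENES = {"BRCA1", "TP53", "APOE"}  # gene symbols, matched case-sensitively
DISEASES = {"breast neoplasms", "lung neoplasms", "alzheimer disease"}  # lowercased MeSH-style terms
CANDIDATE_FEATURES = {"associated", "mutated", "increases risk", "overexpressed", "inhibits"}


def ngrams(words: list[str], n: int) -> list[str]:
    """Contiguous word n-grams of length n, joined by single spaces."""
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def relationship_features(sentence: str, max_n: int = 2) -> list[tuple[str, str, str]]:
    """Return (gene, feature, disease) triples found in one sentence.

    Only the words between a gene mention and a disease mention are scanned,
    as 1- to max_n-grams, and a gram is kept if it is in CANDIDATE_FEATURES.
    Index arithmetic assumes ASCII text, where lower() preserves offsets.
    """
    lowered = sentence.lower()
    triples = []
    for gene in GENES:
        g = re.search(rf"\b{re.escape(gene)}\b", sentence)
        if g is None:
            continue
        for disease in DISEASES:
            d = lowered.find(disease)
            if d == -1:
                continue
            # Text span between the two mentions, whichever comes first.
            if g.start() < d:
                between = lowered[g.end():d]
            else:
                between = lowered[d + len(disease):g.start()]
            words = re.findall(r"[a-z0-9]+", between)
            for n in range(1, max_n + 1):
                for gram in ngrams(words, n):
                    if gram in CANDIDATE_FEATURES:
                        triples.append((gene, gram, disease))
    return triples


def build_network(sentences: list[str]) -> dict[tuple[str, str, str], int]:
    """Aggregate triples into weighted edges: gene --feature--> disease, weight = count."""
    edges: Counter = Counter()
    for sentence in sentences:
        edges.update(relationship_features(sentence))
    return dict(edges)


# Hypothetical sample sentences, not taken from the paper.
SAMPLE_SENTENCES = [
    "BRCA1 germline mutation increases risk of breast neoplasms.",
    "TP53 is frequently mutated in lung neoplasms.",
    "TP53 is mutated in many breast neoplasms.",
    "APOE is associated with alzheimer disease.",
]

if __name__ == "__main__":
    for (gene, feature, disease), weight in sorted(build_network(SAMPLE_SENTENCES).items()):
        print(f"{gene} --[{feature}]--> {disease} (weight {weight})")
```

Restricting the scan to the text between the two mentions ties each feature to one gene-disease pair, in line with the abstract's definition of a relationship feature as a word linking a single gene and a disease. A fuller pipeline would also need gene-symbol normalization and negation handling, which this sketch ignores.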
