Journal: Environmental Health Perspectives

Generating the Blood Exposome Database Using a Comprehensive Text Mining and Database Fusion Approach


Abstract

Background: Blood chemicals are routinely measured in clinical and preclinical research studies to diagnose diseases, assess risks in epidemiological research, or perform metabolomic phenotyping in response to treatments. A vast volume of blood-related literature is available via the PubMed database for data mining.
Objectives: We aimed to generate a comprehensive blood exposome database of endogenous and exogenous chemicals associated with the mammalian circulating system through text mining and database fusion.
Methods: Using NCBI resources, we retrieved PubMed abstracts, PubChem chemical synonyms, and PMC supplementary tables. We then employed text mining and PubChem crowdsourcing to associate phrases relating to blood with PubChem chemicals. False positives were removed using a phrase pattern and a compound exclusion list.
Results: A query to identify blood-related publications in the PubMed database yielded 1.1 million papers. Matching a total of 15 million synonyms from 6.5 million relevant PubChem chemicals against all blood-related publications yielded 37,514 chemicals and 851,999 publication records. Mapping PubChem compound identifiers to the PubMed database yielded 49,940 unique chemicals linked to 676,643 papers. Analysis of open-access metabolomics papers related to blood phrases in the PMC database yielded 4,039 unique compounds and 204 papers. Consolidating these three approaches yielded a total of 41,474 achiral structures that were linked to 65,957 PubChem CIDs and to over 878,966 PubMed articles. We mapped these compounds to 50 databases, such as those covering metabolites and pathways, governmental and toxicological databases, pharmacology resources, and bioassay repositories. In comparison, HMDB, the Human Metabolome Database, links 1,075 compounds to blood-related primary publications.
Conclusion: This new Blood Exposome Database can be used for prioritizing chemicals for systematic reviews, developing target assays in exposome research, identifying compounds in untargeted mass spectrometry, and biological interpretation of metabolomics data. The database is available at http://bloodexposome.org.
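
The synonym-matching step described in the Methods can be illustrated with a small sketch. This is not the authors' pipeline: the function name `link_chemicals`, the blood phrase pattern, the sentence-level co-occurrence rule, the toy abstracts, and the compound identifiers (101, 102, 103) are all assumptions made for illustration, and the identifiers are placeholders rather than real PubChem CIDs. The sketch links a compound to a paper only when one of its synonyms appears in the same sentence as a blood-related phrase, and it skips names on an exclusion list to reduce false positives.

```python
import re
from collections import defaultdict

# Assumed blood phrase pattern; the study's actual phrase pattern is not given here.
BLOOD_PATTERN = re.compile(r"\b(blood|plasma|serum)\b", re.IGNORECASE)


def link_chemicals(abstracts, synonyms, excluded):
    """Return {compound_id: set of paper ids} for synonyms that co-occur
    with a blood-related phrase in the same sentence."""
    # One word-boundary pattern per synonym, skipping excluded (ambiguous) names.
    patterns = {
        name: re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        for name in synonyms
        if name.lower() not in excluded
    }
    links = defaultdict(set)
    for paper_id, text in abstracts.items():
        # Naive sentence split on terminal punctuation followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            if not BLOOD_PATTERN.search(sentence):
                continue  # no blood phrase in this sentence
            for name, pattern in patterns.items():
                if pattern.search(sentence):
                    links[synonyms[name]].add(paper_id)
    return dict(links)


# Toy inputs (hypothetical): two abstracts and three synonyms.
abstracts = {
    "paper-1": "Plasma glucose was elevated in treated mice. "
               "Caffeine intake was recorded by survey.",
    "paper-2": "These results lead to a new model of serum caffeine clearance.",
}
synonyms = {"glucose": 101, "caffeine": 102, "lead": 103}  # placeholder ids
excluded = {"lead"}  # ambiguous with the common English verb

result = link_chemicals(abstracts, synonyms, excluded)
print(result)
```

In this toy run, "caffeine" in the first abstract is not linked because its sentence has no blood phrase, and "lead" in the second abstract is ignored because it is on the exclusion list. Matching millions of synonyms against a million abstracts would need a far more efficient approach than one regular expression per synonym; the sketch only shows the filtering logic.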
