BMC Bioinformatics

Large-scale labeling and assessment of sex bias in publicly available expression data

Abstract

Women are at more than 1.5-fold higher risk for clinically relevant adverse drug events. While this higher prevalence is partially due to gender-related effects, biological sex differences likely also impact drug response. Publicly available gene expression databases provide a unique opportunity for examining drug response at a cellular level. However, missingness and heterogeneity of metadata prevent large-scale identification of drug exposure studies and limit assessments of sex bias. To address this, we trained organism-specific models to infer sample sex from gene expression data, and used entity normalization to map metadata cell line and drug mentions to existing ontologies. Using this method, we inferred sex labels for 450,371 human and 245,107 mouse microarray and RNA-seq samples from refine.bio. Overall, we find a slight female bias (52.1%) in human samples and a male bias (62.5%) in mouse samples; this corresponds to a majority of mixed-sex studies in humans and a majority of single-sex studies in mice, with single-sex studies split between female-only and male-only (25.8% vs. 18.9% in human and 21.6% vs. 31.1% in mouse, respectively). In drug studies, we find limited evidence for sex-sampling bias overall; however, specific categories of drugs, including human cancer and mouse nervous system drugs, are enriched in female-only and male-only studies, respectively. We leverage our expression-based sex labels to further examine the complexity of cell line sex and to assess the frequency of metadata sex label misannotations (2–5%). Our results demonstrate limited overall sex bias, while highlighting high bias in specific subfields and underscoring the importance of including sex labels to better understand the underlying biology. We make our inferred and normalized labels, along with flags for misannotated samples, publicly available to catalyze the routine use of sex as a study variable in future analyses.
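The labeling workflow summarized in the abstract (infer sample sex from expression, group samples into female-only, male-only, or mixed-sex studies, and flag disagreements with metadata) can be illustrated with a minimal sketch. This is not the authors' model: the abstract does not specify the trained classifiers or their features, so a simple marker-gene rule stands in for them here. The choice of XIST as a female-expressed marker and RPS4Y1, DDX3Y, and KDM5D as Y-linked markers, the threshold, the record layout, and all function names are assumptions made for illustration, and the expression values are invented toy data.

```python
from collections import defaultdict

# Illustrative marker genes (an assumption, not the paper's feature set).
FEMALE_MARKERS = ["XIST"]                     # expressed from the inactive X
MALE_MARKERS = ["RPS4Y1", "DDX3Y", "KDM5D"]   # Y-linked genes


def infer_sex(expr, threshold=0.0):
    """Label a sample from a dict of gene -> log-scale expression.

    Stand-in rule: compare mean female-marker and mean male-marker expression.
    """
    female_score = sum(expr[g] for g in FEMALE_MARKERS) / len(FEMALE_MARKERS)
    male_score = sum(expr[g] for g in MALE_MARKERS) / len(MALE_MARKERS)
    return "female" if female_score - male_score > threshold else "male"


def study_category(labels):
    """Classify a study by the set of inferred sample labels it contains."""
    observed = set(labels)
    if observed == {"female"}:
        return "female-only"
    if observed == {"male"}:
        return "male-only"
    return "mixed"


def summarize(samples):
    """Return per-study categories and IDs whose metadata disagrees."""
    by_study = defaultdict(list)
    mismatches = []
    for sample in samples:
        inferred = infer_sex(sample["expr"])
        by_study[sample["study"]].append(inferred)
        reported = sample.get("metadata_sex")  # may be missing (None)
        if reported is not None and reported != inferred:
            mismatches.append(sample["id"])
    categories = {study: study_category(labels)
                  for study, labels in by_study.items()}
    return categories, mismatches


# Toy records with made-up expression values.
samples = [
    {"id": "s1", "study": "A", "metadata_sex": "female",
     "expr": {"XIST": 9.0, "RPS4Y1": 1.0, "DDX3Y": 1.2, "KDM5D": 0.8}},
    {"id": "s2", "study": "A", "metadata_sex": "female",
     "expr": {"XIST": 1.0, "RPS4Y1": 8.0, "DDX3Y": 8.5, "KDM5D": 7.9}},
    {"id": "s3", "study": "B", "metadata_sex": None,
     "expr": {"XIST": 8.5, "RPS4Y1": 0.5, "DDX3Y": 0.7, "KDM5D": 0.9}},
]

categories, mismatches = summarize(samples)
print(categories)
print(mismatches)
```

In this toy run, study A contains one sample of each inferred sex and sample s2 is flagged because its metadata label disagrees with the inferred one. A trained, organism-specific classifier of the kind the abstract describes would replace `infer_sex` in practice.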
