ALE: automated label extraction from GEO metadata

Cory B. Giles; Chase A. Brown; Michael Ripperger; Zane Dennis; Xiavan Roopnarinesingh; Hunter Porter; Aleksandra Perz; Jonathan D. Wren

Abstract

NCBI’s Gene Expression Omnibus (GEO) is a rich community resource containing millions of gene expression experiments from human, mouse, rat, and other model organisms. However, the information about each experiment (its metadata) is an open-ended, non-standardized textual description provided by the depositor. Classifying experiments for meta-analysis by factors such as gender, age of the sample donor, and tissue of origin is therefore not feasible without first assigning labels to the experiments. Automated approaches are preferable for this task, primarily because of the volume of data to be processed, but also because they ensure standardization and consistency. While some of these labels can be extracted directly from the textual metadata, many records contain no explicit text about the age and gender of the subjects in the study: our analysis shows that only 26% of metadata text contains information about gender and 21% about age. To bridge this gap, machine-learning methods can be trained to use the gene expression patterns associated with the text-derived labels to refine label-prediction confidence. To ameliorate the lack of available labels, we first extract labels from the textual metadata for each GEO RNA dataset and evaluate performance against a gold standard of manually curated labels. We then use machine-learning methods to predict labels from the gene expression of the samples and compare these predictions with the text-based method. Here we present an automated method that extracts labels for age, gender, and tissue from textual metadata and GEO data using both a heuristic approach and machine learning. We show that the two methods together improve the accuracy of label assignment to GEO samples.
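The text-based step described above can be illustrated with a minimal sketch. The abstract does not give the actual heuristics, so everything here is an assumption for illustration: the regular expressions, the function name `extract_labels`, the "key: value" style of the input strings, and the choice to treat an age without a unit as years.

```python
import re
from typing import Dict, Optional, Union

# Hypothetical patterns; the rules used by the authors are not stated in the abstract.
_GENDER_RE = re.compile(
    r"\b(?:sex|gender)\s*[:=]\s*(male|female|m|f)\b", re.IGNORECASE
)
_AGE_RE = re.compile(
    r"\bage\s*[:=]\s*(\d+(?:\.\d+)?)\s*"
    r"(years?|yrs?|y|months?|mo|weeks?|wk|days?|d)?\b",
    re.IGNORECASE,
)


def _unit_to_years(unit: Optional[str]) -> float:
    """Return the factor that converts a value in `unit` to years."""
    if unit is None:
        return 1.0  # assumption: a bare number is an age in years
    unit = unit.lower()
    if unit.startswith("y"):
        return 1.0
    if unit.startswith("mo"):
        return 1.0 / 12.0
    if unit.startswith("w"):
        return 1.0 / 52.0  # approximate
    return 1.0 / 365.0  # days, approximate


def extract_labels(text: str) -> Dict[str, Union[str, float, None]]:
    """Extract gender and age (in years) from free-text metadata, if present."""
    labels: Dict[str, Union[str, float, None]] = {"gender": None, "age_years": None}

    gender = _GENDER_RE.search(text)
    if gender:
        # Normalize one-letter codes to full words.
        labels["gender"] = "male" if gender.group(1).lower().startswith("m") else "female"

    age = _AGE_RE.search(text)
    if age:
        labels["age_years"] = float(age.group(1)) * _unit_to_years(age.group(2))

    return labels
```

Records for which this returns `None` are the ones where, per the abstract, an expression-based predictor would have to supply the label; that second step is not sketched here.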

Bibliographic record

Source: BMC Bioinformatics, 2017, issue 14
Original format: PDF
Chinese Library Classification: Biological sciences
