Journal: Data in Brief

Genome-wide hairpins datasets of animals and plants for novel miRNA prediction

Abstract

This article makes available several genome-wide datasets that can be used for training microRNA (miRNA) classifiers. The hairpin sequences come from the genomes of Homo sapiens, Arabidopsis thaliana, Anopheles gambiae, Caenorhabditis elegans and Drosophila melanogaster. Each dataset provides the genome data divided into sequences, together with a set of computed features for prediction. Each sequence has one label: i) “positive”, meaning that it is a well-known pre-miRNA according to miRBase v21; or ii) “unlabeled”, indicating that the sequence has no known function (yet) and could be a candidate novel pre-miRNA. Because selecting an informative feature set is very important for a good pre-miRNA classifier, a representative feature set with large discriminative power has been calculated and is also provided for each genome. This feature set contains typical information about sequence, topology and structure. The datasets are publicly shared at https://sourceforge.net/projects/sourcesinc/files/mirdata/.
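To make the idea of per-hairpin features and labels concrete, the following is a minimal Python sketch. It is not the feature set shipped with the datasets: the function name `hairpin_features`, the chosen features, and the two toy sequences are all assumptions made for illustration, and the dot-bracket structure strings are assumed to come from some secondary-structure prediction tool.

```python
from collections import Counter


def hairpin_features(sequence: str, structure: str) -> dict:
    """Compute a few illustrative features for one hairpin.

    sequence:  RNA sequence over A, C, G, U (T is mapped to U)
    structure: dot-bracket string of the same length (assumed input)
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have equal length")

    seq = sequence.upper().replace("T", "U")
    counts = Counter(seq)
    length = len(seq)

    # Each "(" opens exactly one base pair in dot-bracket notation.
    base_pairs = structure.count("(")

    # Longest run of unpaired positions, a rough proxy for loop size.
    runs = structure.replace("(", " ").replace(")", " ").split()
    longest_unpaired = max((len(run) for run in runs), default=0)

    return {
        "length": length,
        "gc_content": (counts["G"] + counts["C"]) / length if length else 0.0,
        "base_pairs": base_pairs,
        "paired_fraction": 2 * base_pairs / length if length else 0.0,
        "longest_unpaired_run": longest_unpaired,
    }


# Toy hairpins (invented, not taken from the datasets), each with one of
# the two labels described in the abstract.
toy = [
    ("GGGAAACCC", "(((...)))", "positive"),
    ("AUAUAUGCAUAUAU", "((((......))))", "unlabeled"),
]

rows = [dict(hairpin_features(seq, struct), label=label)
        for seq, struct, label in toy]

for row in rows:
    print(row)
```

Rows of this shape, one per hairpin with a "positive" or "unlabeled" label, are the kind of input a pre-miRNA classifier would be trained on. The real datasets provide their own precomputed features covering sequence, topology and structure.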
