FEATURE VECTOR-BASED METHOD FOR REMOVING REDUNDANCY IN A TRAINING DATASET

Abstract

The present invention relates to a feature vector-based method for removing redundant data when generating training data. The technical problem it addresses is to generate a larger data set than conventional methods based on sequence similarity, without including unnecessary repetitions of data. Conventional methods predict RNA-binding amino acids in a protein sequence using features of the protein alone; the invention provides a more effective way to generate training data by including features of both the protein and the RNA sequence in the feature vector. To this end, the method for generating training data based on feature vectors comprises the steps of: (1) determining RNA-protein interactions by identifying hydrogen-bond binding sites between amino acids and RNA; (2) calculating the interaction propensity between amino acid triplets and RNA; (3) encoding feature vectors from various features of the protein and RNA sequences in order to predict the RNA-binding amino acids in a protein sequence; and (4) building the training data set by removing redundancy from the encoded data on the basis of the feature vectors.
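Step (4) of the abstract can be illustrated with a minimal sketch. The abstract does not say which features are encoded or how redundancy is detected, so everything here is an assumption made for illustration: the `encode` function (indices of an amino acid triplet plus the index of a partner nucleotide), the example format `(triplet, nucleotide, label)`, and the rule that two examples are redundant when their feature vector and label are identical.

```python
from typing import List, Set, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard amino acids, one-letter codes
NUCLEOTIDES = "ACGU"                  # RNA nucleotides

FeatureVector = Tuple[int, ...]
Example = Tuple[str, str, int]        # (amino acid triplet, nucleotide, binding label)


def encode(triplet: str, nucleotide: str) -> FeatureVector:
    """Hypothetical encoding: one index per residue, then the nucleotide index."""
    residues = tuple(AMINO_ACIDS.index(aa) for aa in triplet)
    return residues + (NUCLEOTIDES.index(nucleotide),)


def remove_redundancy(examples: List[Example]) -> List[Tuple[FeatureVector, int]]:
    """Keep the first example for each distinct (feature vector, label) pair."""
    seen: Set[Tuple[FeatureVector, int]] = set()
    unique: List[Tuple[FeatureVector, int]] = []
    for triplet, nucleotide, label in examples:
        key = (encode(triplet, nucleotide), label)
        if key not in seen:  # assumed redundancy rule: exact match on vector and label
            seen.add(key)
            unique.append(key)
    return unique


if __name__ == "__main__":
    data = [("RKG", "A", 1), ("RKG", "A", 1), ("ALV", "U", 0)]
    for vector, label in remove_redundancy(data):
        print(vector, label)
```

Because the comparison is made on encoded vectors rather than on whole sequences, two examples from similar sequences are both kept as long as their encoded features differ, which is one way to read the abstract's claim of a larger data set than sequence-similarity filtering yields.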
