FEATURE VECTOR-BASED METHOD FOR REMOVING REDUNDANCY IN A TRAINING DATASET

Abstract

The present invention relates to a method for generating training data in which redundant data is removed on the basis of feature vectors. The technical problem to be solved is to provide a more effective way of generating training data: one that contains no unnecessary duplicate data, yet can produce a larger data set than the conventional method based on sequence similarity. Features of both the protein and the RNA sequences are incorporated into the feature vectors, so that RNA-binding amino acids in a protein sequence can be predicted more effectively than with conventional methods that use only features of the protein. To this end, the feature vector-based method for generating training data according to the present invention comprises the steps of: (1) determining the hydrogen bonds between protein and RNA to identify the RNA-binding amino acids and the protein-RNA binding sites; (2) calculating the propensity of amino acid triplets to interact with RNA; (3) encoding various features of the protein and RNA sequences into feature vectors for predicting the RNA-binding amino acids in a protein sequence; and (4) building a training data set by removing redundancy from the encoded data on the basis of the feature vectors.
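
Step (4) can be illustrated with a minimal Python sketch. It assumes that two training examples count as redundant when their encoded feature vectors and labels are identical; the abstract does not state the exact redundancy criterion, so this rule is an assumption. The function name `remove_redundancy`, the example layout, and all feature values are hypothetical and serve only as an illustration.

```python
from typing import Hashable, Iterable, List, Sequence, Tuple

# Hypothetical example layout: (feature_vector, label),
# where label 1 = RNA-binding and label 0 = non-binding.
Example = Tuple[Sequence[Hashable], int]


def remove_redundancy(examples: Iterable[Example]) -> List[Example]:
    """Keep the first example seen for each distinct (feature vector, label) pair.

    Assumption: "redundant" means an exact match of the encoded feature
    vector and its label. The patent abstract does not spell this out.
    """
    seen = set()
    unique: List[Example] = []
    for features, label in examples:
        key = (tuple(features), label)  # tuples are hashable, lists are not
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


# Made-up feature vectors: an amino acid triplet plus an invented
# interaction propensity value.
examples = [
    (("G", "R", "K", 0.82), 1),
    (("G", "R", "K", 0.82), 1),  # exact duplicate of the first example
    (("A", "L", "V", 0.10), 0),
    (("G", "R", "K", 0.82), 0),  # same vector, different label
]

deduped = remove_redundancy(examples)
print(len(examples), "->", len(deduped))
```

Because the comparison is made on the encoded vectors rather than on whole-sequence similarity, two examples taken from similar sequences are both kept as long as their feature vectors differ, which fits the abstract's claim that the resulting data set can be larger than one filtered by sequence similarity.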

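Bibliographic details

  • Publication number: KR101278211B1
  • Publication date: 2013-07-01
  • Application number: KR20110100228
  • Filing date: 2011-09-30
  • Inventors: 최성욱; 한경숙
  • Classification: G06F19/10
  • Country: KR
  • Original format: PDF
  • Database entry time: 2022-08-21 16:24:56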
