FEATURE VECTOR-BASED METHOD FOR REMOVING REDUNDANCY IN A TRAINING DATASET
Abstract
The present invention relates to a feature vector-based method for generating training data from which redundant data has been removed. The technical problem to be solved is twofold. First, the method should produce a data set that contains no unnecessary duplicate data yet can be larger in scale than one produced by the conventional method based on sequence similarity. Second, by incorporating features of both the protein and the RNA sequence into the feature vectors, it should yield training data that predicts the RNA-binding amino acids in a protein sequence more effectively than the conventional method, which uses only features of the protein. To this end, the method for generating training data based on feature vectors according to the present invention comprises the steps of: (1) determining protein-RNA binding sites by identifying, from hydrogen bonding, the RNA-binding amino acids that interact with the RNA; (2) calculating the propensity of amino acid triplets to interact with the RNA; (3) encoding feature vectors that contain various features of the protein and the RNA sequence, for predicting the RNA-binding amino acids in a protein sequence; and (4) building a training data set by removing redundancy from the encoded data on the basis of the feature vectors.
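Step (4) can be illustrated with a minimal Python sketch. The abstract does not specify how redundancy is detected, so this sketch assumes that two training examples are redundant when their encoded feature vectors are identical, and that the first occurrence is kept. The function name `remove_redundant` and the `(feature_vector, label)` layout are hypothetical and are not taken from the patent.

```python
from typing import Hashable, Iterable, List, Sequence, Tuple

# Hypothetical layout: one training example is (encoded feature vector, label),
# where label 1 marks an RNA-binding amino acid and 0 a non-binding one.
Example = Tuple[Sequence[Hashable], int]


def remove_redundant(examples: Iterable[Example]) -> List[Tuple[tuple, int]]:
    """Keep only the first example for each distinct feature vector.

    Assumption: redundancy means an exact match of the encoded feature
    vector. The label is not part of the comparison key, so a later
    example with the same vector but a different label is also dropped.
    """
    seen = set()
    unique = []
    for features, label in examples:
        key = tuple(features)  # lists are unhashable, tuples are not
        if key in seen:
            continue  # same feature vector already in the training set
        seen.add(key)
        unique.append((key, label))
    return unique


if __name__ == "__main__":
    encoded = [
        ([1, 0, 2], 1),
        ([1, 0, 2], 1),  # exact duplicate of the first example
        ([0, 0, 1], 0),
    ]
    for features, label in remove_redundant(encoded):
        print(features, label)
```

Because the comparison is made on feature vectors rather than on whole sequences, examples taken from similar sequences are kept as long as their encoded vectors differ, which is consistent with the abstract's claim that the resulting data set can be larger than one filtered by sequence similarity.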