Journal: CSI Transactions on ICT
Genome data classification based on fuzzy matching


Abstract

Genomic data mining and knowledge extraction are important problems in bioinformatics. Earlier work on identifying unknown genomes is based on exact pattern matching of n-grams. In most real-world biological problems, however, exact matching may not give the desired results, and n-grams suffer from exponential explosion. In this paper we propose a method for genome data classification based on approximate matching. The algorithm works by selecting random samples from the genome database. Tolerance is allowed by generating candidates of varied length to query from these sample sequences. The Levenshtein distance is then computed for each candidate to check for k-fuzzy equality, and the total number of fuzzy matches for each sequence is calculated. These counts are then classified using four data mining techniques: naive Bayes, support vector machine, back propagation, and nearest neighbor. Experimental results for different tolerance levels show that accuracy increases with tolerance. We also examine the effect of sample size on classification accuracy and observe that accuracy increases with sample size. Genome data from two species, yeast and E. coli, are used to verify the proposed method.
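The matching step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formal definitions, so the sketch assumes that two strings are k-fuzzily equal when their Levenshtein distance is at most k, and that "candidates of varied length" are the substrings of a sequence whose length lies within k of the query length. The function names (levenshtein, k_fuzzy_equal, candidates, count_fuzzy_matches) are hypothetical and chosen only for illustration.

```python
from typing import Iterator


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) via row-by-row dynamic programming."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def k_fuzzy_equal(a: str, b: str, k: int) -> bool:
    """Assumed definition: strings are k-fuzzily equal if their distance is <= k."""
    return levenshtein(a, b) <= k


def candidates(sequence: str, n: int, k: int) -> Iterator[str]:
    """Assumed candidate set: all substrings with length between n - k and n + k."""
    for length in range(max(1, n - k), n + k + 1):
        for start in range(len(sequence) - length + 1):
            yield sequence[start:start + length]


def count_fuzzy_matches(query: str, sequence: str, k: int) -> int:
    """Number of candidates in `sequence` that are k-fuzzily equal to `query`."""
    return sum(1 for c in candidates(sequence, len(query), k)
               if k_fuzzy_equal(query, c, k))


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))         # 3
    print(count_fuzzy_matches("ACG", "ACGACG", 0))  # 2 (exact matches only)
```

Under these assumptions, the counts obtained for a set of sampled queries would form the feature vector of a sequence, which could then be passed to any of the classifiers named in the abstract. With k = 0 the count reduces to exact substring matching, which corresponds to the n-gram baseline the abstract contrasts against.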
