Mining Unique-m Substrings from Genomes

Abstract

Unique substrings in genomes may indicate a high level of specificity, which is crucial and fundamental to many genetic studies and techniques, such as PCR, microarray hybridization, Southern and Northern blotting, RNA interference (RNAi), and genome (re)sequencing. However, occurring only once in the genome is not by itself sufficient to guarantee high specificity. For example, nucleotide mismatches within a certain tolerance may impair specificity even if the substring of interest occurs only once in the genome. In this study we propose the concept of unique-m substrings of genomes for controlling specificity in genome-wide assays. A substring is unique-m if it has exactly one perfect match, on one strand, in the entire genome, and every other approximate match has more than m mismatches. We developed a pattern-growth approach to systematically mine such unique-m substrings from a given genome. Unlike most competing methods, our algorithm does not need a preprocessing step to extract sequential information. The search for unique-m substrings is performed as a single, regular data-mining task, so that similarities among queries can be exploited to achieve a tremendous speedup. The runtime of our algorithm is linear in the size of the input genome and in the length of the unique-m substrings. In addition, the unique-m mining algorithm has been parallelized to support genome-wide computation on a cluster or on a single shared-memory machine with multiple CPUs.
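The unique-m definition can be checked directly, if slowly, by comparing a candidate against every same-length window of the genome. The Python sketch below does this under our own assumptions, which the abstract does not spell out: mismatches are counted as substitutions only (Hamming distance, no insertions or deletions), and both the given strand and its reverse complement are scanned, so a candidate that also occurs on the opposite strand is not unique. The names `reverse_complement`, `mismatches_exceed` and `is_unique_m` are hypothetical. This is a brute-force reference check, not the authors' pattern-growth algorithm, and it does not achieve their linear runtime.

```python
"""Brute-force reference check for the unique-m property.

Assumptions (ours, not the paper's): mismatches are substitutions only
(Hamming distance, no indels), and both the given strand and its reverse
complement are scanned. This is NOT the authors' pattern-growth algorithm;
each call costs O(genome length * candidate length).
"""

# Hypothetical helper names throughout; nothing here is taken from the paper.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A<->T, C<->G)."""
    return seq.translate(_COMPLEMENT)[::-1]


def mismatches_exceed(a: str, b: str, m: int) -> bool:
    """Return True if equal-length strings a and b differ at more than m positions."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            count += 1
            if count > m:
                return True  # stop early once the tolerance is exceeded
    return False


def is_unique_m(candidate: str, genome: str, m: int) -> bool:
    """Return True if `candidate` is unique-m in `genome` under the assumptions above.

    Exactly one window of len(candidate), across both strands, may equal the
    candidate; every other window must differ from it at more than m positions.
    Comparison is case-sensitive.
    """
    length = len(candidate)
    if length == 0 or length > len(genome):
        return False
    exact_hits = 0
    for strand in (genome, reverse_complement(genome)):
        for start in range(len(strand) - length + 1):
            window = strand[start:start + length]
            if window == candidate:
                exact_hits += 1
                if exact_hits > 1:
                    return False  # a second perfect match breaks uniqueness
            elif not mismatches_exceed(window, candidate, m):
                return False  # an approximate match with at most m mismatches
    return exact_hits == 1


if __name__ == "__main__":
    genome = "AAAAACAAAA"
    print(is_unique_m("ACA", genome, 0))  # True
    print(is_unique_m("ACA", genome, 1))  # False: "AAA" is one mismatch away
```

In the usage example, "ACA" occurs once and is unique-0, but it is not unique-1 because the window "AAA" is within one mismatch. A literal checker like this is mainly useful as a test oracle for a faster miner on small inputs; running it for every candidate across a whole genome would be far too slow.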

Bibliographic details

Journal: other
Authors: Kai Ye; Zhenyu Jia; Yipeng Wang; Paul Flicek; Rolf Apweiler
Year (volume), issue: -1(3), 3
Year: -1
Pages: 099–103
Total pages: 12
Original format: PDF
Keywords: Data mining; Genomes; Mismatch; Sequence

Similar articles

1. Using Frequent Substring Mining Techniques for Indexing Genome Sequences: A Comparison of Frequent Substring and Frequent Max Substring Algorithms [J]. Todsanai Chumwatana. Journal of Advances in Information Technology, 2016, no. 4.

2. Genome comparison without alignment using shortest unique substrings [J]. Bernhard Haubold, Nora Pierstorff, Friedrich Möller. BMC Bioinformatics, 2005, no. 1.

3. Genome mining of the Streptomyces avermitilis genome and development of genome-minimized hosts for heterologous expression of biosynthetic gene clusters [J]. Ikeda Haruo, Shin-ya Kazuo, Omura Satoshi. Journal of Industrial Microbiology & Biotechnology, 2014, no. 2.

4. Genome sequence clustering using hybrid method: Self-organizing map and frequent max substring techniques [C]. Chumwatana Todsanai. International Conference on Machine Learning and Cybernetics, 2013.

5. Genome Mining in Actinobacteria Via a Hybrid-omics Discovery Platform [D]. Tryon, James Hudson. 2020.

6. Genome comparison without alignment using shortest unique substrings [O]. Bernhard Haubold, Nora Pierstorff, Friedrich Möller. 2005.

7. Mining Unique-m Substrings from Genomes [O]. Kai Ye, Zhenyu Jia, Yipeng Wang. 2010.
