Journal: Bioinformatics

k-link EST clustering: evaluating error introduced by chimeric sequences under different degrees of linkage

Abstract

MOTIVATION: The clustering of expressed sequence tags (ESTs) is a crucial step in many sequence analysis studies that require a high level of redundancy. Chimeric sequences, while uncommon, can make achieving the optimal EST clustering a challenge. Single-linkage algorithms are particularly vulnerable to the effects of chimeras. To avoid chimera-facilitated erroneous merges, researchers using single-linkage algorithms are forced to use stringent sequence-similarity thresholds. Such thresholds reduce the sensitivity of the clustering algorithm.

RESULTS: We introduce the concept of k-link clustering for EST data. We evaluate how clustering error rates vary over a range of linkage thresholds. Using k-link, we show that Type II error decreases as the number of shared ESTs (i.e. links) required increases. We observe a base level of Type II error, likely caused by the presence of unmasked low-complexity or repetitive sequence. We find that Type I error increases gradually with increased linkage. To minimize the Type I error introduced by increased linkage requirements, we propose an extension to k-link that modifies the required number of links with respect to the size of the clusters being compared.

AVAILABILITY: k-link is written in C++ and is available under the terms of the GNU General Public License (GPL) from http://www.bioinformatics.csiro.au/products.shtml.
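The linkage idea in the abstract can be illustrated with a small Python sketch. It is not the authors' C++ implementation and makes several assumptions the abstract does not confirm: clustering starts from overlapping seed groups of ESTs (for example, each EST together with its similarity hits), a "link" between two clusters is an EST they have in common, and merging is greedy. The function name `k_link_merge`, the `required` hook, and the toy EST identifiers are all hypothetical.

```python
from itertools import combinations


def k_link_merge(groups, k, required=None):
    """Greedily merge overlapping EST groups that share enough ESTs.

    groups   -- iterable of non-empty collections of EST identifiers
                (assumed seed groups, e.g. an EST plus its similarity hits)
    k        -- number of shared ESTs (links) needed for a merge;
                k=1 behaves like single linkage
    required -- optional hook (k, size_a, size_b) -> links needed, a
                hypothetical stand-in for a size-dependent rule
    """
    if required is None:
        required = lambda k, size_a, size_b: k
    clusters = [set(g) for g in groups]
    merged = True
    while merged:
        merged = False
        for i, j in combinations(range(len(clusters)), 2):
            shared = len(clusters[i] & clusters[j])
            need = max(1, required(k, len(clusters[i]), len(clusters[j])))
            if shared >= need:
                # j > i, so removing cluster j leaves index i valid.
                clusters[i].update(clusters.pop(j))
                merged = True
                break  # cluster list changed: rescan from the start
    return sorted(sorted(c) for c in clusters)


# Toy data: "x" plays the role of a chimera that appears in seed groups
# for two otherwise unrelated genes (a* and b*).
seed_groups = [
    {"a1", "a2", "a3"},
    {"a2", "a3", "x"},
    {"b1", "b2", "b3"},
    {"b2", "b3", "x"},
]

single_link = k_link_merge(seed_groups, k=1)
two_link = k_link_merge(seed_groups, k=2)
```

Under these assumptions, `k=1` collapses all four seed groups into a single cluster because the chimera alone bridges the two genes, while `k=2` keeps the `a*` and `b*` clusters apart (the chimera stays in both). A size-dependent rule can be passed through `required`, for example `lambda k, a, b: min(k, max(1, min(a, b) - 1))`, which lets a small group join on a single shared EST. That formula is only an illustration; the abstract does not state the rule the authors use.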