Nearest Neighbor Networks: clustering expression data based on gene neighborhoods

Curtis Huttenhower; Avi I Flamholz; Jessica N Landis; Sauhard Sahi; Chad L Myers; Kellen L Olszewski; Matthew A Hibbs; Nathan O Siemers; Olga G Troyanskaya; Hilary A Coller

Abstract

Background: The availability of microarrays measuring thousands of genes simultaneously across hundreds of biological conditions represents an opportunity to understand both individual biological pathways and the integrated workings of the cell. However, translating this amount of data into biological insight remains a daunting task. An important initial step in the analysis of microarray data is clustering of genes with similar behavior. A number of classical techniques are commonly used to perform this task, particularly hierarchical and K-means clustering, and many novel approaches have been suggested recently. While these approaches are useful, they are not without drawbacks; these methods can find clusters in purely random data, and even clusters enriched for biological functions can be skewed towards a small number of processes (e.g. ribosomes).

Results: We developed Nearest Neighbor Networks (NNN), a graph-based algorithm to generate clusters of genes with similar expression profiles. This method produces clusters based on overlapping cliques within an interaction network generated from mutual nearest neighborhoods. This focus on nearest neighbors rather than on absolute distance measures allows us to capture clusters with high connectivity even when they are spatially separated, and requiring mutual nearest neighbors allows genes with no sufficiently similar partners to remain unclustered. We compared the clusters generated by NNN with those generated by eight other clustering methods. NNN was particularly successful at generating functionally coherent clusters with high precision, and these clusters generally represented a much broader selection of biological processes than those recovered by other methods.

Conclusion: The Nearest Neighbor Networks algorithm is a valuable clustering method that effectively groups genes that are likely to be functionally related. It is particularly attractive due to its simplicity, its success in the analysis of large datasets, and its ability to span a wide range of biological functions with high precision.
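The abstract describes NNN only at a high level. As a rough illustration of the idea in the Results paragraph, the Python sketch below builds a mutual nearest-neighbor graph and merges overlapping cliques into clusters. It is a simplified sketch under stated assumptions, not the authors' implementation: similarity is assumed to be Pearson correlation, each gene's neighborhood is assumed to be its `k` most correlated partners, cliques are assumed to have a fixed size (`clique_size`), and any two cliques sharing a gene are merged with union-find. All function names and parameters here are hypothetical.

```python
from itertools import combinations
from math import sqrt

# Hypothetical sketch of NNN-style clustering; not the published implementation.


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    denom = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return sum(a * b for a, b in zip(dx, dy)) / denom if denom else 0.0


def nearest_neighbors(profiles, k):
    """Map each gene to the set of its k most correlated other genes (assumed metric)."""
    neighbors = {}
    for g in profiles:
        others = [h for h in profiles if h != g]
        others.sort(key=lambda h: pearson(profiles[g], profiles[h]), reverse=True)
        neighbors[g] = set(others[:k])
    return neighbors


def mutual_graph(neighbors):
    """Undirected adjacency: keep edge g-h only if each gene lists the other."""
    return {g: {h for h in nbrs if g in neighbors[h]} for g, nbrs in neighbors.items()}


def cliques(graph, size):
    """All cliques of exactly `size` genes, found by brute force over neighborhoods."""
    found = set()
    for g, nbrs in graph.items():
        for rest in combinations(sorted(nbrs), size - 1):
            members = (g,) + rest
            if all(b in graph[a] for a, b in combinations(members, 2)):
                found.add(frozenset(members))
    return found


def nnn_clusters(profiles, k=3, clique_size=3):
    """Merge cliques that share any gene; genes in no clique stay unclustered."""
    graph = mutual_graph(nearest_neighbors(profiles, k))
    parent = {}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for clique in cliques(graph, clique_size):
        members = list(clique)
        for m in members:
            parent.setdefault(m, m)
        for m in members[1:]:
            parent[find(m)] = find(members[0])

    clusters = {}
    for gene in parent:
        clusters.setdefault(find(gene), set()).add(gene)
    return sorted(clusters.values(), key=lambda c: sorted(c))


if __name__ == "__main__":
    # Toy data: a1-a3 rise together, b1-b3 fall together, x is unrelated.
    profiles = {
        "a1": [1, 2, 3, 4], "a2": [2, 4, 6, 8], "a3": [1, 2, 3, 5],
        "b1": [4, 3, 2, 1], "b2": [8, 6, 4, 2], "b3": [5, 3, 2, 1],
        "x": [1, 4, 1, 4],
    }
    result = nnn_clusters(profiles, k=2, clique_size=3)
    print([sorted(c) for c in result])  # expected: [['a1', 'a2', 'a3'], ['b1', 'b2', 'b3']]
```

On this toy input the two co-expressed groups come out as clusters and `x` is left unclustered, because none of its most correlated genes list `x` among their own top neighbors, which mirrors the abstract's point that mutual neighborhoods let genes without similar partners stay out of every cluster. Merging on any shared gene is the loosest overlap rule; a stricter rule (for example, requiring cliques to share `clique_size - 1` genes) would chain fewer cliques together.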

Bibliographic details

Source: BMC Bioinformatics, 2007, Issue 1
Original format: PDF
Chinese Library Classification: Biological sciences