Journal: Bioinformatics

Sorting points into neighborhoods (SPIN): data analysis and visualization by ordering distance matrices


Abstract

We introduce a novel unsupervised approach for the organization and visualization of multidimensional data. At the heart of the method is a presentation of the full pairwise distance matrix of the data points, viewed in pseudocolor. The ordering of the points is iteratively permuted in search of a linear ordering, which can be used to study embedded shapes. Several examples indicate how the shapes of certain structures in the data (elongated, circular and compact) manifest themselves visually in the permuted distance matrix. It is important to identify the elongated objects, since they are often associated with a set of hidden variables that underlie continuous variation in the data. The problem of determining an optimal linear ordering is shown to be NP-complete, and therefore an iterative search algorithm with O(n^3) step complexity is suggested. By using sorting points into neighborhoods (SPIN) to analyze colon cancer expression data, we were able to address the serious problem of sample heterogeneity, which hinders identification of metastasis-related genes in our data. Our methodology brings to light the continuous variation of heterogeneity, starting with homogeneous tumor samples and gradually increasing the amount of another tissue. Ordering the samples according to their degree of contamination by unrelated tissue allows the separation of genes associated with irrelevant contamination from those related to cancer progression.
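The abstract does not spell out the search procedure, so the following is only a minimal sketch of the general idea of iteratively permuting a distance matrix toward a linear ordering. It assumes a simple heuristic: each point is scored by a weighted sum of its distances to the points in the current ordering, and the points are re-sorted by that score until the ordering stops changing. The function names (`pairwise_distances`, `reorder_by_side_scores`, `permute`) and the linear weight vector are illustrative assumptions, and each pass here costs O(n^2), so this should not be read as the O(n^3)-per-step algorithm the abstract refers to.

```python
import math


def pairwise_distances(points):
    """Full Euclidean distance matrix for a list of coordinate tuples."""
    return [[math.dist(p, q) for q in points] for p in points]


def reorder_by_side_scores(dist, max_iter=100):
    """Heuristically search for a linear ordering of the points.

    Assumption (not taken from the abstract): positions carry centered,
    increasing weights; a point that is far from the high-weight end and
    close to the low-weight end gets a large score and moves to the front.
    """
    n = len(dist)
    weights = [k - (n - 1) / 2 for k in range(n)]  # centered, increasing
    order = list(range(n))
    for _ in range(max_iter):
        # Score each point against the positions of the current ordering.
        scores = {
            i: sum(dist[i][j] * weights[k] for k, j in enumerate(order))
            for i in order
        }
        # Stable sort by descending score gives the next candidate ordering.
        new_order = sorted(order, key=lambda i: -scores[i])
        if new_order == order:  # fixed point reached
            break
        order = new_order
    return order


def permute(dist, order):
    """Apply the same permutation to the rows and columns of the matrix."""
    return [[dist[i][j] for j in order] for i in order]


if __name__ == "__main__":
    # Six points on a line, presented in shuffled order.
    shuffled = [(3.0,), (0.0,), (5.0,), (1.0,), (4.0,), (2.0,)]
    dist = pairwise_distances(shuffled)
    order = reorder_by_side_scores(dist)
    for row in permute(dist, order):
        print(row)
```

The heuristic is not guaranteed to converge to a good ordering, which is why the loop is capped by `max_iter`. The permuted matrix returned by `permute` is what would be rendered in pseudocolor to look for the elongated, circular or compact structures described above.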
