Resampling-Based Similarity Measures for High-Dimensional Data

Dhammika Amaratunga; Javier Cabrera; Yung-Seop Lee

Abstract

An important issue in classification is the assessment of sample similarity. This is nontrivial in high-dimensional or megavariate datasets, that is, datasets consisting of simultaneous measurements on thousands of features, many of which carry little or no information about consistent sample differences. Conventional similarity measures do not work particularly well for such data. As an alternative, we propose a distance measure based on a refiltering process: at each step, a random subset of features is selected and a cluster analysis is performed using only that subset; the relative frequency with which a pair of samples clusters together across several such random subsets forms the similarity measure. The features chosen at any step may be completely random, or the selection may be enriched by giving the more informative features a higher chance of being chosen; this enrichment turns out to be particularly effective. We use actual datasets from the burgeoning genomics literature to demonstrate...
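The refiltering process described in the abstract can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: plain k-means as the per-step cluster analysis, the subset size, the number of rounds, and per-feature variance as the "informativeness" score used for enrichment are all illustrative choices, and the function names `resampling_similarity`, `kmeans_labels`, and `pick_features` are hypothetical.

```python
import random
from statistics import pvariance


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_labels(points, k, rng, n_iter=20):
    """Plain Lloyd's k-means (assumed base clusterer); returns one label per point."""
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Update step: move each non-empty cluster's center to its members' mean.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def pick_features(n_features, m, rng, weights=None):
    """Choose m distinct feature indices, uniformly or with weight-biased odds."""
    if weights is None:
        return rng.sample(range(n_features), m)
    # Weighted sampling without replacement (Efraimidis-Spirakis keys):
    # a larger weight gives a larger key on average, so that feature is kept more often.
    keyed = sorted(((rng.random() ** (1.0 / w), j) for j, w in enumerate(weights)), reverse=True)
    return [j for _, j in keyed[:m]]


def resampling_similarity(data, k=2, n_rounds=200, subset_size=None, enrich=False, seed=0):
    """Fraction of rounds in which each pair of samples lands in the same cluster.

    data: list of samples, each a list of feature values (samples x features).
    """
    rng = random.Random(seed)
    n, p = len(data), len(data[0])
    m = subset_size or max(1, p // 2)
    weights = None
    if enrich:
        # Hypothetical informativeness score: per-feature variance across samples.
        # The paper's enrichment score may differ. The small constant keeps weights > 0.
        weights = [pvariance([row[j] for row in data]) + 1e-12 for j in range(p)]
    together = [[0] * n for _ in range(n)]
    for _ in range(n_rounds):
        feats = pick_features(p, m, rng, weights)
        sub = [[row[j] for j in feats] for row in data]  # keep only the chosen features
        labels = kmeans_labels(sub, k, rng)
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    together[i][j] += 1
    return [[count / n_rounds for count in row] for row in together]
```

Each entry of the returned matrix is the relative co-clustering frequency of a sample pair, so `1 - sim[i][j]` can serve as a distance for a downstream clustering or classification step. The Efraimidis-Spirakis trick is used here only because the standard library's `random.sample` has no per-item weights; any weighted sampling without replacement would fit the same role.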

Bibliographic Details

Source: Journal of Computational Biology, 2015, No. 1
Authors: Dhammika Amaratunga; Javier Cabrera; Yung-Seop Lee
Original format: PDF
Chinese Library Classification: Molecular biology

Similar Literature

1. Resampling-Based Similarity Measures for High-Dimensional Data [J]. Dhammika Amaratunga, Javier Cabrera, Yung-Seop Lee. Journal of Computational Biology: A Journal of Computational Molecular Cell Biology, 2015, No. 1.
2. Resampling-Based Analysis of Multivariate Data and Repeated Measures Designs with the R Package MANOVA.RM [J]. Sarah Friedrich, Frank Konietschke, Markus Pauly. The R Journal, 2019, No. 2.
3. Pivot-based approximate k-NN similarity joins for big high-dimensional data [J]. Premysl Cech, Jakub Lokoc, Yasin N. Silva. Information Systems, 2020.
4. Similarity Histogram Estimation Based Top-k Similarity Join Algorithm on High-Dimensional Data [C]. Youzhong Ma, Ruiling Zhang, Yongxin Zhang. International Conference on Web Information Systems and Applications, 2019.
5. High-Dimensional Similarity Search for Large Datasets [D]. Wei Dong. 2011.
6. Comparison of beta diversity measures in clustering the high-dimensional microbial data [O]. Biyuan Chen, Xueyi He, Bangquan Pan. 2021.
7. A Topology-Independent Similarity Measure for High-Dimensional Feature Spaces [O]. Jochen Kerdels, Gabriele Peters. 2013.