Characterization of different datasets for ICA algorithms

Abstract

Several Independent Component Analysis (ICA) algorithms, based on different approaches, are used to estimate independent components, up to some precision, from a linear mixture of those components, as long as the components do not follow a Gaussian distribution. Some approaches, however, work better than others when the data distribution and characteristics follow a certain pattern. Given a mixture of two or more independent components, it is quite hard, if not impossible, to determine the distribution of the independent components accurately, and therefore to establish that one or more ICA approaches are better than others for a certain type of data. This paper describes a framework for ICA algorithms as proposed by Ejaz [1]. In this study we have characterized four ICA algorithms based on different approaches, each with some pre-selected fixed parameters, on a number of datasets that are linear mixtures of two to five independent components following a number of different distributions. The ICA algorithms used in the research are FastICA, Extended Infomax, JADE, and Kernel ICA based on canonical correlation analysis. All of these algorithms are discussed briefly, yet most of their important aspects are covered, so that this paper also serves as a tutorial on these ICA algorithms for novice readers. We have carried out an extensive statistical analysis of more than 300 datasets to characterize each of them for one or more of the four algorithms, according to which algorithm estimates independent components closest to the original components.
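
The setup the abstract describes (mix non-Gaussian sources linearly, estimate the components, then judge the estimate by how close it is to the originals) can be illustrated with a small sketch. This is not the paper's framework and not any of the four algorithms studied: it swaps in a plain kurtosis-based rotation search after whitening, restricted to two sources. The mixing matrix, sample count, source distributions, and all function names are assumptions chosen for the example.

```python
import math
import random


def make_sources(n, seed=0):
    """Two unit-variance, non-Gaussian sources: uniform and Laplace (assumed)."""
    rng = random.Random(seed)
    r = math.sqrt(3.0)
    uniform = [rng.uniform(-r, r) for _ in range(n)]
    laplace = [rng.choice((-1, 1)) * rng.expovariate(math.sqrt(2.0)) for _ in range(n)]
    return uniform, laplace


def mix(s1, s2, A):
    """Linear mixture x = A s for a 2x2 mixing matrix A."""
    x1 = [A[0][0] * u + A[0][1] * v for u, v in zip(s1, s2)]
    x2 = [A[1][0] * u + A[1][1] * v for u, v in zip(s1, s2)]
    return x1, x2


def center(x):
    """Subtract the mean from a signal."""
    m = sum(x) / len(x)
    return [v - m for v in x]


def rotate(z1, z2, angle):
    """Apply a 2D rotation to a pair of signals."""
    c, s = math.cos(angle), math.sin(angle)
    y1 = [c * u + s * v for u, v in zip(z1, z2)]
    y2 = [-s * u + c * v for u, v in zip(z1, z2)]
    return y1, y2


def whiten(x1, x2):
    """Center two signals, decorrelate them, and scale each to unit variance."""
    x1, x2 = center(x1), center(x2)
    n = len(x1)
    a = sum(u * u for u in x1) / n
    b = sum(u * v for u, v in zip(x1, x2)) / n
    d = sum(v * v for v in x2) / n
    # Angle that diagonalizes the 2x2 covariance matrix [[a, b], [b, d]].
    z1, z2 = rotate(x1, x2, 0.5 * math.atan2(2 * b, a - d))
    sd1 = math.sqrt(sum(v * v for v in z1) / n)
    sd2 = math.sqrt(sum(v * v for v in z2) / n)
    return [v / sd1 for v in z1], [v / sd2 for v in z2]


def excess_kurtosis(y):
    """Excess kurtosis of a zero-mean, unit-variance signal (0 for a Gaussian)."""
    return sum(v ** 4 for v in y) / len(y) - 3.0


def unmix(x1, x2, steps=90):
    """Whiten, then grid-search the rotation that maximizes total |kurtosis|."""
    z1, z2 = whiten(x1, x2)
    best_score, best_angle = -1.0, 0.0
    for k in range(steps):
        angle = (math.pi / 2) * k / steps  # a quarter turn covers all cases
        y1, y2 = rotate(z1, z2, angle)
        score = abs(excess_kurtosis(y1)) + abs(excess_kurtosis(y2))
        if score > best_score:
            best_score, best_angle = score, angle
    return rotate(z1, z2, best_angle)


def abs_corr(x, y):
    """Absolute Pearson correlation (ICA leaves sign and scale undetermined)."""
    x, y = center(x), center(y)
    num = sum(u * v for u, v in zip(x, y))
    return abs(num) / math.sqrt(sum(u * u for u in x) * sum(v * v for v in y))


def match_score(estimates, sources):
    """Mean |correlation| under the better of the two possible orderings."""
    (e1, e2), (s1, s2) = estimates, sources
    direct = (abs_corr(e1, s1) + abs_corr(e2, s2)) / 2
    swapped = (abs_corr(e1, s2) + abs_corr(e2, s1)) / 2
    return max(direct, swapped)


if __name__ == "__main__":
    s1, s2 = make_sources(4000)
    x1, x2 = mix(s1, s2, [[1.0, 1.0], [0.5, -1.0]])
    print("raw mixtures :", round(match_score((x1, x2), (s1, s2)), 3))
    print("after unmix  :", round(match_score(unmix(x1, x2), (s1, s2)), 3))
```

A score near 1 means the estimated components match the originals up to order, sign, and scale. Comparing such a score across several algorithms on the same dataset is one simple way to say which algorithm came "closest"; the paper's own closeness measure is not given in the abstract and may differ.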

Bibliographic record

Source: IEEE SoutheastCon, 2014, pp. 1-8 (8 pages)
Conference location: Lexington, KY (US)
Authors: Ejaz Masood; Foo Simon Y.; Meyer-Baese Anke; Bernadin Shonda
Author affiliation: Department of Electrical and Computer Engineering, FAMU-FSU College of Engineering, Tallahassee, FL 32310, USA
Original format: PDF
Keywords: Abstracts; Extended Infomax; FastICA; Independent Component Analysis; JADE; Kernel ICA; Statistical analysis
