Data reduction is a key area of focus across many disciplines owing to the massive amounts of data available today. Raw data, which is often highly redundant, is expressed in terms of fewer entities called factors. Factor analysis is a method of reducing data dimensionality by grouping and combining variables, expressing them in a highly structured manner without losing the information content of the data. Boolean factor analysis is a method of representing binary data in terms of its principal factors. This thesis addresses two entirely different approaches to Boolean factor analysis, their methodologies, and their implementation.

The first approach is a novel method of matrix decomposition that uniquely decomposes a given data matrix into successively smaller matrices, entailing a fixed number of factors in the process. An n × m binary data matrix I is expressed as a Boolean product A ∘ B of an n × k binary matrix A and a k × m binary matrix B, keeping the number of factors k minimal. I is known as the object-attribute matrix, while A and B are the object-factor and factor-attribute matrices, respectively. The approach is built on a theorem and an accompanying approximation algorithm that finds optimal decompositions within the framework provided by the theorem.

The second approach is based on a modified Hopfield neural network that uses correlational Hebbian learning and the recurrent nature of such networks to identify the factors of a given data set. The original data whose factors are to be found are mapped into the space of factors, allowing the Hopfield-like network to find them. Owing to Hebbian learning, neurons belonging to a common factor tend to be more strongly correlated than other neurons and fire collectively when the factor is found, thus constituting attractors of the network dynamics. The Hopfield network follows a step-by-step procedure in the factor search, building basins of attraction as it proceeds towards convergence and, ultimately, factor revelation.

Although both approaches are oriented towards factor analysis, they follow entirely different methodologies with some parallels along the way, making for the comparative research study that this thesis comprises. Data sets, both real and artificial, are tested with each approach and the results compared in order to validate Boolean factor analysis.
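The Boolean product I = A ∘ B at the heart of the first approach is defined entrywise as (A ∘ B)[i, j] = ⋁ₗ (A[i, l] ∧ B[l, j]). A minimal sketch in Python follows; the matrices A and B are small invented examples for illustration, not data or results from the thesis:

```python
import numpy as np

def boolean_product(A, B):
    """Boolean matrix product: (A o B)[i, j] = OR over l of (A[i, l] AND B[l, j]).
    For 0/1 matrices this equals the ordinary product thresholded at 1."""
    return (A @ B > 0).astype(int)

# Hypothetical decomposition with k = 2 factors (illustrative values only):
A = np.array([[1, 0],            # object-factor matrix, n x k
              [1, 1],
              [0, 1],
              [1, 0]])
B = np.array([[1, 1, 0, 0, 1],   # factor-attribute matrix, k x m
              [0, 1, 1, 1, 0]])

I = boolean_product(A, B)        # reconstructed n x m object-attribute matrix
print(I)
```

Here k = 2 factors reconstruct I exactly by construction; the decomposition problem the thesis addresses is the converse one, namely finding A and B with minimal k for a given I.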
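The attractor behaviour underlying the second approach can be illustrated with a minimal classical Hopfield network: Hebbian weights are the sum of outer products of stored patterns, and asynchronous updates drive a noisy state into a stored attractor. This is a generic sketch, not the thesis's modified network, and the patterns are invented for illustration:

```python
import numpy as np

# Two bipolar (+1/-1) patterns stand in for "factors" (illustrative only).
patterns = np.array([[ 1,  1, -1, -1],
                     [-1,  1,  1, -1]])

# Hebbian learning: W is the sum of pattern outer products, zero diagonal.
W = patterns.T @ patterns
np.fill_diagonal(W, 0)

def recall(state, sweeps=10):
    """Asynchronous updates, one neuron at a time, until a full sweep
    changes nothing -- the resulting fixed point is an attractor."""
    state = state.copy()
    for _ in range(sweeps):
        changed = False
        for i in range(len(state)):
            new = 1 if W[i] @ state >= 0 else -1
            if new != state[i]:
                state[i] = new
                changed = True
        if not changed:
            break
    return state

# A corrupted version of the first pattern settles into the stored attractor.
noisy = np.array([1, -1, -1, -1])
print(recall(noisy))   # -> [ 1  1 -1 -1], the first stored pattern
```

The correlated, collective firing described in the abstract corresponds to groups of units that the Hebbian weights bind together, so that the network state is pulled into the basin of the nearest stored pattern.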