IEEE Transactions on Knowledge and Data Engineering

Trace-Oriented Feature Analysis for Large-Scale Text Data Dimension Reduction



Abstract

Dimension reduction for large-scale text data has attracted considerable attention due to the rapid growth of the World Wide Web. Popular dimension reduction algorithms fall into two groups: feature extraction and feature selection algorithms. Feature extraction algorithms construct new features from the original ones through algebraic transformations. Although many of them have been shown to be effective, they typically incur high computational overhead, which makes them difficult to apply to real-world text data. Feature selection algorithms, by contrast, select subsets of the original features directly. They are widely used in real-world tasks owing to their efficiency, but they are often based on greedy strategies rather than optimal solutions. An important problem remains: it has been difficult to integrate these two types of algorithms into a single framework, and hence to reap the benefits of both. In this paper, we formulate both algorithm categories within a unified optimization framework, under which we develop a novel feature selection algorithm called Trace-Oriented Feature Analysis (TOFA). Specifically, we integrate the objective functions of several state-of-the-art feature extraction algorithms into a single unified objective under this framework, and then optimize that objective in the solution space of feature selection algorithms for dimensionality reduction. Because the TOFA objective integrates the objective functions of many prominent feature extraction algorithms, such as unsupervised Principal Component Analysis (PCA) and supervised Maximum Margin Criterion (MMC), TOFA can handle both supervised and unsupervised problems. In addition, by tuning a weight value, TOFA can also be applied to semisupervised learning problems. Experimental results on several real-world data sets validate the effectiveness and efficiency of TOFA for dimensionality reduction of text data.
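To make the idea of "optimizing a feature extraction objective in the feature selection solution space" concrete, here is a minimal sketch under stated assumptions. It is our illustration, not the authors' implementation. We assume the unified objective has the trace form max tr(WᵀCW), with C = λ·S_t + (1 − λ)·(S_b − S_w). Here S_t is the total covariance used by PCA, S_b − S_w is the MMC matrix, and λ is the weight mentioned above. We further assume W is restricted to 0/1 column-selection matrices that pick k of the original features. Under that restriction, tr(WᵀCW) equals the sum of the diagonal entries C_ii of the selected features, so the optimum simply keeps the k features with the largest diagonal scores. The function name `tofa_like_select`, the parameter `lam`, and the convention of labeling unlabeled samples with -1 are all hypothetical.

```python
import numpy as np

def tofa_like_select(X, y, k, lam=0.5):
    """Hypothetical sketch (not the paper's code): score each feature by the
    diagonal of lam * S_t + (1 - lam) * (S_b - S_w) and keep the top k.

    X   : (n_samples, n_features) data matrix
    y   : class labels; -1 marks unlabeled samples (assumed convention)
    lam : assumed weight between the PCA term and the MMC term
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)

    # PCA term: diagonal of the total covariance S_t (per-feature variance),
    # computed from all samples, labeled or not.
    mu = X.mean(axis=0)
    pca_diag = ((X - mu) ** 2).mean(axis=0)

    # MMC term: diagonal of S_b - S_w, computed from labeled samples only.
    mmc_diag = np.zeros(X.shape[1])
    labeled = y != -1
    if labeled.any():
        XL, yL = X[labeled], y[labeled]
        mu_L = XL.mean(axis=0)
        n_L = len(yL)
        for c in np.unique(yL):
            Xc = XL[yL == c]
            p_c = len(Xc) / n_L                           # class prior
            m_c = Xc.mean(axis=0)                         # class mean
            mmc_diag += p_c * (m_c - mu_L) ** 2           # between-class scatter diagonal
            mmc_diag -= p_c * ((Xc - m_c) ** 2).mean(axis=0)  # within-class scatter diagonal

    # With a 0/1 selection matrix W, tr(W^T C W) is the sum of the selected
    # diagonal entries, so the k largest diagonal scores give the optimum.
    scores = lam * pca_diag + (1 - lam) * mmc_diag
    selected = np.argsort(scores)[::-1][:k]
    return selected, scores

# Usage: feature 0 separates the classes, feature 1 is within-class noise,
# feature 2 separates the classes but has tiny variance.
X = np.array([[1.0, 0.0, 5.0],
              [1.0, 1.0, 5.0],
              [3.0, 0.0, 5.1],
              [3.0, 1.0, 5.1]])
y = np.array([0, 0, 1, 1])
selected, scores = tofa_like_select(X, y, k=2, lam=0.5)
print(selected)  # → [0 2]
```

Setting `lam=1.0` (or marking every sample -1) reduces the score to per-feature variance, an unsupervised PCA-style criterion. `lam=0.0` uses only the supervised MMC term, and intermediate values mix labeled and unlabeled information. One plausible reason this kind of formulation suits large text data is that only the diagonal of C is needed, which costs O(nd) rather than the O(d²) needed to form the full d × d matrix.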
