Revista de Ingeniería Electrónica, Automática y Comunicaciones
Reducing Vector Space Dimensionality in Automatic Classification for Authorship Attribution


Abstract

For automatic classification, the implications of having too many classificatory features are twofold. On the one hand, features that do not help discriminate between classes should be removed from the classification. On the other hand, redundant features can produce negative effects as their number grows, and their detrimental impact should be minimized or limited. In text classification tasks, where word and word-derived features are commonly employed, the number of distinct features extracted from text samples can grow quickly. In the specific context of authorship attribution, several traditionally used features, such as n-grams or word sequences, can produce long lists of distinct features, the great majority of which have very few instances. Previous research has shown that in this task feature reduction can surpass the performance of noise-tolerant algorithms in addressing the issues associated with the abundance of classificatory features. However, there has been no attempt to demonstrate the motivation for this solution. This article shows how, even in the small data collections characteristically used in authorship attribution, the frequency rank of common elements remains stable as their instances accumulate, while novel, uncommon words are constantly found. Given this general vocabulary property, present even in very small text collections, the application of techniques to reduce vector space dimensionality is especially beneficial across the various experimental settings typical of this task. These implications may also be helpful for other automatic classification tasks with similar conditions.
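The vocabulary property described in the abstract can be illustrated with a short sketch. The code below is not from the article; the toy corpus and the cutoff `min_count` are illustrative assumptions. It simply shows how counting word features exposes the long tail of rare terms and how a frequency threshold reduces the dimensionality of the document vectors.

```python
# A minimal sketch (not the article's own code) of the two observations in the
# abstract: most word features occur only a handful of times, and keeping only
# the more frequent ones shrinks the vector space. The toy corpus and the
# threshold `min_count` are illustrative assumptions.
from collections import Counter

corpus = [
    "the quick brown fox jumps over the lazy dog",
    "the dog barks at the quick fox",
    "a lazy afternoon with the dog and the fox",
]

# Full vocabulary: one vector-space dimension per distinct word.
tokens = [w for doc in corpus for w in doc.split()]
freq = Counter(tokens)
hapax = [w for w, c in freq.items() if c == 1]
print(f"{len(freq)} distinct features, {len(hapax)} of them occur only once")

# Frequency-based reduction: keep words seen at least `min_count` times.
min_count = 2  # illustrative threshold, not a value taken from the article
vocabulary = sorted(w for w, c in freq.items() if c >= min_count)
index = {w: i for i, w in enumerate(vocabulary)}

def vectorize(doc):
    """Map a document onto the reduced vector space (raw term counts)."""
    vec = [0] * len(vocabulary)
    for w in doc.split():
        if w in index:
            vec[index[w]] += 1
    return vec

print("reduced dimensionality:", len(vocabulary))
print(vectorize(corpus[0]))
```

In practice the same idea is often applied through vectorizer options such as min_df or max_features in scikit-learn, although the article does not prescribe a specific tool.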
