Study of Engineered Features and Learning Features in Machine Learning - A Case Study in Document Classification

Abstract

Document classification is challenging because it must handle voluminous, highly non-linear data, which is generated at an exponential rate in the era of digitization. Proper representation of documents increases the efficiency and performance of classification, the ultimate goal being to retrieve information from a large corpus. Deep neural network models learn features for document classification, unlike engineered-feature-based approaches, where features are extracted or selected from the data. In this paper we investigate the performance of different classifiers based on the features obtained using the two approaches. We apply a deep autoencoder to learn features, while engineered features are extracted by exploiting semantic association among the terms of the documents. Experimentally, it has been observed that learned-feature-based classification always performs better than the proposed engineered-feature-based classifiers.

Bibliographic record

Source
International conference on quantum interaction, 2017, pp. 161-172 (12 pages)
Authors
Arpan Sen; Shrestha Ghosh; Debottam Kundu; Debleena Sarkar; Jaya Sil
Original format
PDF
Keywords
Deep learning; Feature extraction; Autoencoder; Restricted Boltzmann Machine; Semantic association; N-gram Model

Similar literature

1. Machine Learning and feature engineering-based study into sarcasm and irony classification with application to cyberbullying detection [J]. Zheng Lin Chia, Michal Ptaszynski, Fumito Masui. Information Processing & Management, 2021, No. 4.
2. Machine Learning Based Classification Using Clinical and DaTSCAN SPECT Imaging Features: A Study on Parkinson’s Disease and SWEDD [J]. Rostom Mabrouk, Belkacem Chikhaoui, Layachi Bentabet. IEEE Transactions on Radiation and Plasma Medical Sciences, 2019, No. 2.
3. Study of Machine-Learning Classifier and Feature Set Selection for Intent Classification of Korean Tweets about Food Safety [J]. Ha-Neul Yeom, Myunggwon Hwang, Mi-Nyeong Hwang. Journal of Information Science Theory and Practice, 2014, No. 3.
4. Study of Engineered Features and Learning Features in Machine Learning - A Case Study in Document Classification [C]. Arpan Sen, Shrestha Ghosh, Debottam Kundu. International Conference on Intelligent Human Computer Interaction, 2017.
5. Machine Learning with Engineered Features to Identify Fraud in Point-of-Sale Systems [D]. Hines, Christine G. 2020.
6. Study of morphological and textural features for classification of oral squamous cell carcinoma by traditional machine learning techniques [O]. Tabassum Yesmin Rahman, Lipi B. Mahanta, Hiten Choudhury. 2020.
7. Machine Learning Approach to Document Classification using Concept based Features [O]. C. Saranya Jothi. 2015.
