A Two-Stage Machine learning approach for temporally-robust text classification

Salles Thiago; Rocha Leonardo; Motirao Fernando; Goncalves Marcos; Viegas Felipe; Meira Wagner Jr.


Abstract

One of the most relevant research topics in Information Retrieval is Automatic Document Classification (ADC). Several ADC algorithms have been proposed in the literature. However, the majority of these algorithms assume that the underlying data distribution does not change over time. Previous work has demonstrated evidence of the negative impact of three main temporal effects in representative textual datasets, reflected by variations observed over time in the class distribution, in the pairwise class similarities, and in the relationships between terms and classes [1]. In order to minimize the impact of temporal effects on ADC algorithms, we have previously introduced the notion of a temporal weighting function (TWF), which reflects the varying nature of textual datasets. We have also proposed a procedure to derive the TWF's expression and parameters. However, deriving the TWF requires running explicit and complex statistical tests, which are very cumbersome or cannot even be run in several cases. In this article, we propose a machine learning methodology to automatically learn the TWF without the need to perform any statistical tests. We also propose new strategies to incorporate the TWF into ADC algorithms, which we call temporally-aware classifiers. Experiments showed that the fully-automated temporally-aware classifiers achieved significant gains (up to 17%) when compared to their non-temporal counterparts, even outperforming some state-of-the-art algorithms (e.g., SVM) in most cases, with large reductions in execution time. © 2017 Elsevier Ltd. All rights reserved.
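
The abstract does not give the TWF's expression or the way it is incorporated into a classifier, so the following is only an illustrative sketch of the general idea of temporal weighting, not the authors' method. It assumes a hypothetical Gaussian-shaped `twf` over the temporal distance between a training document and the test document, and it plugs that weight into a simplified Naive Bayes scorer (class priors omitted). All function names, the `sigma` parameter, and the toy data are assumptions made for illustration.

```python
import math
from collections import defaultdict


def twf(delta, sigma=2.0):
    """Hypothetical temporal weighting function (an assumption, not the
    paper's learned TWF): weight decays with temporal distance `delta`."""
    return math.exp(-(delta ** 2) / (2 * sigma ** 2))


def weighted_counts(train_docs, test_year):
    """Accumulate TWF-weighted term counts per class, relative to the
    creation time of the document being classified."""
    counts = defaultdict(lambda: defaultdict(float))
    totals = defaultdict(float)
    for terms, label, year in train_docs:
        w = twf(year - test_year)
        for t in terms:
            counts[label][t] += w
            totals[label] += w
    return counts, totals


def classify(terms, test_year, train_docs, vocab_size, alpha=1.0):
    """Pick the class with the highest smoothed log-likelihood, computed
    from temporally weighted counts (class priors omitted for brevity)."""
    counts, totals = weighted_counts(train_docs, test_year)
    best_label, best_score = None, -math.inf
    for label in counts:
        denom = totals[label] + alpha * vocab_size
        score = sum(
            math.log((counts[label].get(t, 0.0) + alpha) / denom)
            for t in terms
        )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy data: the term "apple" drifts from one class to another over time.
docs = [
    (["apple", "pie"], "food", 2000),
    (["apple", "phone"], "tech", 2010),
]
print(classify(["apple"], 2010, docs, vocab_size=3))
print(classify(["apple"], 2000, docs, vocab_size=3))
```

In this toy setting the same term is assigned to different classes depending on the test document's creation time, which is the kind of varying term-class relationship the abstract describes. A learned TWF would replace the fixed Gaussian used here.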


Bibliographic details

Source
Information Systems, 2017, No. 9, pp. 40-58 (19 pages)
Authors
Salles Thiago; Rocha Leonardo; Motirao Fernando; Goncalves Marcos; Viegas Felipe; Meira Wagner Jr.
Author affiliations

Univ Fed Minas Gerais, Dept Comp Sci, Belo Horizonte, MG, Brazil;

Univ Fed Sao Joao Del Rei, Dept Comp Sci, Sao Joao Del Rei, MG, Brazil;

Univ Fed Sao Joao Del Rei, Dept Comp Sci, Sao Joao Del Rei, MG, Brazil;

Univ Fed Minas Gerais, Dept Comp Sci, Belo Horizonte, MG, Brazil;

Univ Fed Minas Gerais, Dept Comp Sci, Belo Horizonte, MG, Brazil;

Univ Fed Minas Gerais, Dept Comp Sci, Belo Horizonte, MG, Brazil;

Indexed in
Science Citation Index (SCI); Engineering Index (EI)
Original format
PDF
Language
English
Keywords
Automatic document classification; Temporal weighting function; Fully-automated machine learning process
Added to database
2022-08-18 02:47:41

Similar literature

1. Extracting and reusing blocks of knowledge in learning classifier systems for text classification: a lifelong machine learning approach [J]. Arif Muhammad Hassan, Iqbal Muhammad, Li Jianxin. Soft Computing: A Fusion of Foundations, Methodologies and Applications. 2019, No. 23
2. Comparative Study of Machine Learning Approach on Malay Translated Hadith Text Classification based on Sanad [J]. Syuhairah Rahifah Mohammad Najib, Nurazzah Abd Rahman, Normaly Kamal Ismail. MATEC Web of Conferences. 2017, No. 1
3. A Fuzzy Approach to Text Classification With Two-Stage Training for Ambiguous Instances [J]. Han Liu, Pete Burnap, Wafa Alorainy. IEEE Transactions on Computational Social Systems. 2019, No. 2
4. Automatic Classification for Cognitive Engagement in Online Discussion Forums: Text Mining and Machine Learning Approach [C]. Hind Hayati, Mohammed Khalidi Idrissi, Samir Bennani. International Conference on Artificial Intelligence in Education. 2020
5. An efficient approach to machine learning based text classification through distributed computing [D]. Immaneni, Raghu Nandan. 2015
6. Comparing a knowledge-driven approach to a supervised machine learning approach in large-scale extraction of drug-side effect relationships from free-text biomedical literature [O]. Rong Xu, QuanQiu Wang. 2015
7. Text Classification in Clinical Practice Guidelines Using Machine-Learning Assisted Pattern-Based Approach [O]. Musarrat Hussain, Jamil Hussain, Taqdir Ali. 2021
