Subword Semantic Hashing for Intent Classification on Small Datasets

Abstract

In this paper, we introduce the use of Semantic Hashing as embedding for the task of Intent Classification and achieve state-of-the-art performance on three frequently used benchmarks. Intent Classification on a small dataset is a challenging task for data-hungry state-of-the-art Deep Learning based systems. Semantic Hashing is an attempt to overcome such a challenge and learn robust text classification. Current word embedding based methods [11], [13], [14] are dependent on vocabularies. One of the major drawbacks of such methods is out-of-vocabulary terms, especially when having small training datasets and using a wider vocabulary. This is the case in Intent Classification for chatbots, where typically small datasets are extracted from internet communication. Two problems arise with the use of internet communication. First, such datasets miss a lot of terms in the vocabulary to use word embeddings efficiently. Second, users frequently make spelling errors. Typically, the models for intent classification are not trained with spelling errors and it is difficult to think about ways in which users will make mistakes. Models depending on a word vocabulary will always face such issues. An ideal classifier should handle spelling errors inherently. With Semantic Hashing, we overcome these challenges and achieve state-of-the-art results on three datasets: Chatbot, Ask Ubuntu, and Web Applications [3]. Our benchmarks are available online.¹
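The abstract's point about spelling errors can be illustrated with a small sketch. It assumes that subword hashing means splitting each word into character trigrams after padding it with a boundary marker; the abstract itself does not spell out the procedure, so the function names, the `#` marker, and the cosine comparison below are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter


def subword_trigrams(text: str) -> Counter:
    """Count character trigrams of each whitespace-separated token.

    Assumption: every token is lowercased and padded with '#' on both
    sides before trigrams are extracted (e.g. 'hi' -> '#hi', 'hi#').
    """
    counts = Counter()
    for token in text.lower().split():
        padded = f"#{token}#"
        for i in range(len(padded) - 2):
            counts[padded[i:i + 3]] += 1
    return counts


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# A misspelled word shares no entry with the correct word in a word-level
# vocabulary, but most of its trigrams still overlap.
correct = subword_trigrams("install")
misspelled = subword_trigrams("instal")
similarity = cosine(correct, misspelled)
```

Under these assumptions, `similarity` stays well above zero even though "instal" would be out-of-vocabulary for a word-level model, which is the kind of robustness the abstract asks of an ideal classifier.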

Bibliographic details

Source
International Joint Conference on Neural Networks | 2019 | pp. 1-6 | 6 pages
Authors
Kumar Shridhar; Ayushman Dash; Amit Sahu; Gustav Grund Pihlgren; Pedro Alonso; Vinaychandran Pondenkandath; György Kovács; Foteini Simistira; Marcus Liwicki;
Original format: PDF
Keywords
Natural Language Processing; Intent Classification; Chatbots; Semantic Hashing; Machine Learning; State-of-the-art;

Similar literature

1. Hyperspectral Image Classification Method Based on CNN Architecture Embedding With Hashing Semantic Feature [J]. Yu Chunyan, Zhao Meng, Song Meiping. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2019, No. 6
2. A benchmark dataset and case study for Chinese medical question intent classification [J]. Nan Chen, Xiangdong Su, Tongyang Liu. BMC Medical Informatics and Decision Making, 2020, No. 3
3. Semantic classification and hash code accelerated detection of design changes in BIM models [J]. Lin Jia-Rui, Zhou Yu-Cheng. Automation in Construction, 2020, No. Jula
4. Subword Semantic Hashing for Intent Classification on Small Datasets [C]. Kumar Shridhar, Ayushman Dash, Amit Sahu. International Joint Conference on Neural Networks, 2019
5. Hashing Based Similarity Search over Massive Datasets [D]. Li, Jinfeng. 2018
6. A benchmark dataset and case study for Chinese medical question intent classification [O]. Nan Chen, Xiangdong Su, Tongyang Liu. 2020
7. Subword Semantic Hashing for Intent Classification on Small Datasets [O]. Kumar Shridhar, Ayushman Dash, Amit Sahu. 2019