Unsupervised modeling anomaly detection in discussion forums posts using global vectors for text representation

Pawel Cichosz

Abstract

Anomaly detection can be seen as an unsupervised learning task in which a predictive model created on historical data is used to detect outlying instances in new data. This work addresses a possibly promising but relatively uncommon application of anomaly detection to text data. Two English-language and one Polish-language Internet discussion forums devoted to psychoactive substances obtained from homegrown plants, such as hashish or marijuana, serve as text sources that are both realistic and possibly interesting in their own right, due to potential associations with drug-related crime. The utility of two different vector text representations is examined: the simple bag-of-words representation and the more refined Global Vectors (GloVe) representation, which is an example of the increasingly popular word embedding approach. Both are combined with two unsupervised anomaly detection methods, one based on one-class support vector machines (SVM) and the other based on dissimilarity to k-medoids clusters. The GloVe representation is found to be clearly more useful for anomaly detection, permitting better detection quality and mitigating the curse-of-dimensionality issues in text clustering. The cluster dissimilarity approach combined with this representation outperforms one-class SVM with respect to detection quality and appears to be a more promising approach to anomaly detection in text data.
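The cluster dissimilarity idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes that a post is represented as the mean of its word vectors, that dissimilarity is the Euclidean distance to the nearest medoid, and that a simple alternating k-medoids procedure is adequate. The tiny two-dimensional TOY_VECTORS table and the helper names embed, k_medoids, and anomaly_score are hypothetical stand-ins; real GloVe vectors would be loaded from pretrained files.

```python
import math
import random

# Hypothetical 2-d word vectors standing in for pretrained GloVe embeddings.
TOY_VECTORS = {
    "plant": (1.0, 0.1), "seed": (0.9, 0.2), "grow": (0.8, 0.0),
    "soil": (1.0, 0.3), "light": (0.7, 0.1),
    "stock": (0.0, 1.0), "loan": (0.1, 0.9), "bank": (0.2, 1.1),
}

def embed(text):
    """Represent a post as the mean of its known word vectors (an assumption)."""
    vecs = [TOY_VECTORS[w] for w in text.lower().split() if w in TOY_VECTORS]
    if not vecs:
        return (0.0, 0.0)
    return tuple(sum(component) / len(vecs) for component in zip(*vecs))

def k_medoids(points, k, iters=20, seed=0):
    """Simple alternating k-medoids: assign points, then re-pick each medoid."""
    rng = random.Random(seed)
    medoids = rng.sample(points, k)
    for _ in range(iters):
        # Assignment step: attach every point to its nearest medoid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, medoids[i]))
            clusters[nearest].append(p)
        # Update step: the new medoid minimizes total distance within its cluster.
        new_medoids = []
        for j, members in enumerate(clusters):
            if not members:
                new_medoids.append(medoids[j])  # keep the medoid of an empty cluster
                continue
            best = min(members, key=lambda c: sum(math.dist(c, m) for m in members))
            new_medoids.append(best)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids

def anomaly_score(point, medoids):
    """Dissimilarity to the clustering: distance to the nearest medoid."""
    return min(math.dist(point, m) for m in medoids)

# "Historical" posts used to build the model.
train_posts = ["plant seed grow", "seed soil light", "grow light plant", "soil plant seed"]
medoids = k_medoids([embed(t) for t in train_posts], k=2)

# New posts: one on-topic, one off-topic.
normal_score = anomaly_score(embed("grow seed soil"), medoids)
outlier_score = anomaly_score(embed("bank loan stock"), medoids)
print(f"on-topic score:  {normal_score:.3f}")
print(f"off-topic score: {outlier_score:.3f}")
```

In this toy setting the off-topic post lies far from every medoid and so receives the higher score. In practice a threshold on the score, chosen on held-out data, would decide which posts are flagged as anomalous.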

Bibliographic details

Source: Natural Language Engineering, 2020, Issue 5, pp. 551-578 (28 pages)
Author: Pawel Cichosz
Affiliation: Institute of Computer Science, Warsaw University of Technology, Warszawa, Poland
Original format: PDF
Language: English
Keywords: Text classification; Text clustering; Anomaly detection; Word embeddings

