Irony Detection in Non-English Tweets

Abstract

Sentiment analysis is the interpretation and classification of emotions conveyed by text data. Although many models classify the sentiment of a given text, few can do so for non-English data that exhibits sarcasm or irony. This paper compares several sarcasm detection techniques to determine which works best for datasets of different sizes and types. The models were tested on datasets in three non-English languages: Arabic, French, and a Hindi-English code-mix. None of the presented models is language-specific, so each can be run on data in any language. The comparison covers a sub-word model, the use of Term Frequency–Inverse Document Frequency (TF-IDF) and neural networks, a Long Short-Term Memory (LSTM) model, and machine learning techniques such as Nearest Neighbors, Decision Tree, Random Forest, AdaBoost, Naive Bayes (NB), and Support Vector Machines (SVM) with linear, radial basis function (RBF), and sigmoid kernels. The output for each language and model was evaluated using F1-score, accuracy, precision, and recall.
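The four evaluation measures named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' evaluation code: the function name `binary_metrics`, the encoding of label 1 as "ironic", and the toy label lists are all assumptions made for illustration, and undefined ratios are reported as 0.0 by convention here.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Return accuracy, precision, recall and F1 for labels in {0, 1}.

    Label 1 is assumed to mean "ironic"; the abstract does not specify
    the label encoding, so this is a hypothetical choice.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one example is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # ironic, predicted ironic
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # not ironic, predicted ironic
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # ironic, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # not ironic, predicted not ironic

    accuracy = (tp + tn) / len(pairs)
    # Report 0.0 when a ratio is undefined (no positive predictions or labels).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy labels, invented for this example only.
    gold = [1, 0, 1, 1, 0, 0]
    pred = [1, 0, 0, 1, 1, 0]
    for name, value in binary_metrics(gold, pred).items():
        print(f"{name}: {value:.3f}")
```

Because the function only compares label sequences, it applies unchanged to any language or model, which matches the language-agnostic comparison the abstract describes.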
