Topic categorization of Tamil News Articles using PreTrained Word2Vec Embeddings with Convolutional Neural Network

Abstract

Most problems in NLP are now addressed with techniques ranging from classical machine learning to deep learning. Language localization, however, remains an open challenge: many NLP problems are still poorly addressed for languages other than English. These problems include entity extraction, OCR, and classification and prediction in sequence modelling. The number of people using local languages (Tamil, Telugu, Hindi, etc.) on social media is increasing, so it is important to automate the classification of such content. Here, the aim is to classify Tamil news articles into their related topics (Sports, Cinema, Politics). Existing work has applied traditional machine learning methods with TF-IDF word weights as features. In this work, we compare that TF-IDF feature approach against pre-trained embeddings fed to a Convolutional Neural Network (CNN). We found that the CNN with pre-trained embeddings achieved a better F1 score than TF-IDF features used with Support Vector Machine (SVM) and Naive Bayes (NB) classifiers.
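
The abstract does not specify the CNN architecture (filter widths, number of filters, embedding dimension, training setup). As an illustration of the general technique it names, a convolutional text classifier over pretrained word vectors, the following is a minimal forward-pass sketch in plain Python, not the authors' code. The function names (`embed`, `conv1d_maxpool`, `cnn_forward`), the 2-dimensional embedding table, and all filter and output weights are hypothetical, hand-set toy values; a real model would load trained word2vec vectors and learn the filters by backpropagation.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def embed(tokens: Sequence[str], table: Dict[str, Vector], dim: int) -> List[Vector]:
    """Map each token to its pretrained vector; out-of-vocabulary tokens get zeros."""
    return [table.get(tok, [0.0] * dim) for tok in tokens]


def conv1d_maxpool(seq: List[Vector], filt: List[Vector], bias: float) -> float:
    """Slide one filter (width = len(filt)) over the sequence, apply ReLU,
    and return the largest activation (max-over-time pooling).
    Returns 0.0 if the sequence is shorter than the filter (no padding here)."""
    width = len(filt)
    best = 0.0  # ReLU outputs are never negative, so 0.0 is a safe starting point
    for start in range(len(seq) - width + 1):
        activation = bias
        for offset in range(width):
            activation += sum(w * x for w, x in zip(filt[offset], seq[start + offset]))
        best = max(best, max(0.0, activation))
    return best


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax over class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cnn_forward(tokens, table, dim, filters, biases, out_w, out_b) -> Vector:
    """Embedding lookup -> one pooled feature per filter -> linear layer -> softmax."""
    seq = embed(tokens, table, dim)
    features = [conv1d_maxpool(seq, f, b) for f, b in zip(filters, biases)]
    logits = [sum(w * h for w, h in zip(row, features)) + b for row, b in zip(out_w, out_b)]
    return softmax(logits)


# Toy, hand-set values (hypothetical): 2-d "pretrained" vectors and width-1 filters.
LABELS = ["Sports", "Cinema", "Politics"]
TABLE = {"கிரிக்கெட்": [1.0, 0.0], "தேர்தல்": [0.0, 1.0]}  # "cricket", "election"
FILTERS = [[[1.0, 0.0]], [[0.0, 1.0]]]
BIASES = [0.0, 0.0]
OUT_W = [[2.0, 0.0], [0.0, 0.0], [0.0, 2.0]]  # one row of weights per topic
OUT_B = [0.0, 0.0, 0.0]

# "போட்டி" ("match") is not in the toy table, so it maps to a zero vector.
probs = cnn_forward(["கிரிக்கெட்", "போட்டி"], TABLE, 2, FILTERS, BIASES, OUT_W, OUT_B)
print(LABELS[probs.index(max(probs))])  # prints Sports
```

Max-over-time pooling yields exactly one feature per filter regardless of article length, which is what lets variable-length news articles feed a fixed-size classification layer.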

Bibliographic details

Source
International Conference on Computational Intelligence for Smart Power System and Sustainable Energy | 2020 | pp. 1-4 | 4 pages
Authors
Ramraj S; Arthi R; Solai Murugan; M.S. Julie
Original format: PDF
Keywords
Tamil Text Classification; CNN; Pretrained Tamil embeddings;

Similar documents

1. What is This Article About? Extreme Summarization with Topic-Aware Convolutional Neural Networks [J]. Shashi Narayan, Shay B. Cohen, Mirella Lapata. The Journal of Artificial Intelligence Research, 2019.
2. What is this Article about? Extreme Summarization with Topic-aware Convolutional Neural Networks [J]. Shashi Narayan, Shay B. Cohen, Mirella Lapata. The Journal of Artificial Intelligence Research, 2019, no. 7.
3. Protein-Protein Interaction Article Classification Using a Convolutional Recurrent Neural Network with Pre-trained Word Embeddings [J]. Sérgio Matos, Rui Antunes. Journal of Integrative Bioinformatics, 2017, no. 4.
4. Uzbek News Categorization using Word Embeddings and Convolutional Neural Networks [C]. Ilyos Rabbimov, Sami Kobilov, Iosif Mporas. IEEE International Conference on Application of Information and Communication Technologies, 2020.
5. One-Shot Learning with Pretrained Convolutional Neural Network [D]. Zhixian Yu, 2019.
6. Word2vec convolutional neural networks for classification of news articles and tweets [O]. Beakcheol Jang, Inhwan Kim, Jong Wook Kim, 2012.
7. Word2vec convolutional neural networks for classification of news articles and tweets [O]. Beakcheol Jang, Inhwan Kim, Jong Wook Kim, 2019.
