Sentiment Classification Using Paragraph Vector and Cognitive Big Data Semantics on Apache Spark

Abstract

Apache Spark makes it possible to write distributed versions of machine learning algorithms that scale easily to larger datasets on clusters of commodity hardware. In this paper, we propose hybridizing paragraph vectors with distributed, parallel versions of six well-known machine learning techniques for sentiment analysis. We employed a distributed implementation of a neural network language model to obtain paragraph vectors for a given corpus. On the resulting paragraph vectors, we applied a range of distributed classification algorithms available in Apache Spark to perform sentiment classification. As baselines for comparison, we considered two approaches: a Bag-of-Words-based document-term matrix (DTM) and a hashing-trick-based DTM. We experimented with a movie review dataset of 992 MB. Among the six classifiers, the multilayer perceptron (MLP) was statistically indistinguishable from gradient-boosted trees (GBT) and the support vector machine (SVM), and it statistically significantly outperformed the remaining classifiers, yielding an area under the ROC curve (AUC) of 95.44%.
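The two baselines differ in how a term is mapped to a DTM column. The following is a minimal pure-Python sketch, not the authors' Spark code: a Bag-of-Words DTM needs a corpus-wide vocabulary, whereas the hashing trick maps each term to a column by hashing it modulo a fixed width, so no vocabulary has to be collected (at the cost of possible collisions). The tokenizer, the function names `bow_dtm` and `hashed_dtm`, the CRC32 hash, and the width of 16 columns are illustrative assumptions.

```python
import re
import zlib


def tokenize(text):
    # Assumed tokenizer: lowercase, keep runs of letters, digits and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def bow_dtm(docs):
    """Bag-of-Words DTM: one column per distinct term in the corpus vocabulary."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    index = {term: j for j, term in enumerate(vocab)}
    rows = []
    for toks in tokenized:
        row = [0] * len(vocab)
        for t in toks:
            row[index[t]] += 1  # raw term frequency
        rows.append(row)
    return vocab, rows


def hashed_dtm(docs, num_features=16):
    """Hashing-trick DTM: column = hash(term) mod num_features; no vocabulary is stored."""
    rows = []
    for d in docs:
        row = [0] * num_features
        for t in tokenize(d):
            # CRC32 is deterministic across processes, unlike Python's salted hash().
            j = zlib.crc32(t.encode("utf-8")) % num_features
            row[j] += 1  # colliding terms share (and add to) the same column
        rows.append(row)
    return rows


if __name__ == "__main__":
    docs = ["a great movie", "a terrible, terrible movie"]
    vocab, bow = bow_dtm(docs)
    print(vocab)  # ['a', 'great', 'movie', 'terrible']
    print(bow)    # [[1, 1, 1, 0], [1, 0, 1, 2]]
    print(hashed_dtm(docs, num_features=16))
```

With only 16 columns, distinct terms may land in the same column. Spark's `HashingTF` defaults to 2^18 features and uses MurmurHash3 rather than CRC32; the deterministic hash matters because every worker must map a given term to the same column.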

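The headline result is reported as the area under the ROC curve. As a small illustration (not taken from the paper), AUC equals the probability that a randomly chosen positive review is scored above a randomly chosen negative one, with ties counted as one half. The sketch below computes this pairwise definition directly; the function name `roc_auc` is hypothetical, and at the scale of a 992 MB corpus one would instead use a distributed evaluator such as Spark's `BinaryClassificationEvaluator` with `metricName="areaUnderROC"`.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a random positive > score of a random negative), ties = 1/2.

    labels: iterable of 0/1 class labels; scores: classifier scores, same order.
    O(P * N) pairwise comparison, intended for illustration on small inputs only.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # 3 of the 4 positive/negative pairs are ranked correctly.
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
```

An AUC of 95.44% therefore means that, on about 95% of positive/negative review pairs, the MLP scored the positive review higher.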

Bibliographic Details

Source
IEEE International Conference on Cognitive Informatics & Cognitive Computing | 2018 | pp. 187-194 | 8 pages
Authors
Kumar Ravi; Vadlamani Ravi; B. Shivakrishna;
Original format: PDF
Keywords
Handheld computers; Training; Sparks; Banking; Feature extraction; Genetic algorithms; Computational modeling;

Similar Literature

1. A Large-Scale Sentiment Data Classification for Online Reviews Under Apache Spark [J]. Samar Al-Saqqa, Ghazi Al-Naymat, Arafat Awajan. Procedia Computer Science, 2018, no. 5.

2. Comparing Results of Sentiment Analysis Using Naive Bayes and Support Vector Machine in Distributed Apache Spark Environment [J]. Tomasz Szandala. Computer Science & Information Technology, 2018, no. 14.

3. Evaluation of classification algorithms for banking customer’s behavior under Apache Spark Data Processing System [J]. Wael Etaiwi, Mariam Biltawi, Ghazi Naymat. Procedia Computer Science, 2017.

4. Sentiment Classification Using Paragraph Vector and Cognitive Big Data Semantics on Apache Spark [C]. Kumar Ravi, Vadlamani Ravi, B. Shivakrishna. IEEE International Conference on Cognitive Informatics & Cognitive Computing, 2018.

5. Streamlining Big Data Processing Pipelines via Unix Memory Tools, Persistent Spark Datasets, and the Apache Ignite In-Memory File System [D]. Blair, Walter. 2018.

6. Big Data Approaches for the Analysis of Large-Scale fMRI Data Using Apache Spark and GPU Processing: A Demonstration on Resting-State fMRI Data from the Human Connectome Project [O]. Roland N. Boubela, Klaudius Kalcher, Wolfgang Huf. 2015.

7. Comparing Results of Sentiment Analysis Using Naive Bayes and Support Vector Machine in Distributed Apache Spark Environment [O]. Tomasz Szandala. 2018.
