Classification With Unstructured Predictors and an Application to Sentiment Analysis

Wang Junhui; Shen Xiaotong; Sun Yiwen; Qu Annie


Abstract

Unstructured data refer to information that lacks certain structures and cannot be organized in a predefined fashion. Unstructured data often involve words, texts, graphs, objects, or multimedia types of files that are difficult to process and analyze with traditional computational tools and statistical methods. This work explores ordinal classification for unstructured predictors with ordered class categories, where imprecise information concerning strengths of association between predictors is available for predicting class labels. However, imprecise information here is expressed in terms of a directed graph, with each node representing a predictor and a directed edge containing pairwise strengths of association between two nodes. One of the targeted applications for unstructured data arises from sentiment analysis, which identifies and extracts the relevant content or opinion of a document concerning a specific event of interest. We integrate the imprecise predictor relations into linear relational constraints over classification function coefficients, where large margin ordinal classifiers are introduced, subject to many quadratically linear constraints. The proposed classifiers are then applied in sentiment analysis using binary word predictors. Computationally, we implement ordinal support vector machines and psi-learning through a scalable quadratic programming package based on sparse word representations. Theoretically, we show that using relationships among unstructured predictors improves prediction accuracy of classification significantly. We illustrate an application for sentiment analysis using consumer text reviews and movie review data. Supplementary materials for this article are available online.
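As a rough illustration of two ideas the abstract mentions, sparse binary word predictors and predictor relations expressed as linear constraints on classifier coefficients, here is a minimal sketch. The constraint form `beta[u] - strength * beta[v] >= 0`, the function names, and the toy vocabulary and edge are all assumptions made for illustration; the abstract does not give the paper's actual constraint formulation, and no optimization (ordinal SVM or psi-learning) is attempted here.

```python
# Illustrative sketch only: the constraint encoding is an assumption,
# not the formulation used in the paper.

def binary_word_features(doc, vocab):
    """Sparse binary predictors: sorted indices of vocabulary words present in doc."""
    tokens = set(doc.lower().split())
    return sorted(vocab[w] for w in tokens if w in vocab)

def edge_constraints(edges, vocab):
    """Map directed edges (u, v, strength) to sparse constraint rows.

    Each row {index: coefficient} encodes the hypothetical linear constraint
    beta[u] - strength * beta[v] >= 0 on the coefficient vector beta.
    """
    return [{vocab[u]: 1.0, vocab[v]: -strength} for u, v, strength in edges]

def satisfies(beta, rows, tol=1e-12):
    """Check whether beta meets every constraint row (up to a small tolerance)."""
    return all(sum(c * beta[j] for j, c in row.items()) >= -tol for row in rows)

# Toy vocabulary and one hypothetical relation:
# "great" should carry at least as much positive weight as "good".
vocab = {"good": 0, "great": 1, "bad": 2}
edges = [("great", "good", 1.0)]

rows = edge_constraints(edges, vocab)
x = binary_word_features("A great film , not bad", vocab)

ok = satisfies([0.5, 0.8, -0.6], rows)       # beta[great] >= beta[good]
violated = satisfies([0.9, 0.2, -0.6], rows)  # beta[great] <  beta[good]
```

Storing each document as a list of active word indices and each constraint as a two-entry dictionary keeps memory proportional to the number of nonzeros, which is the usual motivation for sparse word representations when both the sample size and the vocabulary are large.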


Bibliographic record

Source
Journal of the American Statistical Association | 2016, No. 515 | pp. 1242-1253 | 12 pages
Authors
Wang Junhui; Shen Xiaotong; Sun Yiwen; Qu Annie
Author affiliations

Univ Illinois, Dept Math Stat & Comp Sci, Chicago, IL 60680 USA|City Univ Hong Kong, Dept Math, Hong Kong, Hong Kong, Peoples R China;

Univ Minnesota, Sch Stat, Minneapolis, MN 55455 USA;

Univ Minnesota, Sch Stat, Minneapolis, MN 55455 USA;

Univ Illinois, Dept Stat, Champaign, IL 61820 USA;

Original format: PDF
Language: English
Keywords
Large margin learners; Large n and p; Natural language processing; Sentiment analysis; Text and opinion mining; Unstructured data;


