Journal of the American Statistical Association

Classification With Unstructured Predictors and an Application to Sentiment Analysis


Abstract

Unstructured data refer to information that lacks certain structures and cannot be organized in a predefined fashion. Unstructured data often involve words, texts, graphs, objects, or multimedia types of files that are difficult to process and analyze with traditional computational tools and statistical methods. This work explores ordinal classification for unstructured predictors with ordered class categories, where imprecise information concerning strengths of association between predictors is available for predicting class labels. This imprecise information is expressed in terms of a directed graph, with each node representing a predictor and a directed edge containing pairwise strengths of association between two nodes. One of the targeted applications for unstructured data arises from sentiment analysis, which identifies and extracts the relevant content or opinion of a document concerning a specific event of interest. We integrate the imprecise predictor relations into linear relational constraints over classification function coefficients, where large margin ordinal classifiers are introduced, subject to many quadratically linear constraints. The proposed classifiers are then applied in sentiment analysis using binary word predictors. Computationally, we implement ordinal support vector machines and psi-learning through a scalable quadratic programming package based on sparse word representations. Theoretically, we show that using relationships among unstructured predictors improves prediction accuracy of classification significantly. We illustrate an application to sentiment analysis using consumer text reviews and movie review data. Supplementary materials for this article are available online.
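
The abstract mentions binary word predictors, sparse word representations, a directed graph over predictors, and a linear classification function. The following is a minimal Python sketch of what those objects might look like. It is not the authors' implementation: the function names, the toy vocabulary, the edge strengths, and the coefficient values are all hypothetical, and the exact form of the relational constraints is not stated in the abstract, so none is encoded here.

```python
from typing import Dict, List, Set, Tuple


def binary_word_predictors(docs: List[str], vocab: List[str]) -> List[Set[int]]:
    """Encode each document as the set of vocabulary indices it contains.

    Keeping only the indices of present words is a sparse form of a
    0/1 predictor vector of length len(vocab).
    """
    index = {word: j for j, word in enumerate(vocab)}
    encoded = []
    for doc in docs:
        tokens = doc.lower().split()  # naive whitespace tokenizer (assumption)
        encoded.append({index[t] for t in tokens if t in index})
    return encoded


def linear_score(active: Set[int], coef: List[float], intercept: float = 0.0) -> float:
    """Evaluate f(x) = intercept + sum_j coef[j] * x_j for a binary x
    given by the set of its nonzero positions."""
    return intercept + sum(coef[j] for j in active)


# Toy vocabulary and documents (hypothetical).
vocab = ["good", "great", "bad", "terrible"]
docs = ["Great film, good cast", "bad plot and terrible pacing"]
X = binary_word_predictors(docs, vocab)

# Hypothetical directed graph over predictors: (source, target) -> strength.
word_graph: Dict[Tuple[int, int], float] = {(0, 1): 0.5, (2, 3): 0.5}

# Hypothetical coefficients, one per vocabulary word.
coef = [1.0, 2.0, -1.0, -2.0]
scores = [linear_score(x, coef) for x in X]
```

In this sketch each directed edge is stored as a key in `word_graph` with its strength as the value; how such edges translate into constraints on `coef` depends on details given in the paper rather than in the abstract.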
