
Customers' Opinion Mining from Extensive Amount of Textual Reviews in Relation to Induced Knowledge Growth


Abstract

Customers of various services are often invited to type a summarizing review via an Internet portal. Such reviews, written in natural languages, are typically unstructured, and they also include a numeric evaluation on a scale between “good” and “bad.” The more reviews there are, the better the feedback that can be acquired for improving the service. However, as massive amounts of data accumulate, the non-linearly growing processing complexity may exceed the computational capacity available for analyzing the text contents. Decision tree inducers such as C5 can reveal understandable knowledge from data, but they need the data as a whole. This article describes an application of windowing, a technique for generating dataset subsamples that give an inducer enough information to train a classifier whose results are similar to those achieved by a model trained on the entire dataset. The windowing results, which significantly reduce the complexity of the learning problem, are demonstrated on hundreds of thousands of reviews written in English by hotel-service customers. A user obtains knowledge represented by significant words. The results show classification errors, training and testing times, tree sizes, and the words relevant to the meaning of a review as a function of the training subsample size. Finally, a method for estimating a suitable training-set size is suggested.
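The abstract does not spell out the windowing procedure itself. As a rough illustration only, the following Python sketch implements the classic ID3-style windowing loop: train on a small random window, then repeatedly add examples that the current model misclassifies until no misclassified example remains outside the window. It is not the authors' implementation. `lookup_learner`, `windowing`, their parameters, and the tiny review corpus are all hypothetical, and the toy learner merely stands in for a real decision-tree inducer such as C5.

```python
import random
from collections import Counter, defaultdict
from typing import Callable, List, Sequence, Tuple

# An example is (features, label); here the features are a tuple of words.
Example = Tuple[Tuple[str, ...], str]
Model = Callable[[Tuple[str, ...]], str]


def lookup_learner(examples: List[Example]) -> Model:
    """Hypothetical toy stand-in for a real inducer such as C5: memorizes the
    majority label per exact feature tuple and falls back to the overall
    majority label for feature tuples it has not seen."""
    by_key = defaultdict(Counter)
    overall = Counter()
    for features, label in examples:
        by_key[features][label] += 1
        overall[label] += 1
    default = overall.most_common(1)[0][0]
    table = {key: counts.most_common(1)[0][0] for key, counts in by_key.items()}
    return lambda features: table.get(features, default)


def windowing(data: Sequence[Example], learn: Callable[[List[Example]], Model],
              initial_size: int, max_add: int, max_iter: int = 50,
              seed: int = 0) -> Tuple[Model, List[int]]:
    """ID3-style windowing sketch: train on a random window, then add up to
    max_add misclassified outside examples per round until none remain or
    max_iter rounds pass. Returns the model and the indices it was trained on."""
    if not data or initial_size < 1 or max_add < 1 or max_iter < 1:
        raise ValueError("need data and initial_size, max_add, max_iter >= 1")
    rng = random.Random(seed)
    order = list(range(len(data)))
    rng.shuffle(order)
    window = set(order[:initial_size])
    for _ in range(max_iter):
        trained_on = sorted(window)
        model = learn([data[i] for i in trained_on])
        # Examples outside the window that the current model gets wrong.
        errors = [i for i in order
                  if i not in window and model(data[i][0]) != data[i][1]]
        if not errors:
            break
        window.update(errors[:max_add])
    return model, trained_on


# Hypothetical tiny "review" corpus: 4 word patterns repeated 250 times each.
patterns = [
    (("clean", "friendly"), "good"),
    (("dirty", "rude"), "bad"),
    (("clean", "noisy"), "good"),
    (("rude",), "bad"),
]
data = patterns * 250

model, window = windowing(data, lookup_learner, initial_size=10, max_add=5)
print(f"window size: {len(window)} of {len(data)}")
print("accuracy on all data:", sum(model(f) == y for f, y in data) / len(data))
```

Because the toy corpus repeats only four word patterns, the loop stops as soon as no example outside the window is misclassified, so the final window is a small fraction of the 1,000 examples. With real reviews, a cap such as `max_iter` (or a limit on window size) matters, because noisy, contradictory examples can keep the error list from ever becoming empty.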

Bibliographic details

  • Authors

    Jan Žižka; Arnošt Svoboda

  • Year: 2016
  • Original format: PDF
  • Language: English
