Conference: World Conference on Information Systems and Technologies

Characterizing User-Generated Text Content Mining: a Systematic Mapping Study of the Portuguese Language

Abstract

Unstructured data accounts for more than 80% of enterprise data and is growing exponentially, at an annual rate of 60%. Text mining is the process of discovering new, previously unknown, and potentially useful information in a variety of unstructured data, including user-generated text content (UGTC). Given that Portuguese is one of the most common languages in the world and the second most frequently used language on Twitter, the goal of this work is to map the landscape of current studies that apply text mining to UGTC in the Portuguese language. The systematic mapping review method was applied to search for, select, and extract data from the included studies. Our manual and automated searches retrieved 6075 studies up to the year 2014, of which 35 were included in the study. Text classification accounts for 79% of all text mining tasks, with Naïve Bayes as the main classifier and Twitter as the main data source.