International Journal of Big Data Intelligence

Unstructured data mining: use case for CouchDB


Abstract

'Big data' has changed the status quo of digital content creation, storage and management. While data accumulated over the years has largely been kept in structured storage, much of today's digital content is unstructured, which creates the need to adopt different storage techniques. NoSQL database systems have therefore been proposed to accommodate most of the content being generated today. One class of NoSQL database that has received significant enterprise adoption is document-append style storage. The problem, however, is that research and tools that support data mining tasks on such NoSQL databases are generally lacking. Even though document-append style storages expose data as web services accessible over URLs/URIs, building a corresponding data mining tool deviates from the techniques underlying web crawlers. Also, existing data mining tools designed for schema-based storage (e.g., RDBMS) are ill-suited to the task. Hence, our goal in this work is to design a data analytics tool that enables knowledge discovery through information retrieval (i.e., terms) from document-append style storage. Three term-extraction algorithms are tested: inference-based Apriori with a Bayesian component, the hidden Markov model, and the Bernoulli process. Overall, the paper evaluates each algorithm in terms of accuracy and speed.
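The abstract does not describe how the tool reads documents or how the Bernoulli-process extractor works, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes documents are fetched in bulk from CouchDB's `GET /{db}/_all_docs?include_docs=true` endpoint (shown here as an already-parsed sample response, so no server is needed) and models each document as one Bernoulli trial per term: the term is either present or absent. The tokenizer, the Laplace smoothing `(df + 1) / (n + 2)`, and the function names `doc_terms`, `bernoulli_term_scores` and `top_terms` are hypothetical choices made for illustration.

```python
import re
from collections import Counter

# Hypothetical tokenizer (an assumption, not the paper's): lowercase
# alphabetic runs of three or more letters.
TOKEN_RE = re.compile(r"[a-z]{3,}")


def doc_terms(doc):
    """Return the set of distinct terms found in a CouchDB document's fields."""
    terms = set()

    def walk(value):
        # Recurse through nested JSON (objects, arrays) and tokenize strings.
        if isinstance(value, str):
            terms.update(TOKEN_RE.findall(value.lower()))
        elif isinstance(value, dict):
            for inner in value.values():
                walk(inner)
        elif isinstance(value, list):
            for inner in value:
                walk(inner)

    # Top-level fields starting with "_" (_id, _rev, ...) are CouchDB metadata.
    for key, value in doc.items():
        if not key.startswith("_"):
            walk(value)
    return terms


def bernoulli_term_scores(all_docs_response):
    """Estimate, per term, the probability that a document contains it.

    Each document is one Bernoulli trial per term (present or absent);
    the estimate uses Laplace smoothing: (df + 1) / (n + 2).
    """
    docs = [
        row["doc"]
        for row in all_docs_response.get("rows", [])
        # Skip rows without an included body, and design documents.
        if row.get("doc") and not row["id"].startswith("_design/")
    ]
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(doc_terms(doc))  # each term counted at most once per document
    return {term: (count + 1) / (n + 2) for term, count in df.items()}


def top_terms(scores, k):
    """Highest-probability terms first; ties broken alphabetically."""
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Sample data shaped like a GET /{db}/_all_docs?include_docs=true response.
response = {
    "total_rows": 3,
    "offset": 0,
    "rows": [
        {"id": "_design/app", "key": "_design/app", "value": {"rev": "1-z"},
         "doc": {"_id": "_design/app", "_rev": "1-z", "language": "javascript"}},
        {"id": "a", "key": "a", "value": {"rev": "1-x"},
         "doc": {"_id": "a", "_rev": "1-x", "text": "CouchDB stores JSON documents"}},
        {"id": "b", "key": "b", "value": {"rev": "1-y"},
         "doc": {"_id": "b", "_rev": "1-y", "text": "Mining terms from JSON"}},
    ],
}

print(top_terms(bernoulli_term_scores(response), 3))
# → [('json', 0.75), ('couchdb', 0.5), ('documents', 0.5)]
```

Because the model records only presence or absence, a term repeated many times in one document counts once; this is what distinguishes a Bernoulli treatment from a term-frequency (multinomial) one. Against a live server, the same function would take the decoded JSON body of the `_all_docs` request in place of the sample dictionary.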
