Journal: Advances in Computer Science and Information Technology: ACSIT

The Reliable Knowledge Discovery in Textual Database using R Infrastructure

Abstract

In today's world, the internet and computer technology have enormously increased the amount of stored information, with an unprecedented expansion of unstructured data in textual formats. Because of this rapid growth of digital data, such data cannot be processed directly to extract useful information, and the information explosion and its availability have changed the nature of information centers. Hence, knowledge discovery and text data mining have attracted considerable attention, driven by an imminent need to turn such data into useful information, patterns and knowledge. Text mining has become an area of interest in business intelligence applications, healthcare, media and research. Text mining can be defined as the process of analyzing text to extract interesting, meaningful, previously unknown and non-trivial information, patterns or knowledge from unstructured text documents or other resources for particular purposes. It is an interdisciplinary research field that draws on techniques from computer science, computational linguistics, information retrieval, data mining and statistics. Existing toolkits for text mining have low extensibility, lack application programming interfaces and provide little support for interacting with computing environments. Hence, in this paper, we propose a text mining infrastructure in the R computing environment. It provides intelligent methods for metadata management and for operations on documents, such as preprocessing, data cloud formation, frequency graphs, text clustering and text classification. This paper presents how text mining techniques can be applied in the R infrastructure and how they make better use of infrastructure features than other text mining products such as dtSearch, SPSS, SAS Text Miner, RapidMiner and Weka.
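As a rough illustration of the document operations listed in the abstract (preprocessing and the term counts behind a frequency graph), the following minimal sketch may help. It is written in Python rather than R and is not the authors' implementation. The stop-word list and the function names `preprocess`, `term_frequencies` and `document_term_matrix` are hypothetical, chosen only for this example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "is", "to", "for", "from"}


def preprocess(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def term_frequencies(documents):
    """Return one Counter of term counts per document."""
    return [Counter(preprocess(doc)) for doc in documents]


def document_term_matrix(documents):
    """Build a sorted vocabulary and a dense count matrix (rows = documents)."""
    counts = term_frequencies(documents)
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, matrix


if __name__ == "__main__":
    docs = [
        "Text mining extracts patterns from text.",
        "Mining the data is useful.",
    ]
    vocabulary, matrix = document_term_matrix(docs)
    print(vocabulary)
    for row in matrix:
        print(row)
```

A document-term matrix of this kind is a common input for the clustering and classification steps the abstract mentions. A real pipeline would typically add stemming and a fuller stop-word list.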
