Text mining library for Orange data mining suite

Abstract

We have developed a text mining system that can be used as an add-on for Orange, a data mining platform. Orange bundles a set of supervised and unsupervised machine learning methods that a typical text mining platform can build on, and it therefore offers an excellent foundation for development. We studied the field of text mining and reviewed several open-source toolkits to define the base components of such a system. We included widgets that retrieve data from remote repositories such as PubMed and The New York Times. Pre-processing was designed to include transformation of documents to vectors, stop word removal, lemmatization, and stemming. The results can be visualized via widgets such as the word cloud. Our goal was to develop widgets that can be easily incorporated into existing Orange workflows, can be extended with additional widgets, and perform well in a visual programming environment.
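
The pre-processing steps named in the abstract (stop word removal, stemming, and transformation of documents to vectors) can be illustrated with a minimal, standard-library-only sketch. This is not the add-on's code or Orange's API. The stop word list, the suffix-stripping rule, and every function name below are hypothetical and chosen only for illustration. Lemmatization is left out because it needs a dictionary or a trained model.

```python
import re
from collections import Counter

# Tiny illustrative stop word list (an assumption, not the add-on's list).
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def naive_stem(token):
    """Strip one common English suffix; a crude stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, drop stop words, then stem what remains."""
    return [naive_stem(t) for t in tokenize(text) if t not in STOP_WORDS]


def bag_of_words(documents):
    """Return (vocabulary, vectors) with one term-count vector per document."""
    token_lists = [preprocess(doc) for doc in documents]
    vocabulary = sorted({t for tokens in token_lists for t in tokens})
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append([counts[term] for term in vocabulary])
    return vocabulary, vectors


docs = ["Mining the texts", "Text mining and data mining"]
vocabulary, vectors = bag_of_words(docs)
print(vocabulary)  # ['data', 'min', 'text']
print(vectors)     # [[0, 1, 1], [1, 2, 1]]
```

The crude stemmer reduces "mining" to "min", which shows why real systems rely on an established stemming algorithm such as Porter's rather than plain suffix stripping.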

Bibliographic details

  • Author

    Novak David;

  • Year 2016
  • Original format PDF
