DOTS: Drift Oriented Tool System

Abstract

Drift is a given in most machine learning applications. The idea that models must accommodate changes, and thus be dynamic, is ubiquitous. Current challenges include temporal data streams, drift, and non-stationary scenarios, often involving text data, whether in social networks or in business systems. There are multiple types of drift pattern: concepts may appear and disappear suddenly, recurrently, gradually, or incrementally. Researchers strive to propose and test algorithms and techniques that deal with drift in text classification, but adequate benchmarks for such dynamic environments are difficult to find. In this paper we present DOTS, Drift Oriented Tool System, a framework for defining and generating text-based datasets in which drift characteristics can be thoroughly specified, implemented, and tested. The usefulness of DOTS is demonstrated through a Twitter stream case study, in which DOTS is used to define datasets and to test the effectiveness of different document representations. Results show the potential of DOTS in machine learning research.
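The drift patterns named in the abstract can be illustrated with a small sketch. This is not the DOTS framework or its API; the function names, parameters (`start`, `end`, `period`), and the "old"/"new" concept labels are all hypothetical, chosen only to show how sudden, gradual, and recurrent drift differ over time. Incremental drift, where the concept itself changes through intermediate states, is not modelled here.

```python
import random


def drift_probability(pattern, t, start=10, end=20, period=5):
    """Probability that a document at time step t comes from the new concept.

    Illustrative assumption only: each pattern is reduced to a simple
    function of time; real drift generators may be far richer.
    """
    if pattern == "sudden":
        # The new concept appears abruptly at `start`.
        return 1.0 if t >= start else 0.0
    if pattern == "gradual":
        # The chance of seeing the new concept rises linearly
        # between `start` and `end`.
        if t < start:
            return 0.0
        if t >= end:
            return 1.0
        return (t - start) / (end - start)
    if pattern == "recurrent":
        # The new concept is absent for `period` steps, present for the
        # next `period` steps, and so on.
        return 1.0 if (t // period) % 2 == 1 else 0.0
    raise ValueError(f"unknown drift pattern: {pattern}")


def generate_labels(pattern, n_steps, docs_per_step=3, seed=0):
    """Return, per time step, a list of concept labels ('old' or 'new')."""
    rng = random.Random(seed)  # seeded so that a run can be repeated
    stream = []
    for t in range(n_steps):
        p = drift_probability(pattern, t)
        labels = ["new" if rng.random() < p else "old"
                  for _ in range(docs_per_step)]
        stream.append(labels)
    return stream
```

Calling `generate_labels("gradual", 30)` would give a stream in which "new" labels become steadily more frequent between steps 10 and 20, which is the kind of controlled behaviour a drift benchmark needs to specify in advance.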
