Conference paper: International Conference on Theory and Practice of Digital Libraries

How Linked Data can Aid Machine Learning-Based Tasks

Abstract

Discovering useful data for a given problem is of primary importance, since data scientists usually spend a lot of time discovering, collecting, and preparing data before using it for various purposes, e.g., for applying or testing machine learning algorithms. In this paper we propose a general method for easily discovering, creating, and selecting valuable features that describe a set of entities, so that these features can be leveraged in a machine learning context. We demonstrate the feasibility of this approach by introducing a research prototype called LODsyndesis_(ML). It is based on Linked Data technologies and (a) automatically discovers datasets in which the entities of interest occur, (b) shows the user a large number of useful features for these entities, and (c) automatically creates the selected features by sending SPARQL queries. We evaluate the approach by using data from several sources, including the British National Library, to create datasets for predicting whether a book or a movie is popular or non-popular. The evaluation uses 5-fold cross-validation, and we present comparative results for a number of different features and models. The evaluation showed that the additional features improved prediction accuracy.
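Step (c), creating the selected features by sending SPARQL queries, can be illustrated with a minimal Python sketch. It is not the query logic of LODsyndesis_(ML), which the abstract does not show. The `FeatureSpec` structure, the two feature kinds (`"count"` and `"exists"`), the function names, and the `example.org` URIs are all hypothetical. The sketch builds one SPARQL 1.1 query per selected feature for a whole set of entities. It then reads the answer back from the standard SPARQL JSON results format into one column of a feature table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureSpec:
    """A user-selected feature (hypothetical structure, for illustration only)."""
    name: str       # column name, also used as the SPARQL result variable
    predicate: str  # full URI of the RDF property the feature is derived from
    kind: str       # "count": number of values; "exists": 1 if any value, else 0


def build_feature_query(entities: list[str], feature: FeatureSpec) -> str:
    """Return one SPARQL 1.1 SELECT query computing `feature` for every entity."""
    if feature.kind == "count":
        projection = f"(COUNT(?o) AS ?{feature.name})"
    elif feature.kind == "exists":
        projection = f"(IF(COUNT(?o) > 0, 1, 0) AS ?{feature.name})"
    else:
        raise ValueError(f"unsupported feature kind: {feature.kind!r}")
    # VALUES binds ?entity to each entity of interest in a single query.
    values = " ".join(f"<{uri}>" for uri in entities)
    # OPTIONAL keeps entities that lack the property, so they get 0 instead of vanishing.
    return (
        f"SELECT ?entity {projection} WHERE {{\n"
        f"  VALUES ?entity {{ {values} }}\n"
        f"  OPTIONAL {{ ?entity <{feature.predicate}> ?o . }}\n"
        f"}}\n"
        f"GROUP BY ?entity"
    )


def parse_feature_column(results: dict, feature: FeatureSpec) -> dict[str, int]:
    """Map entity URI -> integer feature value, from SPARQL JSON results."""
    column = {}
    for row in results["results"]["bindings"]:
        column[row["entity"]["value"]] = int(row[feature.name]["value"])
    return column


if __name__ == "__main__":
    # Hypothetical entities and property, used only to show the generated query.
    books = ["http://example.org/book/1", "http://example.org/book/2"]
    awards = FeatureSpec("numAwards", "http://example.org/ontology/award", "count")
    print(build_feature_query(books, awards))
```

Sending the query to an endpoint over the SPARQL protocol is deliberately left out so that the sketch runs offline. Collecting one such column per selected feature and joining the columns on the entity URI would yield a table on which a classifier (e.g., popular vs. non-popular) could be trained and cross-validated.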
