Journal: IEEE Transactions on Emerging Topics in Computing

Scratch-DKG: A Framework for Constructing Scratch Domain Knowledge Graph


Abstract

With the rapid development of programming platforms such as Scratch, how to make use of the tremendous amount of data they produce has become a major challenge for researchers. The data is not only large and growing but also heterogeneous and diverse, so existing tools cannot effectively extract valuable information from it. In this article, considering the particular features of Scratch data, we propose an effective framework for constructing a Scratch Domain Knowledge Graph (Scratch-DKG). Our framework includes four modules, designed to process semi-structured data, user profile data, project data, and programming knowledge points, respectively. For webpages, we design a template-based wrapper method to extract triples from the semi-structured data. For user profile data, we improve DeepDive, a useful information extraction tool that suffers from wrong labeling, and extract knowledge triples with the proposed Secondary Labeling Algorithm. For project data, we propose an advanced keyword extraction method (S-TextRank) to extract keyword triples. For programming knowledge points, we develop a frequent contiguous block combination mining algorithm to extract the potential domain information of Scratch. Finally, extensive experiments are carried out to evaluate the performance of the proposed methods. The experimental results show that, compared with competing methods, our approach extracts more correct and more comprehensive Scratch triples.
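
The abstract does not give the details of the block combination mining algorithm, so the following is only a minimal sketch of the general idea of mining frequent contiguous block combinations, not the authors' method. It assumes each Scratch script has already been flattened into a list of block names, counts every contiguous run of blocks once per script, and keeps the runs that reach a support threshold. The function name, its parameters, and the block names in the sample data are all hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def frequent_contiguous_blocks(
    scripts: Sequence[Sequence[str]],
    min_support: int = 2,
    min_len: int = 2,
    max_len: int = 4,
) -> Dict[Tuple[str, ...], int]:
    """Return contiguous block runs that occur in at least min_support scripts.

    Assumption: a "combination" is a contiguous run of min_len..max_len blocks,
    and support is the number of scripts that contain the run at least once.
    """
    support: Counter = Counter()
    for script in scripts:
        seen = set()
        for n in range(min_len, max_len + 1):
            for i in range(len(script) - n + 1):
                seen.add(tuple(script[i:i + n]))
        # Each run contributes at most once per script.
        support.update(seen)
    return {combo: count for combo, count in support.items() if count >= min_support}


# Hypothetical block sequences; the block names are illustrative only.
scripts = [
    ["when_flag_clicked", "forever", "move_steps", "if_on_edge_bounce"],
    ["when_flag_clicked", "forever", "move_steps", "turn_right"],
    ["when_key_pressed", "move_steps", "if_on_edge_bounce"],
]

result = frequent_contiguous_blocks(scripts, min_support=2)
for combo, count in sorted(result.items(), key=lambda kv: (-len(kv[0]), kv[0])):
    print(count, " -> ".join(combo))
```

Counting support per script rather than per occurrence keeps one long, repetitive project from dominating the result; a real pipeline would likely also prune runs that are subsumed by a longer run with the same support.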
