International Conference on Big Data, Small Data, Linked Data and Open Data

Creating Data-Driven Ontologies: An Agriculture Use Case



Abstract

The manual creation of an ontology is a tedious task. In the field of ontology learning, Natural Language Processing (NLP) techniques are used to automatically create ontologies. In this paper, we present a methodology using data-driven techniques to create ontologies from unstructured documents in the agriculture domain. We use state-of-the-art NLP techniques based on Stanford OpenIE, Hearst patterns and co-occurrences to create ontologies. We add an NLP method that uses dependency parsing and transformation rules based on linguistic patterns. In addition, we use keyword-driven techniques from the query expansion field, based on Word2vec, WordNet and ConceptNet, to create ontologies. We add a method that takes the union of the ontologies produced by the keyword-based methods. The semantic quality of the different ontologies is calculated using automatically extracted keywords. We define recall, precision and F1-score based on the concepts and relations in which the keywords are present. The results show that 1) the method based on co-occurrences has the best F1-score with more than 100 keywords; 2) the keyword-based methods have a higher F1-score than the NLP-based methods with fewer than 100 keywords in the evaluation; and 3) the combined keyword-based method always has a higher F1-score than each single method. In our future work, we will focus on improving the dependency parsing algorithm, improving the combination of different ontologies, and improving our quality evaluation methodology.
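
To illustrate one of the extraction techniques named in the abstract, the following is a minimal Python sketch, not the authors' implementation, of harvesting hypernym-hyponym pairs with the classic Hearst pattern "X such as Y". The regular expression, the head-noun heuristic and the example sentence are our own illustrative assumptions.

import re

# One classic Hearst pattern: "<hypernym> such as <hyponym>(, <hyponym>)*".
SUCH_AS = re.compile(r"(?P<hyper>\w[\w ]*?)\s+such as\s+(?P<hypos>\w[\w ,]*)",
                     re.IGNORECASE)

def hearst_pairs(sentence):
    """Return (hyponym, hypernym) pairs found by the 'such as' pattern."""
    pairs = []
    match = SUCH_AS.search(sentence)
    if match:
        # Head-noun heuristic: keep the last token of the hypernym phrase.
        hypernym = match.group("hyper").split()[-1]
        for hypo in match.group("hypos").split(","):
            hypo = hypo.strip()
            if hypo:
                pairs.append((hypo, hypernym))
    return pairs

print(hearst_pairs("cereal crops such as wheat, barley, maize"))
# -> [('wheat', 'crops'), ('barley', 'crops'), ('maize', 'crops')]

In a full pipeline such pairs would be merged with the OpenIE triples and co-occurrence relations mentioned above before being turned into ontology concepts and relations.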
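The abstract also defines recall, precision and F1-score over the concepts and relations in which extracted keywords appear. Below is a minimal sketch of one possible reading of that evaluation, assuming precision is the share of ontology labels containing a keyword, recall the share of keywords covered by the ontology, and F1 their harmonic mean; the paper's exact definitions may differ.

def keyword_scores(ontology_terms, keywords):
    """ontology_terms: concept/relation labels; keywords: extracted keywords."""
    terms = {t.lower() for t in ontology_terms}
    kws = {k.lower() for k in keywords}

    # Ontology labels that contain at least one keyword, and keywords
    # that occur in at least one label.
    matched_terms = {t for t in terms if any(k in t for k in kws)}
    covered_kws = {k for k in kws if any(k in t for t in terms)}

    precision = len(matched_terms) / len(terms) if terms else 0.0
    recall = len(covered_kws) / len(kws) if kws else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

print(keyword_scores(["crop", "soil moisture", "grows in"],
                     ["crop", "soil", "irrigation"]))
# -> (0.666..., 0.666..., 0.666...)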
