
Self-Supervised Chinese Ontology Learning from Online Encyclopedias



Abstract

Constructing an ontology manually is a time-consuming, error-prone, and tedious task. We present SSCO, a Chinese ontology built with self-supervised learning, which contains about 255 thousand concepts, 5 million entities, and 40 million facts. We explore the three largest online Chinese encyclopedias for ontology learning and describe how to transfer the structured knowledge in encyclopedias, including article titles, category labels, redirection pages, taxonomy systems, and InfoBox modules, into ontological form. To avoid the errors in encyclopedias and to enrich the learnt ontology, we also apply several machine learning based methods. First, we show, both statistically and experimentally, that self-supervised machine learning is practicable for Chinese relation extraction (at least for synonymy and hyponymy), and we train self-supervised models (SVMs and CRFs) for synonymy extraction, concept-subconcept relation extraction, and concept-instance relation extraction. The advantage of our methods is that all training examples are generated automatically from the structural information of encyclopedias and a few general heuristic rules. Finally, we evaluate SSCO in two aspects, scale and precision. Manual evaluation results show that the ontology has excellent precision, and a comparison of SSCO with other well-known ontologies and knowledge bases indicates high coverage. The experimental results also indicate that the self-supervised models clearly enrich SSCO.
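
The abstract says that all training examples are generated automatically from the structural information of the encyclopedias plus a few heuristic rules. The sketch below is only an illustration of that general idea for synonymy, not the authors' implementation. It assumes that a redirection page and its target article can be treated as a positive synonym pair, and that randomly paired article titles can serve as negatives. The function names (`build_synonymy_examples`, `char_overlap`), the sampling rule, and the single feature are all hypothetical.

```python
import random
from itertools import combinations


def build_synonymy_examples(redirects, titles, negatives_per_positive=1, seed=0):
    """Return a list of (term_a, term_b, label) triples without manual labelling.

    redirects: dict mapping a redirect title to its target article title.
               Treating such a pair as synonymous is an assumption of this sketch.
    titles:    iterable of article titles used to sample negative pairs.
    """
    rng = random.Random(seed)

    # Positive examples: each redirect/target pair, stored order-independently.
    positives = {frozenset((src, dst)) for src, dst in redirects.items() if src != dst}
    examples = [(*sorted(pair), 1) for pair in sorted(positives, key=sorted)]

    # Negative examples: title pairs that are not known redirect pairs.
    # Heuristic only: an unlabelled pair may still be a true synonym pair.
    # Enumerating all pairs is quadratic, which is acceptable for a toy input.
    title_list = sorted(set(titles))
    candidates = [
        frozenset(pair)
        for pair in combinations(title_list, 2)
        if frozenset(pair) not in positives
    ]
    rng.shuffle(candidates)
    wanted = negatives_per_positive * len(positives)
    for pair in candidates[:wanted]:
        examples.append((*sorted(pair), 0))
    return examples


def char_overlap(a, b):
    """Jaccard overlap of the character sets of two terms (a toy feature)."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


# Made-up toy data, for illustration only.
toy_redirects = {"北京市": "北京", "帝都": "北京"}
toy_titles = ["北京", "上海", "广州"]
toy_examples = build_synonymy_examples(toy_redirects, toy_titles)
toy_features = [(char_overlap(a, b), label) for a, b, label in toy_examples]
```

Triples like these, turned into feature vectors, could then be passed to a classifier such as an SVM. The features and rules actually used for SSCO are not given in the abstract.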

Bibliographic record

  • Year: 2014
  • Page: 848631
  • Total pages: 13
  • Original format: PDF
  • Indexed: 2022-08-21 11:19:25
