Journal: IEEE Transactions on Knowledge and Data Engineering

Decoding Chinese User Generated Categories for Fine-Grained Knowledge Harvesting

Abstract

User Generated Categories (UGCs) are short but informative phrases that reflect how people describe and organize entities. UGCs express semantic relations among entities implicitly and hence serve as a rich data source for knowledge harvesting. However, most UGC relation extraction methods focus on English and rely heavily on lexical and syntactic patterns. Applying them directly to Chinese UGCs poses significant challenges because Chinese is an analytic language with flexible expressions. In this paper, we aim to harvest fine-grained relations from Chinese UGCs automatically. Based on neural networks and negative sampling, we introduce two word embedding projection models to identify is-a relations. The accuracy of the prediction results is improved via a collective refinement algorithm and a hypernym expansion method. We further propose a graph clique mining algorithm to harvest non-taxonomic relations from UGCs, together with their textual patterns. Two experiments based on Chinese Wikipedia are conducted to validate our approach. The first experiment verifies that the is-a relation extraction approach achieves high accuracy, outperforming state-of-the-art methods. The second experiment shows that the proposed method can harvest a large quantity of non-taxonomic relations with high accuracy and minimal human intervention.
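To make the idea of an embedding projection for is-a relations concrete, here is a minimal sketch. It is not the paper's model: the abstract describes neural projection models trained with negative sampling, whereas this example swaps in a single ridge-regression projection, uses synthetic vectors in place of real word embeddings, and uses mismatched pairs only to check the scores rather than in training. All names (`fit_projection`, `isa_score`, `true_map`) and all parameter values are hypothetical.

```python
import numpy as np

def fit_projection(hypo, hyper, reg=1e-3):
    """Fit M so that M @ hyponym is close to hypernym (ridge least squares)."""
    d = hypo.shape[1]
    gram = hypo.T @ hypo + reg * np.eye(d)
    # Solves for M^T in (X^T X + reg I) M^T = X^T Y, then transposes.
    return np.linalg.solve(gram, hypo.T @ hyper).T

def isa_score(m, hypo_vec, hyper_vec):
    """Higher (less negative) means the pair looks more like an is-a pair."""
    return -float(np.linalg.norm(m @ hypo_vec - hyper_vec))

# Synthetic stand-ins for word embeddings (assumption: no real data is used).
rng = np.random.default_rng(0)
dim, n_pairs = 8, 200
true_map = rng.normal(size=(dim, dim))
hypo = rng.normal(size=(n_pairs, dim))
hyper = hypo @ true_map.T + 0.01 * rng.normal(size=(n_pairs, dim))

m = fit_projection(hypo, hyper)

# Mismatched pairs: each hyponym is paired with another entity's hypernym.
neg_hyper = np.roll(hyper, 1, axis=0)
pos = np.mean([isa_score(m, x, y) for x, y in zip(hypo, hyper)])
neg = np.mean([isa_score(m, x, y) for x, y in zip(hypo, neg_hyper)])
print("mean score, true pairs:", pos)
print("mean score, mismatched pairs:", neg)
```

In this toy setting, true pairs should score higher on average than mismatched ones, and a threshold on the score would then separate is-a candidates from the rest. Real UGC data would presumably need the nonlinear models and refinement steps the abstract mentions.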