Conference: Industrial Conference on Data Mining

Mining Semantic Relationships between Concepts across Documents Incorporating Wikipedia Knowledge

Abstract

The ongoing astounding growth of text data has created an enormous need for fast and efficient text mining algorithms. Traditional approaches to document representation are mostly based on the Bag of Words (BOW) model, which treats a document as an unordered collection of words. However, in fine-grained information discovery tasks such as mining semantic relationships between concepts, relying solely on the BOW representation may not be sufficient to identify all potential relationships, because the resulting associations are limited to concepts that appear literally in the document collection. In this paper, we attempt to complement the information in the corpus by proposing a new hybrid approach that mines semantic associations between concepts across multiple text units by incorporating extensive knowledge from Wikipedia. The experimental evaluation demonstrates that search performance is significantly enhanced in terms of accuracy and coverage compared with a purely BOW-based approach and with alternative solutions that consider only the article contents of Wikipedia or only its category information.
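The limitation of a purely BOW-based representation noted above can be illustrated with a minimal sketch. It is not the method proposed in the paper; the helper names `bow` and `cosine`, the whitespace tokenization, and the example sentences are all assumptions chosen for illustration. It shows that two texts about closely related concepts score zero similarity when they share no literal words.

```python
from collections import Counter
import math


def bow(text):
    """Represent a text as an unordered multiset of lowercase word counts.

    Hypothetical helper: tokenization is a plain whitespace split.
    """
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Illustrative sentences (assumed, not taken from the paper).
doc1 = bow("the physician prescribed aspirin")
doc2 = bow("a doctor recommended painkillers")   # related concepts, no shared words
doc3 = bow("the physician recommended aspirin")  # shares three literal words with doc1

sim_no_shared_words = cosine(doc1, doc2)
sim_shared_words = cosine(doc1, doc3)
```

Here `sim_no_shared_words` is 0.0 even though "physician" and "doctor" are closely related, while `sim_shared_words` is 0.75 because three of four words match literally. External knowledge such as Wikipedia is one way to bridge this kind of gap.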