European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD 2005); 3-7 October 2005; Porto (PT)

Word Sense Disambiguation for Exploiting Hierarchical Thesauri in Text Classification



Abstract

The introduction of hierarchical thesauri (HT) that contain significant semantic information has led researchers to investigate their potential for improving text classification performance by extending the traditional "bag of words" representation with syntactic and semantic relationships among words. In this paper we address this problem by proposing a Word Sense Disambiguation (WSD) approach based on the intuition that word proximity in the document also implies proximity in the HT graph. We argue that the high precision exhibited by our WSD algorithm on various human-disambiguated benchmark datasets makes it appropriate for the classification task. Moreover, we define a semantic kernel, based on the general concept of GVSM kernels, that captures the semantic relations contained in the hierarchical thesaurus. Finally, we conduct experiments on various corpora and achieve a systematic improvement in classification accuracy using the SVM algorithm, especially when the training set is small.
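The abstract does not give the kernel's definition, so the following is only a minimal sketch of the general idea behind a GVSM-style semantic kernel, not the authors' formulation. It assumes a hypothetical term-to-concept weight map `term_to_concept` (standing in for relations read from a thesaurus) and computes k(a, b) as the inner product of the two documents after projecting their term counts into concept space. The function name, the weights, and the toy documents are all illustrative assumptions.

```python
def semantic_kernel(doc_a, doc_b, term_to_concept):
    """Inner product of two bag-of-words documents after a linear
    projection into concept space: k(a, b) = (a S) . (b S).

    doc_a, doc_b: dicts mapping term -> term frequency.
    term_to_concept: dict mapping term -> {concept: weight}; this is a
    hypothetical stand-in for thesaurus relations. Terms missing from
    the map are projected onto themselves with weight 1.0.
    """
    def project(doc):
        concept_vec = {}
        for term, tf in doc.items():
            mapping = term_to_concept.get(term, {term: 1.0})
            for concept, weight in mapping.items():
                concept_vec[concept] = concept_vec.get(concept, 0.0) + tf * weight
        return concept_vec

    proj_a, proj_b = project(doc_a), project(doc_b)
    return sum(value * proj_b.get(concept, 0.0) for concept, value in proj_a.items())


# Toy data (assumed): "car" and "truck" share the broader concept "vehicle".
toy_map = {
    "car": {"car": 1.0, "vehicle": 0.5},
    "truck": {"truck": 1.0, "vehicle": 0.5},
}
doc_a = {"car": 2}
doc_b = {"truck": 1}

plain = semantic_kernel(doc_a, doc_b, {})          # no thesaurus: plain bag of words
enriched = semantic_kernel(doc_a, doc_b, toy_map)  # with shared concept
```

With an empty map the two toy documents share no terms and the kernel is zero; with the shared "vehicle" concept they obtain a non-zero similarity. Because the kernel is an inner product after a linear map, it is symmetric and positive semidefinite, which is what allows such a kernel to be used with an SVM.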
