Conference: IEEE International Conference on Research in Computational Intelligence and Communication Networks

Document categorization using semantic relatedness and anaphora resolution: A discussion


Abstract

Document categorization is the process of assigning pre-defined categories to textual documents. State-of-the-art approaches have modelled documents as long, corpus-length vectors and viewed the problem only from a syntactic perspective. We develop a general measure that estimates the semantic closeness of documents from the semantic relatedness of the most discriminative individual words that define each document. Anaphora resolution is used to strengthen the meaning ascribed to each document. Our framework draws on word semantics and the WordNet taxonomy, so it better captures the underlying meaning of the text and yields a more concise representation than traditional Information Retrieval methods. Using the same representation for a document and for a category of documents, together with an associated measure of semantic closeness, paves the way for modelling documents in a semantic space where unsupervised approaches can be easily applied. We evaluate the measure by using it to categorize news documents into two topics, achieving 81% to 92% accuracy.
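The abstract does not give the exact closeness formula, so the following is only a minimal sketch of the general idea: represent a document and a category by a handful of discriminative words, score word pairs by their distance in an is-a taxonomy, and assign the document to the closest category. Everything here is an assumption made for illustration: the toy taxonomy `TOY_TAXONOMY` stands in for WordNet, the path-based `word_relatedness` and the symmetric best-match average in `semantic_closeness` are simple substitutes for the authors' measure, and the selection of discriminative words and the anaphora resolution step are left out entirely.

```python
# Hypothetical toy is-a taxonomy (child -> parent); a stand-in for WordNet.
TOY_TAXONOMY = {
    "entity": None,
    "activity": "entity",
    "sport": "activity",
    "football": "sport",
    "cricket": "sport",
    "commerce": "activity",
    "finance": "commerce",
    "bank": "finance",
    "stock": "finance",
}


def path_to_root(word):
    """Return the chain of nodes from `word` up to the taxonomy root."""
    path = []
    while word is not None:
        path.append(word)
        word = TOY_TAXONOMY[word]
    return path


def word_relatedness(a, b):
    """Assumed relatedness: 1 / (1 + length of the shortest is-a path)."""
    if a not in TOY_TAXONOMY or b not in TOY_TAXONOMY:
        return 0.0  # unknown words contribute nothing
    steps_from_b = {node: i for i, node in enumerate(path_to_root(b))}
    for steps_from_a, node in enumerate(path_to_root(a)):
        if node in steps_from_b:  # lowest common ancestor
            return 1.0 / (1 + steps_from_a + steps_from_b[node])
    return 0.0


def semantic_closeness(words_a, words_b):
    """Assumed closeness: symmetric average of best-match relatedness."""
    if not words_a or not words_b:
        return 0.0

    def directed(src, dst):
        best = [max(word_relatedness(s, d) for d in dst) for s in src]
        return sum(best) / len(src)

    return 0.5 * (directed(words_a, words_b) + directed(words_b, words_a))


def categorize(doc_words, categories):
    """Pick the category whose word set is semantically closest."""
    return max(
        categories,
        key=lambda name: semantic_closeness(doc_words, categories[name]),
    )


if __name__ == "__main__":
    categories = {
        "sports": ["sport", "football"],
        "business": ["finance", "stock"],
    }
    print(categorize(["cricket", "football"], categories))
```

Because documents and categories share the same word-set representation, the same `semantic_closeness` function could also compare two documents directly, which is what would make clustering or other unsupervised methods applicable in this kind of setup.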
