Venue: IEEE International Conference on Research in Computational Intelligence and Communication Networks

Document categorization using semantic relatedness and anaphora resolution: A discussion


Abstract

Document categorization is the process of assigning pre-defined categories to textual documents. State-of-the-art approaches model documents as corpus-length vectors and view the problem only from a syntactic perspective. We develop a general measure that estimates the semantic closeness of documents from the semantic relatedness of the most discriminative individual words that define each document. Anaphora resolution is used to strengthen the meaning ascribed to each document. Our framework benefits from word semantics and the WordNet taxonomy, thus better capturing the underlying meaning of the text, and proves to be a more concise representation than traditional Information Retrieval methods. Using the same representation for individual documents and for a category of documents, together with a measure of semantic closeness, paves the way for modelling documents in a semantic space where unsupervised approaches can be easily applied. We evaluate the measure by using it to categorize news documents into two topics, achieving 81% to 92% accuracy.
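The abstract does not give the measure itself, so the following is only a minimal sketch of the general idea: a document and a category are both represented by a few keywords, and their closeness is derived from word-to-word relatedness over a taxonomy. Everything here is an assumption made for illustration: the toy is-a taxonomy stands in for WordNet, the path-based `relatedness` score and the best-match averaging in `closeness` are one common choice and not necessarily the authors' formulation, and keyword selection and anaphora resolution are not modelled.

```python
from typing import Dict, List, Optional

# Hypothetical toy is-a taxonomy (child -> parent), standing in for WordNet.
TOY_TAXONOMY: Dict[str, Optional[str]] = {
    "entity": None,
    "activity": "entity",
    "sport": "activity",
    "football": "sport",
    "cricket": "sport",
    "commerce": "activity",
    "market": "commerce",
    "bank": "commerce",
}


def ancestors(word: str) -> List[str]:
    """Return the chain from `word` up to the root, starting with `word`."""
    chain = []
    node: Optional[str] = word
    while node is not None:
        chain.append(node)
        node = TOY_TAXONOMY[node]
    return chain


def relatedness(a: str, b: str) -> float:
    """Assumed path-based score: 1 / (1 + number of edges between a and b)."""
    if a not in TOY_TAXONOMY or b not in TOY_TAXONOMY:
        return 0.0  # unknown words contribute nothing
    steps_from_b = {node: i for i, node in enumerate(ancestors(b))}
    for steps_from_a, node in enumerate(ancestors(a)):
        if node in steps_from_b:  # first shared ancestor
            return 1.0 / (1 + steps_from_a + steps_from_b[node])
    return 0.0


def closeness(doc_words: List[str], category_words: List[str]) -> float:
    """Average, over document keywords, of the best match in the category."""
    if not doc_words or not category_words:
        return 0.0
    best = [max(relatedness(w, c) for c in category_words) for w in doc_words]
    return sum(best) / len(best)


def categorize(doc_words: List[str], categories: Dict[str, List[str]]) -> str:
    """Assign the category whose keyword set is semantically closest."""
    return max(categories, key=lambda name: closeness(doc_words, categories[name]))


if __name__ == "__main__":
    topics = {"sports": ["sport", "cricket"], "business": ["commerce", "bank"]}
    print(categorize(["football", "cricket"], topics))
```

Because documents and categories share the same keyword-set representation in this sketch, the same `closeness` function could also compare two documents directly, which is the property the abstract points to for unsupervised use.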