Conference proceedings: Advanced Data Mining and Applications

Enhancing Text Categorization Using Sentence Semantics

Abstract

Most text categorization techniques are based on word and/or phrase analysis of the text. Statistical analysis of term frequency captures the importance of a term within a document only. However, two terms can have the same frequency in their documents while one contributes more to the meaning of its sentences than the other. Thus, the underlying model should indicate the terms that capture the semantics of the text. In this case, the model can capture terms that present the concepts of the sentence, which leads to discovering the topic of the document.

A new concept-based model is introduced that analyzes terms at the sentence and document levels, rather than at the document level only as in traditional analysis. The concept-based model can effectively discriminate between terms that are unimportant with respect to sentence semantics and terms that hold the concepts representing the sentence meaning.

A set of experiments is conducted using the proposed concept-based model on different datasets in text categorization. The experiments compare traditional weighting with concept-based weighting and demonstrate that concept-based weighting substantially enhances the categorization quality of sets of documents.
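The abstract's central observation, that two terms can share a document-level frequency yet differ in how much they contribute to their sentences, can be illustrated with a toy sketch. The abstract does not specify the paper's sentence-level measure, so the `sentence_share` proxy below (the fraction of a sentence's content words taken up by the term) is purely an assumption made for illustration. The stopword list, the sample text, and all function names are hypothetical as well; this is not the authors' concept-based model.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it", "on"}

def sentences(text):
    """Split on sentence-ending punctuation (a crude, illustrative splitter)."""
    return [s for s in re.split(r"[.!?]+\s*", text) if s]

def content_words(sentence):
    """Lowercased word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower())
            if w not in STOPWORDS]

def document_tf(text):
    """Traditional document-level term frequency."""
    return Counter(w for s in sentences(text) for w in content_words(s))

def sentence_share(text):
    """Hypothetical sentence-level proxy: the share of a sentence's content
    words taken by the term, averaged over the sentences that contain it."""
    shares = {}
    for s in sentences(text):
        words = content_words(s)
        for term, n in Counter(words).items():
            shares.setdefault(term, []).append(n / len(words))
    return {t: sum(v) / len(v) for t, v in shares.items()}

def combined_weight(text):
    """Document-level frequency scaled by the sentence-level proxy."""
    tf = document_tf(text)
    share = sentence_share(text)
    return {t: tf[t] * share[t] for t in tf}

text = (
    "Hurricanes destroy homes. Hurricanes flood streets. "
    "The long report also listed budget, staffing, parking, and catering notes. "
    "A later memo listed budget, travel, printing, and office supplies."
)

tf = document_tf(text)
weight = combined_weight(text)

# Same document-level frequency, different sentence-level contribution.
print(tf["hurricanes"], tf["budget"])
print(weight["hurricanes"], weight["budget"])
```

In this sample both "hurricanes" and "budget" occur twice, but "hurricanes" sits in short sentences where it carries much of the content, so the combined weight ranks it higher. A real system would replace the proxy with a genuine semantic analysis of each sentence.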

Bibliographic details

  • Source
  • Conference location: Chengdu (CN)
  • Author affiliations

    Pattern Analysis and Machine Intelligence (PAMI) Research Group, University of Waterloo, Waterloo, Ontario, Canada N2L 3G1;

    Pattern Analysis and Machine Intelligence (PAMI) Research Group, University of Waterloo, Waterloo, Ontario, Canada N2L 3G1;

    Pattern Analysis and Machine Intelligence (PAMI) Research Group, University of Waterloo, Waterloo, Ontario, Canada N2L 3G1;

  • Conference organizer
  • Original format: PDF
  • Text language: English
  • Chinese Library Classification: TP311.13
  • Keywords

    data mining; text categorization; concept-based model;
