
Statistical Identification of Key Phrases for Text Classification



Abstract

Algorithms for text classification generally involve two stages, the first of which aims to identify textual elements (words and/or phrases) that may be relevant to the classification process. This stage often involves an analysis of the text that is both language-specific and possibly domain-specific, and may also be computationally costly. In this paper we examine a number of alternative keyword-generation methods and phrase-construction strategies that identify key words and phrases by simple, language-independent statistical properties. We present results that demonstrate that these methods can produce good classification accuracy, with the best results being obtained using a phrase-based approach.
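The two-stage idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's method: the abstract does not name the statistics used, so the scoring rule here (a word counts as a keyword when its document frequency in one class is at least `min_ratio` times what an even spread across classes would give) and the phrase rule (maximal runs of adjacent keywords) are assumptions chosen for illustration. The names `tokenize`, `select_keywords`, `extract_phrases`, `min_ratio`, and `min_docs` are all hypothetical.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Split on runs of Unicode word characters; no language-specific rules.
    return re.findall(r"\w+", text.lower())


def select_keywords(docs, labels, min_ratio=2.0, min_docs=2):
    """Assumed rule: keep a word if it is over-represented in some class."""
    n_docs = len(docs)
    class_sizes = Counter(labels)
    df_total = Counter()              # documents containing each word
    df_class = defaultdict(Counter)   # same, broken down per class
    for text, label in zip(docs, labels):
        for word in set(tokenize(text)):
            df_total[word] += 1
            df_class[label][word] += 1

    keywords = set()
    for word, total in df_total.items():
        if total < min_docs:
            continue  # too rare to say anything about
        for label, size in class_sizes.items():
            # Document frequency expected if the word ignored class labels.
            expected = total * size / n_docs
            if df_class[label][word] >= min_ratio * expected:
                keywords.add(word)
                break
    return keywords


def extract_phrases(text, keywords):
    """Assumed rule: a phrase is a maximal run of adjacent keywords."""
    phrases, run = [], []
    for word in tokenize(text):
        if word in keywords:
            run.append(word)
        elif run:
            phrases.append(" ".join(run))
            run = []
    if run:
        phrases.append(" ".join(run))
    return phrases


docs = [
    "stock market prices fell sharply",
    "the stock market rallied today",
    "the football match ended today",
    "a football match was postponed",
]
labels = ["finance", "finance", "sport", "sport"]

keywords = select_keywords(docs, labels)
print(sorted(keywords))                    # ['football', 'market', 'match', 'stock']
print(extract_phrases(docs[1], keywords))  # ['stock market']
```

Words such as "the" and "today" appear equally often in both toy classes, so they fall below the threshold and act as phrase delimiters. The resulting phrases could then be used as features for any standard classifier in the second stage.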
