Using Maximum Entropy Model for Chinese Text Categorization

Asia-Pacific Web Conference (APWeb 2004), 14-17 April 2004, Hangzhou, China


Abstract

The Maximum Entropy Model is a probability estimation technique widely used for a variety of natural language tasks. It offers a clean and flexible framework for combining diverse pieces of contextual information to estimate the probability of a given linguistic phenomenon. For many NLP tasks, this approach performs at a near state-of-the-art level, or outperforms other competing probabilistic methods when trained and tested under similar conditions. In this paper, we use the maximum entropy model for text categorization. We compare and analyze its categorization performance using different approaches to text feature generation, different numbers of features, and smoothing techniques. Moreover, in experiments we compare it to Bayes, KNN and SVM, and show that its performance is higher than that of Bayes and comparable with KNN and SVM. We consider it a promising technique for text categorization.
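
The maximum entropy classifier described above estimates P(c|d) proportional to exp(sum_i lambda_i * f_i(d, c)) over contextual features f_i of a document d and class c. The following is a minimal sketch of such a setup, assuming a bag-of-words feature representation and scikit-learn's LogisticRegression (a multinomial maximum entropy model); the feature generation, feature count cap, and smoothing (regularization) choices shown here are illustrative assumptions, not the paper's exact configuration.

# Minimal maximum-entropy text classifier sketch (multinomial logistic regression
# over bag-of-words features). All concrete settings below are illustrative.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy corpus; for Chinese text, documents would first be segmented into words
# or character n-grams before feature extraction.
docs = ["stock market rises", "team wins the match", "new phone released"]
labels = ["finance", "sports", "tech"]

model = make_pipeline(
    CountVectorizer(max_features=10000),       # feature generation with a cap on the number of features
    LogisticRegression(C=1.0, max_iter=1000),  # maxent model; L2 penalty C acts as smoothing of feature weights
)
model.fit(docs, labels)
print(model.predict(["the match ended in a win"]))
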
