
Classification Based on Specific Vocabulary

Abstract

Assuming a binomial distribution for word occurrence, we propose computing a standardized Z score to define the specific vocabulary of a subset compared to that of the entire corpus. This approach is applied to weight the terms characterizing a document (or a sample of texts). We then show how these Z score values can be used to derive an efficient categorization scheme. To evaluate this proposition, we categorize speeches given by B. Obama as either electoral or presidential. The results tend to show that the suggested classification scheme performs better than both a Support Vector Machine scheme and a Naive Bayes classifier (10-fold cross-validation).
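
The abstract does not spell out the Z score formula, so the following is only a minimal sketch of one common way to standardize a binomial count. It assumes that a term's probability p is estimated from its relative frequency in the whole corpus, and that for a subset of n' tokens the expected count is n'·p with variance n'·p·(1 − p). The function name `z_scores` and its two token-list parameters are hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def z_scores(subset_tokens, rest_tokens):
    """Return a Z score per term for a subset versus the whole corpus.

    Assumption (not stated in the abstract): p is the term's relative
    frequency in the whole corpus (subset + rest), and the term's count
    in the subset is modeled as Binomial(n_sub, p).
    """
    sub = Counter(subset_tokens)
    rest = Counter(rest_tokens)
    n_sub = len(subset_tokens)
    n_all = n_sub + len(rest_tokens)

    scores = {}
    for term in set(sub) | set(rest):
        p = (sub[term] + rest[term]) / n_all      # estimated occurrence probability
        expected = n_sub * p                      # binomial mean in the subset
        sd = math.sqrt(n_sub * p * (1.0 - p))     # binomial standard deviation
        if sd > 0:                                # skip degenerate cases (p = 1 or empty subset)
            scores[term] = (sub[term] - expected) / sd
    return scores
```

Under these assumptions, a term with a large positive score is over-represented in the subset and a term with a large negative score is under-represented. Any threshold on the score for deciding that a term belongs to the specific vocabulary would be a further choice not given in the abstract.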
