Conference: International Symposium ISKO-Maghreb

Improving textual data classification and discrimination using an ad-hoc metric: Application to a famous text discrimination challenge

Abstract

Labelling maximization (F-max) is an unbiased metric for estimating the quality of unsupervised classification (clustering); it favors clusters with a maximum value of feature F-measure. In this paper, we show that adapting this metric to supervised classification makes it possible to select features and to compute a contrast function for each of them. The method is tested on the well-known Mitterrand-Chirac speech dataset of the DEFT 2005 challenge, which is regarded as difficult and is highly imbalanced. We show that it yields very large improvements in classification performance on this dataset while clearly isolating the discriminating characteristics of the different classes (i.e., the Chirac and Mitterrand profiles).
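The abstract does not spell out how the feature F-measure, the feature selection, or the contrast function are computed. The sketch below is one plausible reading, assuming the definitions commonly given for feature maximization: feature recall is the share of a feature's total weight that falls in a class, feature precision is the share of a class's total weight carried by that feature, and the feature F-measure is their harmonic mean. The function names, the selection rule (keep a feature for a class when its F-measure exceeds both that feature's mean over classes and the global mean) and the contrast ratio are assumptions for illustration, not the authors' confirmed procedure.

```python
import numpy as np

def feature_f_measure(X, y):
    """X: (n_docs, n_features) non-negative feature weights; y: (n_docs,) labels.
    Returns the sorted class labels and an (n_classes, n_features) F-measure matrix."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    # S[c, f] = total weight of feature f over the documents of class c
    S = np.vstack([X[y == c].sum(axis=0) for c in classes])
    col = S.sum(axis=0, keepdims=True)  # weight of each feature over all classes
    row = S.sum(axis=1, keepdims=True)  # weight of all features within each class
    recall = np.divide(S, col, out=np.zeros_like(S), where=col > 0)
    precision = np.divide(S, row, out=np.zeros_like(S), where=row > 0)
    denom = recall + precision
    ff = np.divide(2 * recall * precision, denom,
                   out=np.zeros_like(S), where=denom > 0)
    return classes, ff

def select_and_contrast(ff):
    """Assumed selection rule and contrast: see the lead-in paragraph."""
    feat_mean = ff.mean(axis=0, keepdims=True)  # mean F-measure of each feature
    global_mean = ff.mean()                     # mean F-measure over everything
    selected = (ff > feat_mean) & (ff > global_mean)
    # Contrast > 1 means the feature is more characteristic of that class
    contrast = np.divide(ff, feat_mean,
                         out=np.zeros_like(ff), where=feat_mean > 0)
    return selected, contrast

# Toy data invented for illustration: 4 documents, 3 features, 2 classes
X_toy = [[2, 0, 1], [3, 0, 0], [0, 4, 1], [0, 2, 1]]
y_toy = [0, 0, 1, 1]
classes, ff = feature_f_measure(X_toy, y_toy)
selected, contrast = select_and_contrast(ff)
```

In this toy case each class keeps only the feature that is exclusive to it, and the shared third feature is dropped for both classes. A simplification to note: the per-feature mean here is taken over all classes, whereas a stricter variant would average only over the classes in which the feature occurs.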