Conference: Annual German Conference on Artificial Intelligence (KI 2004), 20-24 September 2004, Ulm (DE)

Integration of Manual and Automatic Text Categorization. A Categorization Workbench for Text-Based Email and Spam

Abstract

As a method of structuring the information and knowledge contained in texts, text categorization can to a great extent be automated. Automatic text classification systems implement machine learning algorithms and require training samples. In commercial applications, however, automatic categorization appears to run up against limiting factors. For example, it turns out to be difficult to reduce the sample complexity without the categorization quality, in terms of recall and precision, suffering. Instead of trying to replace human work entirely with machines, it could be more effective, and ultimately more efficient, to let human and machine cooperate. We have therefore developed a categorization workbench that realises synergy between manual and machine categorization. To compare the categorization workbench with common automatic classification systems, the automatic categorizer of the IBM DB2 Information Integrator for Content was chosen for testing. The test results show that, benefiting from the incorporation of the user's domain knowledge, the categorization workbench can improve recall by a factor of two to four with the same number of training samples as the automatic categorizer uses. Furthermore, to achieve comparable categorization quality, the categorization workbench needs only one eighth to one quarter of the training samples that the automatic categorizer requires.
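The results above are stated in terms of recall and precision. For a single category, precision = TP / (TP + FP) and recall = TP / (TP + FN). The following minimal Python sketch computes both and illustrates, on invented toy data, one way user-supplied domain knowledge could raise recall when the classifier has seen few training samples: user keyword rules are OR-ed with the classifier's decision. The names `hybrid_categorize` and `toy_classifier`, the keyword set, and the data are hypothetical; this is not the workbench's actual design, the IBM DB2 Information Integrator for Content API, or the paper's results.

```python
from typing import Callable, Dict, Set, Tuple


def precision_recall(predicted: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Precision = TP / (TP + FP), recall = TP / (TP + FN) for one category."""
    tp = len(predicted & relevant)  # documents correctly assigned to the category
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    return precision, recall


def hybrid_categorize(
    docs: Dict[str, str],
    classifier: Callable[[str], bool],
    user_keywords: Set[str],
) -> Set[str]:
    """Hypothetical hybrid: a document is assigned to the category if the
    trained classifier accepts it OR one of the user's keywords occurs in it."""
    selected = set()
    for doc_id, text in docs.items():
        words = set(text.lower().split())
        if classifier(text) or words & user_keywords:
            selected.add(doc_id)
    return selected


# Invented toy data: four emails, three of which are spam.
docs = {
    "m1": "cheap pills now",
    "m2": "meeting agenda attached",
    "m3": "win a free prize",
    "m4": "limited offer cheap watches",
}
spam = {"m1", "m3", "m4"}


def toy_classifier(text: str) -> bool:
    # Stand-in for a classifier trained on very few samples: it has only
    # learned that the word "pills" indicates spam.
    return "pills" in text.lower().split()


auto_only = hybrid_categorize(docs, toy_classifier, set())
with_rules = hybrid_categorize(docs, toy_classifier, {"free", "offer"})

print("classifier only:", precision_recall(auto_only, spam))
print("classifier + user rules:", precision_recall(with_rules, spam))
```

On this toy data, recall rises from 1/3 to 1 while precision stays at 1.0. Because OR-ing rules can only add documents to a category, it can raise recall but will lower precision if the user's rules are too broad.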