Venue: International Conference on Advanced Data Mining and Applications
Boosting over Groups and Its Application to Acronym-Expansion Extraction

Abstract

In many real-world classification applications, instances are generated from different 'groups'. In webpage classification, for example, the training and test pages can be grouped naturally by network domain, and domains often differ considerably in size or page template. Because of these differences, the distribution of instances also varies from group to group. It is therefore not reasonable to treat the instances equally, as independent elements, during training and testing, as conventional classification algorithms do. This paper addresses classification problems in which all instances can be naturally grouped. Specifically, we formulate this kind of problem and propose a simple but effective boosting approach called AdaBoost.Group. We demonstrate the problem on the task of recognizing acronyms and their expansions in text, where instances are grouped by sentence. The experimental results show that our approach is better suited to this kind of problem than conventional classification approaches.
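
The difference between treating instances independently and treating them per group can be illustrated with a small sketch. This is not AdaBoost.Group itself, whose details the abstract does not give; it only contrasts instance-level accuracy with accuracy averaged over groups. The function names, the domain names, and the toy labels are all hypothetical.

```python
from collections import defaultdict


def instance_accuracy(y_true, y_pred):
    """Accuracy with every instance counted equally, ignoring groups."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def group_accuracy(groups, y_true, y_pred):
    """Accuracy computed inside each group, then averaged over groups."""
    per_group = defaultdict(lambda: [0, 0])  # group -> [correct, total]
    for g, t, p in zip(groups, y_true, y_pred):
        per_group[g][0] += int(t == p)
        per_group[g][1] += 1
    return sum(c / n for c, n in per_group.values()) / len(per_group)


# Hypothetical toy data: one large domain and one small domain.
groups = ["a.example"] * 8 + ["b.example"] * 2
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]

print(instance_accuracy(y_true, y_pred))       # 8 of 10 instances correct
print(group_accuracy(groups, y_true, y_pred))  # mean of 8/8 and 0/2
```

In this toy case the large group dominates the instance-level score, while the group-level score exposes that the classifier fails completely on the small group. That imbalance is the kind of effect the abstract points to when it argues against treating instances as independent elements.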