Journal: INFORMS Journal on Computing

Effective Active Learning Strategies for the Use of Large-Margin Classifiers in Semantic Annotation: An Optimal Parameter Discovery Perspective


Abstract

Classical supervised machine learning techniques have been explored for semantically annotating unstructured textual data, such as consumers' comments archived on social media websites, to extract business intelligence. However, these techniques often require a large number of manually labeled training examples to produce accurate annotations. Several active learning approaches based on probabilistic sequence models have been explored to minimize the number of labeled training examples needed for semantic annotation tasks. Recent research has shown that large-margin classifiers are viable alternatives for automated semantic annotation, given their strong generalization capabilities and their ability to process high-dimensional data. However, the existing active learning methods designed for probabilistic sequence models cannot be easily adapted and applied to large-margin classifiers. The main contribution of this paper is the development of novel active learning methods for large-margin classifiers to fill this research gap. In particular, we propose an innovative perspective that treats active learning as a search for the optimal parameters of large-margin classifiers. A rigorous evaluation involving two benchmark tests and an empirical test based on real-world data extracted from Amazon.com reveals that the proposed active learning methods can train effective classifiers with significantly fewer training examples while achieving similar annotation performance, compared to a typical state-of-the-art classifier that only uses several labeled training examples. More specifically, one of our proposed active learning methods reduces the number of training examples by 19.74% at the 68% level of F_1 when compared to the best baseline method, as evaluated on the Amazon data set. Our research opens the door to the application of intelligent semantic annotation techniques to support real-world applications such as automatically analyzing consumer comments for customer relationship management.
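
The abstract does not spell out the proposed methods, so the following is not the authors' algorithm. As a generic illustration of the setting it describes, here is a minimal sketch of pool-based active learning with a linear large-margin classifier, using the common margin-based heuristic of querying the unlabeled example closest to the current decision boundary. The function names (`train_linear_svm`, `query_index`, `active_learn`), the Pegasos-style trainer, the toy data, and all hyperparameters are assumptions made for this example.

```python
import random


def decision(w, x):
    """Signed score of x under the linear model w (last feature acts as bias)."""
    return sum(wj * xj for wj, xj in zip(w, x))


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style subgradient descent on hinge loss with an L2 penalty."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            violated = y[i] * decision(w, X[i]) < 1
            # Shrink the weights (L2 penalty), then correct on a margin violation.
            w = [(1 - eta * lam) * wj for wj in w]
            if violated:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def query_index(w, candidates):
    """Position of the candidate closest to the decision boundary."""
    return min(range(len(candidates)), key=lambda i: abs(decision(w, candidates[i])))


def active_learn(pool_X, oracle, seed_idx, budget):
    """Repeatedly train, then ask the oracle to label the least certain example."""
    labels = {i: oracle(i) for i in seed_idx}
    labeled = list(seed_idx)
    for _ in range(budget):
        remaining = [i for i in range(len(pool_X)) if i not in labels]
        if not remaining:
            break
        w = train_linear_svm([pool_X[i] for i in labeled], [labels[i] for i in labeled])
        pick = remaining[query_index(w, [pool_X[i] for i in remaining])]
        labels[pick] = oracle(pick)
        labeled.append(pick)
    w = train_linear_svm([pool_X[i] for i in labeled], [labels[i] for i in labeled])
    return w, labeled


# Toy pool: 2-D points plus a constant bias feature; the hidden label is sign(x0 + x1).
rng = random.Random(1)
pool = [[rng.uniform(-1, 1), rng.uniform(-1, 1), 1.0] for _ in range(200)]
truth = [1 if p[0] + p[1] > 0 else -1 for p in pool]

# Start from one labeled example per class, then spend a budget of 10 queries.
seed_idx = [truth.index(1), truth.index(-1)]
w, labeled = active_learn(pool, lambda i: truth[i], seed_idx, budget=10)
accuracy = sum((decision(w, p) > 0) == (t > 0) for p, t in zip(pool, truth)) / len(pool)
print(f"labeled {len(labeled)} of {len(pool)} examples, pool accuracy {accuracy:.2f}")
```

In this sketch the `oracle` callback stands in for the human annotator; in a semantic annotation task it would return the manual label for the selected text item, and the feature vectors would come from whatever text representation is in use.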