Effective Active Learning Strategies for the Use of Large-Margin Classifiers in Semantic Annotation: An Optimal Parameter Discovery Perspective

Kaiquan Xu; Stephen Shaoyi Liao; Raymond Y. K. Lau; J. Leon Zhao

Abstract

Classical supervised machine learning techniques have been explored for semantically annotating unstructured textual data such as consumers' comments archived at social media websites to extract business intelligence. However, these techniques often require a large number of manually labeled training examples to produce accurate annotations. Several active learning approaches that are designed based on probabilistic sequence models have been explored to minimize the number of labeled training examples for semantic annotation tasks. Recent research has shown that large-margin classifiers are viable alternatives to automated semantic annotation, given their strong generalization capabilities and the ability to process high-dimensional data. However, the existing active learning methods that are designed for probabilistic sequence models cannot be easily adapted and applied to large-margin classifiers. The main contribution of this paper is the development of novel active learning methods for large-margin classifiers to fill the aforementioned research gap. In particular, we propose an innovative perspective of taking active learning as a search of optimal parameters for large-margin classifiers. A rigorous evaluation involving two benchmark tests and an empirical test based on real-world data extracted from Amazon.com reveals that the proposed active learning methods can train effective classifiers with significantly fewer training examples while achieving similar annotation performance, compared to a typical state-of-the-art classifier that only uses several labeled training examples. More specifically, one of our proposed active learning methods can reduce the number of training examples by 19.74% at the 68% level of F_1 when compared to the best baseline method, as evaluated based on the Amazon data set.
Our research opens the door to the application of intelligent semantic annotation techniques to support real-world applications such as automatically analyzing consumer comments for customer relationship management.

Bibliographic record

Source
INFORMS Journal on Computing | 2014, Issue 3 | pp. 461-483 | 23 pages
Authors
Kaiquan Xu; Stephen Shaoyi Liao; Raymond Y. K. Lau; J. Leon Zhao
Author affiliations

Marketing and eBusiness Department, School of Business, Nanjing University, Nanjing 210093, China;

Department of Information Systems, City University of Hong Kong, Kowloon, Hong Kong;

Department of Information Systems, City University of Hong Kong, Kowloon, Hong Kong;

Department of Information Systems, City University of Hong Kong, Kowloon, Hong Kong;

Original format: PDF
Language: English
Keywords
active learning; machine learning; data mining; optimization; business intelligence;


Similar literature

1. Integrating Concept Ontology and Multitask Learning to Achieve More Effective Classifier Training for Multilevel Image Annotation [J]. Jianping Fan, Yuli Gao, Hangzai Luo. IEEE Transactions on Image Processing. 2008, Issue 3
2. Semantic annotation in earth observation based on active learning [J]. Shiyong Cui, Corneliu Octavian Dumitru, Mihai Datcu. International Journal of Image and Data Fusion. 2014, Issue 2
3. Semantic-Gap-Oriented Active Learning for Multilabel Image Annotation [J]. Tang J., Zha Z.-J., Tao D. IEEE Transactions on Image Processing. 2012, Issue 4
4. Ontology Learning for Cost-Effective Large-Scale Semantic Annotation of Web Service Interfaces [C]. Shahab Mokarizadeh, Peep Kuengas, Mihhail Matskin. Knowledge Engineering and Management by the Masses. 2010
5. Service learning: Discovering effective communication strategies by emphasizing the community's perspective [D]. Oppe, Elizabeth Ann. 2001
6. A novel strategy for classifying the output from an in silico vaccine discovery pipeline for eukaryotic pathogens using machine learning algorithms [O]. Stephen J Goodswen, Paul J Kennedy, John T Ellis. 2013
7. Semi-Automatic Video Semantic Annotation Based on Active Learning [O]. Yan Song, Xian-sheng Hua, Li-rong Dai. 2008
