40th Annual Meeting of the Association for Computational Linguistics, Jul 7-12, 2002, Philadelphia, Pennsylvania, USA

An Empirical Study of Active Learning with Support Vector Machines for Japanese Word Segmentation


Abstract

We explore how well active learning with Support Vector Machines works for a non-trivial task in natural language processing, using Japanese word segmentation as a test case. In particular, we discuss how the size of the pool affects the learning curve. We find that in the early stage of training, a larger pool requires more labeled examples to reach a given level of accuracy than a smaller pool does. In addition, we propose a novel technique that uses a large number of unlabeled examples effectively by adding them gradually to the pool. The experimental results show that our technique requires fewer labeled examples than the technique used in previous research. To achieve 97.0% accuracy, the proposed technique needs 59.3% of the labeled examples required by the previous technique and only 17.4% of those required by random sampling.
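
The pool-based selection loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `select_most_uncertain`, `active_learn`, `train_midpoint`, `pool_step` and `batch_size` are hypothetical, the fixed-step pool growth is an assumed schedule (the abstract does not say when examples are added), and the trainer is a 1-D stand-in whose boundary coincides with the maximum-margin one on separable data rather than a real SVM.

```python
from typing import Callable, List, Sequence, Tuple

Example = Sequence[float]
Labeled = Tuple[Example, int]
Model = Callable[[Example], float]  # signed decision value, like an SVM's f(x)


def select_most_uncertain(model: Model, pool: List[Example], k: int) -> List[Example]:
    """Return the k pool examples closest to the decision boundary (smallest |f(x)|)."""
    return sorted(pool, key=lambda x: abs(model(x)))[:k]


def active_learn(
    train: Callable[[List[Labeled]], Model],
    oracle: Callable[[Example], int],
    seed: List[Labeled],
    unlabeled: List[Example],
    batch_size: int,
    pool_step: int,
    rounds: int,
) -> Tuple[Model, List[Labeled]]:
    labeled = list(seed)
    reserve = list(unlabeled)  # unlabeled examples not yet in the pool
    pool: List[Example] = []
    model = train(labeled)
    for _ in range(rounds):
        # Assumed schedule: grow the pool by a fixed step each round
        # instead of exposing all unlabeled data at once.
        pool.extend(reserve[:pool_step])
        del reserve[:pool_step]
        if not pool:
            break
        for x in select_most_uncertain(model, pool, batch_size):
            pool.remove(x)
            labeled.append((x, oracle(x)))  # ask the annotator for a label
        model = train(labeled)
    return model, labeled


def train_midpoint(data: List[Labeled]) -> Model:
    """Toy 1-D trainer (assumption): boundary midway between the closest
    opposite-class points; positives must lie to the right of negatives."""
    pos = [x[0] for x, y in data if y > 0]
    neg = [x[0] for x, y in data if y < 0]
    threshold = (min(pos) + max(neg)) / 2.0
    return lambda x: x[0] - threshold


# Toy data: the true boundary is at 5.0; the seed holds one example per class.
oracle = lambda x: 1 if x[0] >= 5.0 else -1
seed = [((0.0,), -1), ((10.0,), 1)]
unlabeled = [(i + 0.5,) for i in range(10)]
model, labeled = active_learn(
    train_midpoint, oracle, seed, unlabeled, batch_size=1, pool_step=4, rounds=3
)
```

With a real SVM, `train` would fit the classifier and return its decision function; only the selection and pool-growth logic is meant to carry over from this sketch.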
