
Word sense disambiguation by learning decision trees from unlabeled data


Abstract

In this paper we describe a machine learning approach to word sense disambiguation that uses unlabeled data. Our method is based on selective sampling with committees of decision trees. The committee members are trained on a small set of labeled examples, which is then augmented with a large number of unlabeled examples. Using unlabeled examples is important because labeled data is expensive and time-consuming to obtain, whereas large numbers of unlabeled examples are easy and inexpensive to collect. The idea behind this approach is that the labels of unlabeled examples can be estimated by the committees. Using additional unlabeled examples therefore improves the performance of word sense disambiguation and minimizes the cost of manual labeling. The effectiveness of this approach was examined on a raw corpus of one million words. Using unlabeled data, we achieved an accuracy improvement of up to 20.2%. [References: 40]
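
The abstract does not give the algorithm in detail, so the following is only a minimal sketch of one plausible reading of "estimating labels with a committee of decision trees". Everything in it is an assumption made for illustration: the function names (`build_tree`, `classify`, `self_label`), the simple error-minimizing tree learner, the bootstrap resampling used to diversify the committee, the rule that an unlabeled example is added only when all members agree, and the toy contexts for the word "bank". It is not the authors' implementation.

```python
import random
from collections import Counter

def build_tree(examples, depth=0, max_depth=3):
    """Grow a small decision tree over 'is word w in the context?' tests.
    A leaf is a sense string; an inner node is (word, yes_subtree, no_subtree)."""
    senses = Counter(sense for _, sense in examples)
    majority = senses.most_common(1)[0][0]
    if len(senses) == 1 or depth == max_depth:
        return majority
    best = None
    for w in sorted({w for ctx, _ in examples for w in ctx}):
        yes = [e for e in examples if w in e[0]]
        no = [e for e in examples if w not in e[0]]
        if not yes or not no:
            continue  # this word does not split the examples
        # Misclassifications if each side predicted its own majority sense.
        errors = sum(len(part) - Counter(s for _, s in part).most_common(1)[0][1]
                     for part in (yes, no))
        if best is None or errors < best[0]:
            best = (errors, w, yes, no)
    if best is None:
        return majority
    _, w, yes, no = best
    return (w, build_tree(yes, depth + 1, max_depth),
            build_tree(no, depth + 1, max_depth))

def classify(tree, ctx):
    """Follow word tests down to a leaf and return its sense."""
    while isinstance(tree, tuple):
        w, yes, no = tree
        tree = yes if w in ctx else no
    return tree

def self_label(labeled, unlabeled, committee_size=5, rounds=3, seed=0):
    """Assumed procedure: train a committee on bootstrap resamples of the
    labeled set, then move every unlabeled context on which the committee
    votes unanimously into the labeled set with the agreed sense."""
    rng = random.Random(seed)
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        committee = [build_tree([rng.choice(labeled) for _ in labeled])
                     for _ in range(committee_size)]
        remaining = []
        for ctx in pool:
            votes = {classify(tree, ctx) for tree in committee}
            if len(votes) == 1:
                labeled.append((ctx, votes.pop()))
            else:
                remaining.append(ctx)
        pool = remaining
    return build_tree(labeled), labeled

# Hypothetical toy data: contexts of the ambiguous word "bank".
labeled = [({"money", "loan"}, "finance"), ({"deposit", "money"}, "finance"),
           ({"river", "water"}, "river"), ({"fishing", "river"}, "river")]
unlabeled = [{"loan", "interest", "money"}, {"water", "muddy", "river"},
             {"deposit", "cash"}, {"fishing", "boat"}]

tree, augmented = self_label(labeled, unlabeled)
print(len(augmented) - len(labeled), "unlabeled examples were given estimated labels")
print(classify(tree, {"money", "interest"}))
```

Requiring unanimity is a deliberately conservative choice in this sketch: it trades coverage of the unlabeled pool for a lower risk of adding wrong labels. A disagreement-based rule, where contested examples are sent for manual labeling instead, would be another reading of "selective sampling".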
