POS Tagging Experts via Topic Modeling

Abstract

Part-of-speech (POS) taggers generally perform well on homogeneous data sets, but their performance often varies considerably across genres. In this paper, we investigate adapting POS taggers to individual genres by creating POS tagging experts. We use topic modeling to determine genres automatically and then build a tagging expert for each genre. Specifically, we use Latent Dirichlet Allocation (LDA) to cluster the training sentences into related topics and train one expert per topic. Likewise, we cluster the test sentences into the same topics and annotate each sentence with the corresponding POS tagging expert. We show that topic model experts improve POS tagging accuracy by around half a percentage point on average over the random baseline, and that the 2-topic hard clustering model and the 10-topic soft clustering model improve over a tagger trained on the full training set.
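
The routing step described in the abstract can be illustrated with a minimal sketch. It assumes that each sentence already comes with a topic distribution (for example, from an LDA implementation) and covers only hard clustering, where a sentence goes to its single most probable topic. The most-frequent-tag lookup is a toy stand-in for a real POS tagger, and all function names, the fallback tagger, and the example data are hypothetical, not taken from the paper.

```python
from collections import Counter, defaultdict


def hard_assign(topic_probs):
    """Hard clustering: return the index of the most probable topic."""
    return max(range(len(topic_probs)), key=lambda k: topic_probs[k])


def train_most_frequent_tagger(tagged_sentences):
    """Toy stand-in for a real POS tagger (assumption): word -> most frequent tag."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def build_experts(train_sentences, train_topic_probs):
    """Split the training sentences by topic and train one expert per topic."""
    by_topic = defaultdict(list)
    for sentence, probs in zip(train_sentences, train_topic_probs):
        by_topic[hard_assign(probs)].append(sentence)
    return {topic: train_most_frequent_tagger(sents)
            for topic, sents in by_topic.items()}


def tag_with_expert(words, topic_probs, experts, fallback, default_tag="NN"):
    """Route a test sentence to the expert of its topic and tag it.

    Words unknown to the expert fall back to the full-data tagger, then to
    default_tag (both fallbacks are assumptions made for this sketch).
    """
    expert = experts.get(hard_assign(topic_probs), fallback)
    return [(w, expert.get(w.lower(), fallback.get(w.lower(), default_tag)))
            for w in words]


# Hypothetical toy data: two topics, topic distributions given as lists.
train = [
    [("stocks", "NNS"), ("fell", "VBD")],
    [("the", "DT"), ("play", "NN"), ("ended", "VBD")],
    [("they", "PRP"), ("play", "VBP"), ("football", "NN")],
]
train_probs = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8]]

experts = build_experts(train, train_probs)
fallback = train_most_frequent_tagger(train)

print(tag_with_expert(["they", "play"], [0.1, 0.9], experts, fallback))
print(tag_with_expert(["the", "play"], [0.7, 0.3], experts, fallback))
```

In this toy data, the word "play" is tagged differently depending on which expert the sentence is routed to, which is the intuition behind training topic-specific experts. A soft clustering variant would let one sentence contribute to several experts; how the paper defines that assignment is not stated in the abstract, so it is left out here.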
