Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing

An Empirical Investigation of Word Class-Based Features for Natural Language Understanding

Abstract

Many studies show that class-based features improve the performance of natural language processing (NLP) tasks such as syntactic part-of-speech tagging, dependency parsing, sentiment analysis, and slot filling in natural language understanding (NLU), but little has been reported on the underlying reasons for these improvements. In this paper, we investigate the effects of word class-based features on the exponential family of models, focusing specifically on NLU tasks, and demonstrate that the performance improvements can be attributed to the regularization effect of the class-based features on the underlying model. Our hypothesis is based on the empirical observation that shrinking the sum of parameter magnitudes in an exponential model tends to improve performance. We show on several semantic tagging tasks that there is a positive correlation between the reduction in model size obtained by adding class-based features and model performance on a held-out dataset. We also demonstrate that class-based features extracted from different data sources using alternate word clustering methods can each contribute to the performance gain. Since the proposed features are generated in an unsupervised manner without significant computational overhead, the improvements in performance largely come for free, and we show that such features provide gains for a wide range of tasks, from semantic classification and slot tagging in NLU to named entity recognition (NER).
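The abstract does not give the feature templates, clustering methods, or model details, so the following is only a minimal sketch of the two ideas it names: adding a word-class feature alongside the lexical feature, and measuring model size as the sum of parameter magnitudes (reading "model size" that way is an interpretation of the abstract). The cluster map `WORD_CLASS`, the feature names, and the weights are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List

# Hypothetical word-to-class map, e.g. the output of an unsupervised
# word clustering step. The class IDs are made up for illustration.
WORD_CLASS: Dict[str, str] = {
    "boston": "C17",
    "seattle": "C17",
    "monday": "C42",
    "friday": "C42",
}


def token_features(tokens: List[str], i: int) -> List[str]:
    """Return the lexical feature for token i, plus a class feature if known."""
    word = tokens[i].lower()
    feats = [f"w={word}"]
    cls = WORD_CLASS.get(word)
    if cls is not None:
        feats.append(f"c={cls}")
    return feats


def model_size(weights: Dict[str, float]) -> float:
    """Sum of absolute parameter values (the L1 norm of the weights)."""
    return sum(abs(v) for v in weights.values())


tokens = ["flights", "to", "Boston", "on", "Friday"]
for i in range(len(tokens)):
    print(tokens[i], token_features(tokens, i))

# Illustrative weights only; a real model would learn these from data.
weights = {"w=boston": 1.5, "w=seattle": -0.5, "c=C17": 2.0}
print(model_size(weights))
```

In this sketch, words that share a class (here "boston" and "seattle") also share one class feature, so a model can place weight on that single feature instead of on many separate lexical ones. That is one plausible way the regularization effect described in the abstract could arise.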