Conference on Empirical Methods in Natural Language Processing

Regularizing Text Categorization with Clusters of Words

Abstract

Regularization is a critical step in supervised learning to not only address overfitting, but also to take into account any prior knowledge we may have on the features and their dependence. In this paper, we explore state-of-the-art structured regularizers and we propose novel ones based on clusters of words from LSI topics, word2vec embeddings and graph-of-words document representation. We show that our proposed regularizers are faster than the state-of-the-art ones and still improve text classification accuracy. Code and data are available online.
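
To make the idea of a cluster-based structured regularizer concrete, here is a minimal sketch of a generic group lasso penalty in which each group is a cluster of word indices. It is not the paper's exact formulation: the toy weight vector, the `groups` lists, the value of `lam`, and the function names are all assumptions made for illustration, and the groups are assumed not to overlap.

```python
import numpy as np

def group_lasso_penalty(w, groups, lam):
    """Sum of the L2 norms of the weight sub-vectors, one per word cluster."""
    return lam * sum(np.linalg.norm(w[g]) for g in groups)

def group_soft_threshold(w, groups, lam):
    """Proximal step for the penalty above (non-overlapping groups only).

    Each cluster's weights are shrunk toward zero together; a cluster whose
    norm is at most lam is removed entirely.
    """
    out = w.copy()
    for g in groups:
        norm = np.linalg.norm(w[g])
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        out[g] = scale * w[g]
    return out

# Hypothetical example: 5 vocabulary terms split into 3 word clusters.
w = np.array([3.0, 4.0, 0.1, -0.1, 2.0])
groups = [[0, 1], [2, 3], [4]]

penalty = group_lasso_penalty(w, groups, lam=1.0)
shrunk = group_soft_threshold(w, groups, lam=1.0)
```

In this sketch the second cluster has a small norm, so the proximal step zeroes out both of its words at once. That is the sense in which a penalty defined over clusters can encode prior knowledge about which features belong together.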