Regularizing Text Categorization with Clusters of Words


Abstract

Regularization is a critical step in supervised learning to not only address overfitting, but also to take into account any prior knowledge we may have on the features and their dependence. In this paper, we explore state-of-the-art structured regularizers and we propose novel ones based on clusters of words from LSI topics, word2vec embeddings and graph-of-words document representation. We show that our proposed regularizers are faster than the state-of-the-art ones and still improve text classification accuracy. Code and data are available online.


Bibliographic details

Source
Conference on Empirical Methods in Natural Language Processing | 2016 | pp. 1827-1837 | 11 pages
Authors
Konstantinos Skianis; Francois Rousseau; Michalis Vazirgiannis
Original format: PDF

Similar literature

1. Distributional Word Clusters vs. Words for Text Categorization [J]. Ron Bekkerman, Ran El-Yaniv, Naftali Tishby. Journal of Machine Learning Research. 2003, March issue
2. Clustering of Words Based on Relative Contribution for Text Categorization [J]. Jie-Ming Yang, Zhi-Ying Liu, Zhao-Yang Qu. IAENG International Journal of Computer Science. 2013, No. 3
3. Chinese Text Categorization via Bottom-Up Weighted Word Clustering [J]. Yu-Chieh Wu. International Journal of Enterprise Information Systems. 2015, No. 1
4. Regularizing Text Categorization with Clusters of Words [C]. Konstantinos Skianis, Francois Rousseau, Michalis Vazirgiannis. Conference on Empirical Methods in Natural Language Processing. 2016
5. The implementation of dynamic document organization using the integration of text clustering and text categorization [D]. Taeho Jo. 2006
6. Long-Range Memory in Literary Texts: On the Universal Clustering of the Rare Words [O]. Kumiko Tanaka-Ishii, Armin Bunde
7. Regularizing Text Categorization with Clusters of Words [O]. Konstantinos Skianis, Francois Rousseau, Michalis Vazirgiannis. 2016
