Conference on Empirical Methods in Natural Language Processing

On the Role of Text Preprocessing in Neural Network Architectures: An Evaluation Study on Text Categorization and Sentiment Analysis

Abstract

Text preprocessing is often the first step in the pipeline of a Natural Language Processing (NLP) system, with potential impact on its final performance. Despite its importance, text preprocessing has not received much attention in the deep learning literature. In this paper we investigate the impact of simple text preprocessing decisions (particularly tokenizing, lemmatizing, lowercasing and multiword grouping) on the performance of a standard neural text classifier. We perform an extensive evaluation on standard benchmarks from text categorization and sentiment analysis. While our experiments show that a simple tokenization of input text is generally adequate, they also highlight significant degrees of variability across preprocessing techniques. This reveals the importance of paying attention to this usually overlooked step in the pipeline, particularly when comparing different models. Finally, our evaluation provides insights into the best preprocessing practices for training word embeddings.
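
The four preprocessing variants the abstract compares can be illustrated with a short sketch. This is a minimal illustration only, not the authors' pipeline: the choice of NLTK, the `preprocess` function and the tiny `MULTIWORDS` lexicon are assumptions made here for the example, and any tokenizer, lemmatizer or multiword list could be substituted.

```python
# Minimal sketch (not the paper's exact pipeline) of the four preprocessing
# variants studied: tokenizing, lowercasing, lemmatizing and multiword grouping.
import nltk
from nltk.stem import WordNetLemmatizer

# One-time data downloads: nltk.download("punkt"); nltk.download("wordnet")

lemmatizer = WordNetLemmatizer()

# Hypothetical multiword vocabulary; in practice this would come from a
# precompiled phrase list or a collocation-detection step.
MULTIWORDS = {("new", "york"): "new_york", ("machine", "learning"): "machine_learning"}

def preprocess(text, lowercase=False, lemmatize=False, group_multiwords=False):
    tokens = nltk.word_tokenize(text)               # plain tokenization (the baseline variant)
    if lowercase:
        tokens = [t.lower() for t in tokens]
    if lemmatize:
        tokens = [lemmatizer.lemmatize(t) for t in tokens]
    if group_multiwords:                            # greedy left-to-right bigram grouping
        grouped, i = [], 0
        while i < len(tokens):
            pair = tuple(t.lower() for t in tokens[i:i + 2])
            if len(pair) == 2 and pair in MULTIWORDS:
                grouped.append(MULTIWORDS[pair])
                i += 2
            else:
                grouped.append(tokens[i])
                i += 1
        tokens = grouped
    return tokens

print(preprocess("Machine Learning thrives in New York",
                 lowercase=True, group_multiwords=True))
# -> ['machine_learning', 'thrives', 'in', 'new_york']
```

Applying the same switches to the corpus used for training word embeddings keeps the embedding vocabulary consistent with the classifier's input, which is the setting the evaluation's final remark on embedding preprocessing refers to.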
