Published in: 1st EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, 2018

On the Role of Text Preprocessing in Neural Network Architectures: An Evaluation Study on Text Categorization and Sentiment Analysis


Abstract

Text preprocessing is often the first step in the pipeline of a Natural Language Processing (NLP) system, with potential impact on its final performance. Despite its importance, text preprocessing has not received much attention in the deep learning literature. In this paper we investigate the impact of simple text preprocessing decisions (particularly tokenizing, lemmatizing, lowercasing and multiword grouping) on the performance of a standard neural text classifier. We perform an extensive evaluation on standard benchmarks from text categorization and sentiment analysis. While our experiments show that a simple tokenization of input text is generally adequate, they also highlight significant degrees of variability across preprocessing techniques. This reveals the importance of paying attention to this usually-overlooked step in the pipeline, particularly when comparing different models. Finally, our evaluation provides insights into the best preprocessing practices for training word embeddings.
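The four preprocessing decisions the abstract names can be sketched in a few lines. This is an illustrative sketch only, not the paper's actual pipeline: the lemma table and multiword list below are hypothetical toy stand-ins for the real resources (a full lemmatizer and phrase lexicon) such a system would use.

```python
# Toy illustration of the four preprocessing variants the abstract names:
# tokenizing, lowercasing, lemmatizing, and multiword grouping.
# TOY_LEMMAS and TOY_MULTIWORDS are hypothetical stand-ins for real resources.
import re

TOY_LEMMAS = {"networks": "network", "ran": "run", "studies": "study"}
TOY_MULTIWORDS = {("neural", "networks"): "neural_networks"}

def tokenize(text):
    # Simple tokenization: word characters or single non-space symbols.
    return re.findall(r"\w+|[^\w\s]", text)

def lowercase(tokens):
    return [t.lower() for t in tokens]

def lemmatize(tokens):
    # Map each token to its base form when the toy table knows it.
    return [TOY_LEMMAS.get(t.lower(), t) for t in tokens]

def group_multiwords(tokens):
    # Merge known two-word expressions into a single token.
    out, i = [], 0
    while i < len(tokens):
        pair = tuple(t.lower() for t in tokens[i:i + 2])
        if pair in TOY_MULTIWORDS:
            out.append(TOY_MULTIWORDS[pair])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

if __name__ == "__main__":
    tokens = tokenize("Neural networks ran many studies.")
    print(lowercase(tokens))
    print(lemmatize(tokens))
    print(group_multiwords(tokens))
```

Each transform feeds the classifier a different vocabulary, which is the source of the performance variability the paper measures.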

