Conference: International Joint Conference on Neural Networks
An Ecology-based Index for Text Embedding and Classification

Abstract

Natural language processing and text mining applications have gained growing attention and adoption in the computer science and machine learning communities. In this work, a new embedding scheme is proposed for solving text classification problems. The embedding scheme relies on a statistical assessment of relevant words within a corpus using a compound index originally proposed in ecology: this makes it possible to identify relevant parts of the overall text (e.g., words), on top of which the embedding is performed following a Granular Computing approach. Using statistically meaningful words not only reduces the computational burden and the dimensionality of the embedding space, but also yields a more interpretable model. Our approach is tested on both synthetic and benchmark datasets against well-known embedding techniques, with remarkable results in terms of both performance and computational complexity.
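
The abstract does not name the ecological index or give the details of the embedding, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes, purely for illustration, an indicator-value style score (specificity times fidelity, in the spirit of the IndVal index used in ecology), treating words as "species" and classes as "site groups". A fixed threshold stands in for the statistical assessment, and each document is embedded as a vector of counts over the selected words. The function names, the threshold, and the toy corpus are all hypothetical.

```python
from collections import Counter


def indicator_scores(docs, labels):
    """Return {(word, label): score}, where score = specificity * fidelity.

    Assumed, IndVal-style definitions (not taken from the abstract):
      specificity = mean count of the word in the class
                    / sum of its mean counts over all classes
      fidelity    = fraction of the class's documents containing the word
    """
    classes = sorted(set(labels))
    n_docs = {c: labels.count(c) for c in classes}
    total = {c: Counter() for c in classes}    # summed word counts per class
    present = {c: Counter() for c in classes}  # documents containing the word
    for doc, c in zip(docs, labels):
        counts = Counter(doc.split())
        total[c].update(counts)
        present[c].update(counts.keys())

    vocab = set().union(*(total[c].keys() for c in classes))
    scores = {}
    for w in vocab:
        mean = {c: total[c][w] / n_docs[c] for c in classes}
        denom = sum(mean.values())  # > 0 because w occurs in some class
        for c in classes:
            specificity = mean[c] / denom
            fidelity = present[c][w] / n_docs[c]
            scores[(w, c)] = specificity * fidelity
    return scores


def select_words(scores, threshold):
    """Keep words whose score reaches the threshold for at least one class."""
    return sorted({w for (w, _), s in scores.items() if s >= threshold})


def embed(doc, alphabet):
    """Embed a document as counts of the selected words, in alphabet order."""
    counts = Counter(doc.split())
    return [counts[w] for w in alphabet]


# Toy corpus (hypothetical data, for illustration only).
docs = [
    "goal match team",
    "team goal goal",
    "vote election party",
    "party vote team",
]
labels = ["sport", "sport", "politics", "politics"]

scores = indicator_scores(docs, labels)
alphabet = select_words(scores, threshold=0.6)
vector = embed("team goal goal", alphabet)
```

Because only the selected words become coordinates, the embedding space has as many dimensions as there are selected words rather than one per vocabulary entry, and each coordinate maps back to a readable word. This matches the dimensionality and interpretability benefits the abstract describes.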