An Ecology-based Index for Text Embedding and Classification

Abstract

Natural language processing and text mining applications have attracted growing attention and seen wide adoption in the computer science and machine learning communities. In this work, a new embedding scheme is proposed for solving text classification problems. The embedding scheme relies on a statistical assessment of relevant words within a corpus using a compound index originally proposed in ecology: this makes it possible to spot relevant parts of the overall text (e.g., words), on top of which the embedding is performed following a Granular Computing approach. Using statistically meaningful words not only eases the computational burden and reduces the embedding space dimensionality, but also yields a more interpretable model. Our approach is tested on both synthetic and benchmark datasets against well-known embedding techniques, with remarkable results in terms of both performance and computational complexity.
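
The abstract does not name the ecological index or give the embedding details, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes the Dufrêne-Legendre indicator value (IndVal), a well-known compound index from ecology, with words playing the role of species, documents the role of sites, and class labels the role of site groups. The score of a word for a class is the product of specificity (the class's share of the word's mean abundance) and fidelity (the fraction of the class's documents that contain the word), kept here in [0, 1] rather than scaled to a percentage. The function names `indval`, `select_words`, and `embed`, the whitespace tokenizer, and the threshold are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict


def indval(docs, labels):
    """Return {word: {label: score}}, score = specificity * fidelity in [0, 1].

    Assumption: IndVal stands in for the unnamed index of the abstract.
    """
    counts = [Counter(d.lower().split()) for d in docs]  # naive tokenizer
    classes = sorted(set(labels))
    n_docs = Counter(labels)            # number of documents per class
    total = defaultdict(Counter)        # total[c][w]: occurrences of w in class c
    present = defaultdict(Counter)      # present[c][w]: docs of class c containing w
    for doc_counts, lab in zip(counts, labels):
        total[lab].update(doc_counts)
        present[lab].update(doc_counts.keys())
    vocab = set().union(*counts)
    scores = {}
    for w in vocab:
        # Mean abundance of w per class, then each class's share (specificity).
        mean = {c: total[c][w] / n_docs[c] for c in classes}
        denom = sum(mean.values())
        scores[w] = {
            c: (mean[c] / denom) * (present[c][w] / n_docs[c])  # times fidelity
            for c in classes
        }
    return scores


def select_words(scores, threshold):
    """Keep words whose best per-class score reaches the threshold."""
    return sorted(w for w, s in scores.items() if max(s.values()) >= threshold)


def embed(doc, vocab):
    """Count occurrences of each selected word (a histogram over the vocabulary)."""
    c = Counter(doc.lower().split())
    return [c[w] for w in vocab]


if __name__ == "__main__":
    docs = ["the cat sat", "the cat purred", "the dog barked", "the dog ran"]
    labels = ["cat", "cat", "dog", "dog"]
    vocab = select_words(indval(docs, labels), threshold=0.9)
    print(vocab, embed("the cat saw another cat", vocab))
```

In this toy corpus, "the" occurs equally in both classes and so scores low on specificity, while "sat" occurs in only one of the two "cat" documents and so scores low on fidelity; only words that are both class-specific and widespread within their class survive the threshold. This is consistent with the abstract's point that restricting the embedding to statistically meaningful words shrinks the embedding space and keeps each dimension interpretable.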

Bibliographic details

Source
International Joint Conference on Neural Networks, 2020, pp. 1-8 (8 pages)
Authors
Alessio Martino; Enrico De Santis; Antonello Rizzi
Original format
PDF
Keywords
Vocabulary; Indexes; Text mining; Semantics; Genetic communication; String theory

Similar literature

1. Combining Embeddings of Input Data for Text Classification [J]. Parcheta Zuzanna, Sanchis-Trilles German, Casacuberta Francisco. Neural processing letters. 2021, No. 5

2. Grammar guided embedding based Chinese long text sentiment classification [J]. Zhang Chaoli, Lin Dazhen, Cao Donglin. Concurrency and computation: practice and experience. 2021, No. 21

3. Machine learning for financial transaction classification across companies using character-level word embeddings of text fields [J]. Jorgensen Rasmus Kaer, Igel Christian. International journal of intelligent systems in accounting, finance & management. 2021, No. 3

4. Saagie at Semeval-2019 Task 5: From Universal Text Embeddings and Classical Features to Domain-specific Text Classification [C]. Miriam Benballa, Sebastien Collet, Romain Picot-Clemente. Annual conference of the North American Chapter of the Association for Computational Linguistics: human language technologies; International workshop on semantic evaluation. 2019

5. A Study of Classification and Embedding Methods for Identifying Humpback Whales [D]. Nouafo Wanko, Stephane Junior. 2020

6. Text mining for the Vaccine Adverse Event Reporting System: medical text classification using informative feature selection [O]. Taxiarchis Botsis, Michael D Nguyen, Emily Jane Woo. 2011

7. Text Classification Based on Convolutional Neural Networks and Word Embedding for Low-Resource Languages: Tigrinya [O]. Awet Fesseha, Shengwu Xiong, Eshete Derb Emiru. 2021
