Boosting Text Classification through Stemming of Composite Words

Abstract

Text mining is a knowledge-intensive process whose main purpose is to process large amounts of unstructured data effectively and efficiently. Because the amount of available raw text is growing rapidly, there is a strong need for methods that can handle it through automatic classification or indexing. In this context, an essential task is the semantic processing of natural language, which provides sound input to the text classification or categorization task. One important step is stemming, the process of reducing a word to its root (or stem). When a text is pre-processed for mining, stemming maps each variant of a word back to its root so that subsequent steps can process the language more effectively. A particularly challenging case is stemming composite words, which in many languages make up a large part of everyday vocabulary. In this paper we develop a novel rule-based algorithm for stemming composite words, and we show through extensive experiments that stemming composite words greatly improves text classification accuracy.
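
The abstract does not spell out the stemming rules themselves. The following Python sketch is a rough illustration of the general idea only: it combines suffix stripping with lexicon-based splitting of a composite word into its parts. The suffix rules, the tiny English lexicon, the minimum stem length, and the function names `stem_simple`, `split_compound`, and `stem_composite` are all hypothetical assumptions. They are not the authors' algorithm.

```python
"""Toy rule-based stemmer for composite (compound) words.

Hypothetical illustration only: the suffix rules, the lexicon, and the
splitting strategy are assumptions, not the algorithm from the paper.
"""

from typing import List, Optional

# Hypothetical suffix-rewrite rules, tried in order; longer suffixes come
# first so that "ies" wins over "s".
SUFFIX_RULES = [
    ("ies", "y"),
    ("ves", "f"),
    ("ing", ""),
    ("ed", ""),
    ("s", ""),
]

# Hypothetical lexicon of known stems used to split compounds.
LEXICON = {"book", "shelf", "rain", "fall", "foot", "ball", "play", "straw", "berry"}

MIN_STEM_LEN = 3  # never strip a suffix if fewer than 3 letters would remain


def stem_simple(word: str) -> str:
    """Apply the first matching suffix rule to the whole word."""
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            return word[: -len(suffix)] + replacement
    return word


def split_compound(stem: str) -> Optional[List[str]]:
    """Segment a stem into lexicon entries with dynamic programming.

    best[i] holds the segmentation of stem[:i] with the fewest parts,
    or None if stem[:i] cannot be fully covered by lexicon entries.
    """
    n = len(stem)
    best: List[Optional[List[str]]] = [None] * (n + 1)
    best[0] = []
    for end in range(1, n + 1):
        for start in range(end):
            prefix = best[start]
            piece = stem[start:end]
            if prefix is None or piece not in LEXICON:
                continue
            candidate = prefix + [piece]
            current = best[end]
            if current is None or len(candidate) < len(current):
                best[end] = candidate
    return best[n]


def stem_composite(word: str) -> List[str]:
    """Stem the word, then split the stem into its constituent stems."""
    stem = stem_simple(word.lower())
    parts = split_compound(stem)
    # Fall back to the plain stem when the word is not a known compound.
    return parts if parts is not None else [stem]


if __name__ == "__main__":
    for w in ["bookshelves", "strawberries", "rainfall", "playing", "classification"]:
        print(w, "->", stem_composite(w))
    # bookshelves -> ['book', 'shelf']
    # strawberries -> ['straw', 'berry']
    # rainfall -> ['rain', 'fall']
    # playing -> ['play']
    # classification -> ['classification']
```

In this toy setup, feeding the returned parts into a bag-of-words representation lets "bookshelves" share the feature `book` with "book". This is the kind of vocabulary reduction the abstract attributes to stemming. Because the suffix rules run on the whole word, only the inflection of the last constituent is handled.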

Bibliographic record

Source: ISI 13, 2014, 10 pages
Authors: Marenglen Biba; Eva Gjati
Original format: PDF
Chinese Library Classification: 006.3
Keywords: Boosting; Classification; Composite Words
Date added to database: 2022-08-20 22:33:58

Similar documents

1. Mitigating backdoor attacks in LSTM-based text classification systems by Backdoor Keyword Identification [J]. Chen Chuanshuai, Dai Jiazhu. Neurocomputing, 2021, no. Sepa10.

2. Improving reading comprehension step by step using Online-Boost text readability classification system [J]. La Lei, Wang Nan, Zhou Dong-ping. Neural Computing & Applications, 2015, no. 4.

3. Boosting Text Compression with Word-Based Statistical Encoding [J]. Antonio Farina, Gonzalo Navarro, Jose R. Parama. The Computer Journal, 2012, no. 1.

4. Boosting Text Classification through Stemming of Composite Words [C]. Marenglen Biba, Eva Gjati. ISI 13, 2014.

5. Influence of word sense disambiguation on text classification [D]. Widlak, Magdalena. 2004.

6. The influence of preprocessing on text classification using a bag-of-words representation [O]. Yaakov HaCohen-Kerner, Daniel Miller, Yair Yigal. 2020.

7. Boosting Text Classification Performance on Sexist Tweets by Text Augmentation and Text Generation Using a Combination of Knowledge Graphs [O]. Sima Sharifirad, Borna Jafarpour, Stan Matwin. 2018.
