Conference: ISI 13

Boosting Text Classification through Stemming of Composite Words

Abstract

Text mining is a knowledge-intensive process whose main purpose is to process large amounts of unstructured data effectively and efficiently. Because the amount of raw text available is growing rapidly, there is a strong need for methods that can handle it through automatic classification or indexing. In this context, an essential task is the semantic processing of natural language, which provides sound input to the text classification or categorization task. One important step is stemming, the process of reducing a word to its root (or stem). When a text is pre-processed for mining, stemming maps each word from its surface variant back to its root so that subsequent steps can process the language more reliably. A challenging problem is stemming composite words, which in many languages form a large part of the everyday vocabulary. In this paper we develop a novel rule-based algorithm for stemming composite words, and we show through extensive experiments that stemming composite words greatly improves text classification accuracy.
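The abstract does not describe the rules the authors use, so the following is only a toy sketch of what rule-based stemming of a composite word might look like, not the paper's algorithm. The suffix list, the lexicon, and the function names `stem` and `split_composite` are all hypothetical and chosen purely for illustration.

```python
from typing import List

# Toy illustration only: NOT the algorithm from the paper.
# Hypothetical suffix rules, tried in this order (longest first).
SUFFIXES = ("ing", "es", "s")
# Hypothetical lexicon of known stems.
LEXICON = {"book", "shelf", "foot", "ball", "rain", "coat"}


def stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def split_composite(word: str) -> List[str]:
    """Split a word into two known stems if possible, else return it stemmed."""
    # Try every split point that leaves at least three characters on each side.
    for i in range(3, len(word) - 2):
        head, tail = word[:i], stem(word[i:])
        if head in LEXICON and tail in LEXICON:
            return [head, tail]
    # No split into two lexicon entries was found: treat it as a simple word.
    return [stem(word)]
```

Under these assumptions, `split_composite("footballs")` yields the two stems `foot` and `ball` as separate features for a classifier, whereas a plain suffix stripper would keep `football` as a single token.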