A New Text Categorization (TC) Algorithm for Classifying Arabic Language Text Document

Abstract

Automatic text categorization (TC) has become one of the most active research fields in data mining, information retrieval, web text mining, and natural language processing, owing to the vast number of new documents retrieved by various information retrieval systems. This paper proposes a new TC technique that classifies Arabic-language text documents using a clustering algorithm. The technique is based on the Local Sparsity Coefficient-mine (LSC-mine) algorithm, an outlier detection algorithm that belongs to the clustering paradigm. The algorithm detects outlier points in a spatial space by computing the Local Sparsity Coefficient (LSC), which indicates the outlier-ness of a given point. The algorithm was evaluated in experiments on Arabic document files gathered from the internet, and it was implemented in the Visual C++ .NET programming language.
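
The abstract does not give the formulas, so the following is only a minimal sketch of how an LSC-style outlier score could be computed. It assumes the commonly cited LSC-mine definitions: the local sparsity ratio of a point is the number of its k nearest neighbours divided by the sum of its distances to them, and the LSC is the average ratio of the neighbours' sparsity ratios to the point's own. The function names, the Euclidean distance, and the 2-D toy points are illustrative assumptions, not the authors' implementation. How documents are turned into points is not stated in the abstract, and Python is used here for brevity rather than the Visual C++ .NET of the paper.

```python
import math


def k_nearest(points, i, k):
    """Return (distance, index) pairs for the k nearest neighbours of points[i]."""
    dists = sorted(
        (math.dist(points[i], q), j) for j, q in enumerate(points) if j != i
    )
    return dists[:k]


def local_sparsity_ratio(points, i, k):
    """Assumed definition: neighbour count divided by the summed neighbour distances."""
    nbrs = k_nearest(points, i, k)
    total = sum(d for d, _ in nbrs)
    # Assumption: points are distinct, so total > 0 in practice.
    return len(nbrs) / total if total > 0 else math.inf


def local_sparsity_coefficient(points, k):
    """Assumed definition: mean of lsr(neighbour) / lsr(point) over the k neighbours.

    Scores near 1 suggest a point as dense as its neighbours;
    larger scores suggest an outlier.
    """
    lsr = [local_sparsity_ratio(points, i, k) for i in range(len(points))]
    scores = []
    for i in range(len(points)):
        nbrs = k_nearest(points, i, k)
        scores.append(sum(lsr[j] / lsr[i] for _, j in nbrs) / len(nbrs))
    return scores


if __name__ == "__main__":
    # Hypothetical toy data: four clustered points and one distant point.
    toy = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
    for point, score in zip(toy, local_sparsity_coefficient(toy, k=2)):
        print(point, round(score, 3))
```

In this toy run the four clustered points score about 1 and the distant point scores clearly higher, which matches the abstract's statement that the LSC indicates the outlier-ness of a point. The published algorithm reportedly also includes a pruning step to skip unlikely outlier candidates, which this sketch omits.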

Bibliographic details

Source
International Conference on Mathematical Methods, Computational Techniques and Intelligent Systems, 2014, 10 pages
Authors
KHALED ALHAWITI; NIDAL F. SHILBAYEH
Original format: PDF
Chinese Library Classification: TP301.6-53
Keywords
Text Categorization; Data Mining; LSC-mine; Arabic Language Text Clustering; Outlier Detection Algorithm

Similar literature

1. Contextual Text Categorization: An Improved Stemming Algorithm to Increase the Quality of Categorization in Arabic Text [J]. Gadri Said, Moussaoui Abdelouahab. The International Arab Journal of Information Technology, 2017, No. 6.
2. Multi-label Arabic text categorization: A benchmark and baseline comparison of multi-label learning algorithms [J]. Bassam Al-Salemi, Masri Ayob, Graham Kendall. Information Processing & Management, 2019, No. 1.
3. Chi Square Feature Extraction Based SVMs Arabic Language Text Categorization System [J]. Abdelwadood M.A. Mesleh. Journal of computer sciences, 2007, No. 6.
4. A New Text Categorization (TC) Algorithm for Classifying Arabic Language Text Document [C]. Khaled Alhawiti, Nidal F. Shilbayeh. International Conference on Mathematical Methods, Computational Techniques and Intelligent Systems, 2014.
5. The implementation of dynamic document organization using the integration of text clustering and text categorization [D]. Jo, Taeho. 2006.
6. The TREC 2004 genomics track categorization task: classifying full text biomedical documents [O]. Aaron M Cohen, William R Hersh. 2006.
7. Arabic Language Processing for Text Classification. Contributions to Arabic Root Extraction Techniques, Building An Arabic Corpus, and to Arabic Text Classification Techniques [O]. Al-Nashashibi May Yacoub Adib. 2012.
