Conference: International Conference on Mathematical Methods, Computational Techniques and Intelligent Systems

A New Text Categorization (TC) Algorithm for Classifying Arabic Language Text Document


Abstract

Automatic text categorization (TC) has become one of the most active fields for researchers in data mining, information retrieval, web text mining, and natural language processing, owing to the vast number of new documents that information retrieval systems must handle. This paper proposes a new TC technique that classifies Arabic language text documents using a clustering algorithm. The proposed technique classifies documents with the Local Sparsity Coefficient-mine algorithm (LSC-mine algorithm), an outlier detection algorithm that belongs to the clustering paradigm. The adopted algorithm can detect outlier points in a spatial data set; detection is accomplished by computing the Local Sparsity Coefficient (LSC), which indicates the outlier-ness of a given point. The algorithm was tested experimentally on Arabic document files gathered from the internet. The algorithm is implemented in the Visual C++.NET programming language.
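The abstract does not spell out how the LSC is computed, so the following is only a minimal sketch under stated assumptions. It assumes the commonly cited LSC-mine definitions: the local sparsity ratio of a point is the number of its k nearest neighbours divided by the sum of their distances, and the LSC is the average ratio of the neighbours' sparsity ratios to the point's own. It is written in Python rather than the authors' Visual C++.NET, uses toy 2-D points as stand-ins for document feature vectors, and all function names are hypothetical. It omits the pruning step and does not show how the paper maps outlier scores to categories.

```python
import math

def k_nearest(points, i, k):
    """Return (index, distance) pairs for the k nearest neighbours of points[i].

    Ties in distance are broken by index; the point itself is excluded.
    """
    dists = sorted(
        (math.dist(points[i], points[j]), j)
        for j in range(len(points)) if j != i
    )
    return [(j, d) for d, j in dists[:k]]

def local_sparsity_ratio(points, i, k):
    """Assumed definition: neighbour count divided by summed neighbour distances.

    Assumes the points are distinct, so the distance sum is non-zero.
    """
    neigh = k_nearest(points, i, k)
    return len(neigh) / sum(d for _, d in neigh)

def local_sparsity_coefficient(points, k):
    """Assumed definition: mean of lsr(neighbour) / lsr(point) over the k neighbours.

    Values near 1 suggest a point inside a cluster; larger values suggest an outlier.
    """
    lsr = [local_sparsity_ratio(points, i, k) for i in range(len(points))]
    scores = []
    for i in range(len(points)):
        neigh = k_nearest(points, i, k)
        scores.append(sum(lsr[j] / lsr[i] for j, _ in neigh) / len(neigh))
    return scores

# Toy data (not from the paper): four clustered points and one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (8.0, 8.0)]
scores = local_sparsity_coefficient(points, k=2)
most_outlying = max(range(len(points)), key=scores.__getitem__)
print(most_outlying, scores)
```

In this toy run the isolated point receives the largest score, while the four clustered points score close to 1, which matches the abstract's description of the LSC as a measure of outlier-ness.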
