
Automatic classification of Tamil documents using vector space model and artificial neural network


Abstract

Automatic text classification methods based on the vector space model (VSM), artificial neural networks (ANN), K-nearest neighbor (KNN), Naive Bayes (NB) and support vector machines (SVM) have been applied to English-language documents and have gained popularity among text mining and information retrieval (IR) researchers. This paper proposes the application of VSM and ANN to the classification of Tamil-language documents. Tamil is a morphologically rich, classical Dravidian language. The development of the internet has led to an exponential increase in the number of electronic documents, not only in English but also in other regional languages. The automatic classification of Tamil documents has not been explored in detail so far. In this paper, a corpus is used to construct and test the VSM and ANN models. Methods of document representation and of assigning weights that reflect the importance of each term are discussed. In a traditional word-matching based categorization system, the most popular document representation is the VSM. This method needs a high-dimensional space to represent the documents. The ANN classifier requires a smaller number of features. The experimental results show that the ANN model achieves 93.33% on Tamil document classification, which is better than the VSM, which yields 90.33%.
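The abstract does not state which weighting scheme or similarity measure the authors use, so the following is only a minimal sketch of a generic VSM classifier, assuming TF-IDF term weights (with the `log(N/df) + 1` IDF variant), cosine similarity, and nearest-document labeling. The function names, the toy tokens, and the two category labels are hypothetical placeholders and are not taken from the paper's Tamil corpus.

```python
import math
from collections import Counter


def fit_idf(train_docs):
    """Inverse document frequency per term (assumed variant: log(N/df) + 1)."""
    n_docs = len(train_docs)
    # Document frequency: number of training documents containing each term.
    df = Counter(term for doc in train_docs for term in set(doc))
    return {term: math.log(n_docs / count) + 1.0 for term, count in df.items()}


def weight(doc, idf):
    """Sparse TF-IDF vector for one tokenized document; unseen terms are dropped."""
    tf = Counter(doc)
    return {term: count * idf[term] for term, count in tf.items() if term in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def classify(doc, train_vecs, labels, idf):
    """Return the label of the most similar training document."""
    query = weight(doc, idf)
    best = max(range(len(train_vecs)), key=lambda i: cosine(query, train_vecs[i]))
    return labels[best]


# Hypothetical toy corpus; a real system would use tokenized Tamil documents.
train_docs = [
    ["team", "match", "score", "team"],
    ["player", "match", "win"],
    ["election", "vote", "party"],
    ["minister", "party", "vote", "vote"],
]
labels = ["sports", "sports", "politics", "politics"]

idf = fit_idf(train_docs)
train_vecs = [weight(doc, idf) for doc in train_docs]
predicted = classify(["vote", "election", "minister"], train_vecs, labels, idf)
print(predicted)
```

Each vector has one dimension per distinct term in the training vocabulary, which illustrates why the abstract describes the VSM as needing a high-dimensional space.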

Bibliographic details

  • Source
    Expert Systems with Applications, 2009, Issue 8, pp. 10914-10918 (5 pages)
  • Author affiliations

    Annamalai University, Department of Computer Science and Engineering, Annamalainagar, Chidambaram, India;

    Annamalai University, Department of Computer Science and Engineering, Annamalainagar, Chidambaram, India;

    Centre for Advanced Studies in Linguistics, Annamalai University, Annamalainagar, Chidambaram, India;

    Annamalai University, Department of Computer Science and Engineering, Annamalainagar, Chidambaram, India;

    Annamalai University, Department of Computer Science and Engineering, Annamalainagar, Chidambaram, India;

  • Original format: PDF
  • Language: English
  • Keywords

    Tamil text classification; vector space model; artificial neural network model; corpus building
