Language and Technology Conference

Itemsets-Based Amharic Document Categorization Using an Extended A Priori Algorithm

Abstract

Document categorization is gaining importance because the large volume of electronic information requires automatic organization and pattern identification. The morphological complexity of Amharic makes automatic categorization of Amharic documents a difficult task. This paper presents a system that categorizes Amharic documents based on the frequency of itemsets obtained after analyzing the morphology of the language. We selected seven categories into which a given document is to be classified. Categorization is achieved by employing an extended version of the a priori algorithm, which has traditionally been used for knowledge mining in the form of association rules. The system is tested on a corpus of Amharic news documents, and experimental results are reported.
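
The abstract does not spell out how the a priori algorithm is extended, so the following is only a minimal sketch of the general idea: mine frequent itemsets per category with a standard level-wise a priori search, then assign a new document to the category whose itemsets it matches best. The function names (`frequent_itemsets`, `train`, `categorize`), the scoring rule, the `min_support` and `max_size` parameters, and the toy English tokens standing in for morphologically analyzed Amharic stems are all assumptions made for illustration, not the authors' method or data.

```python
from collections import defaultdict
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Level-wise a priori search: return {frozenset: support count}."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items and keep those meeting min_support.
    counts = defaultdict(int)
    for t in transactions:
        for item in t:
            counts[frozenset([item])] += 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(current)

    size = 1
    while current and size < max_size:
        size += 1
        # Candidate generation: join frequent (size-1)-itemsets and prune
        # any candidate that has an infrequent (size-1)-subset.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == size and all(
                frozenset(sub) in current
                for sub in combinations(union, size - 1)
            ):
                candidates.add(union)

        # Support counting for the surviving candidates.
        counts = defaultdict(int)
        for t in transactions:
            for cand in candidates:
                if cand <= t:
                    counts[cand] += 1
        current = {s: c for s, c in counts.items() if c >= min_support}
        result.update(current)
    return result


def train(docs_by_category, min_support):
    """Mine frequent itemsets separately for each category."""
    return {
        cat: frequent_itemsets(docs, min_support)
        for cat, docs in docs_by_category.items()
    }


def categorize(model, doc):
    """Hypothetical scoring rule: sum the sizes of all matched itemsets."""
    doc = frozenset(doc)
    scores = {
        cat: sum(len(s) for s in itemsets if s <= doc)
        for cat, itemsets in model.items()
    }
    return max(scores, key=scores.get)


# Toy corpus: each document is a list of (placeholder) stems.
corpus = {
    "sports": [
        ["team", "goal", "match"],
        ["team", "goal", "coach"],
        ["match", "goal", "team"],
    ],
    "economy": [
        ["bank", "market", "price"],
        ["bank", "price", "trade"],
        ["market", "price", "bank"],
    ],
}
model = train(corpus, min_support=2)
label = categorize(model, ["team", "goal", "fans"])
```

Weighting matched itemsets by their size is one simple way to favor longer, more specific term combinations over single frequent terms; a real system would likely also need to handle terms that are frequent in every category.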