International Journal of Web Information Systems

A lexicon based approach for classifying Arabic multi-labeled text

Abstract

Purpose - Multi-label text classification (MTC) is one of the most recent research trends in the data mining and information retrieval domains, for several reasons: online data is growing rapidly, and internet users are increasingly comfortable assigning multiple labels/tags to describe documents, emails, posts, etc. The dimensionality of the label space makes MTC more difficult and challenging than traditional single-label text classification (TC). Because MTC is a natural extension of TC, several methods, known as problem transformation (PT) methods, have been proposed to benefit from the rich TC literature. Basically, PT methods transform multi-label data into single-label data that is suitable for traditional single-label classification algorithms. Another approach is to design novel classification algorithms customized for MTC. Over the past decade, several works have appeared on both approaches, focusing mainly on the English language. This work aims to present an elaborate study of MTC of Arabic articles.

Design/methodology/approach - This paper presents a novel lexicon-based method for MTC: the keywords most associated with each label are extracted from the training data, along with a threshold that is later used to determine whether each test document belongs to that label.

Findings - The experiments show that the presented approach outperforms the currently available approaches. Specifically, the best accuracy obtained by existing approaches is only 18 per cent, whereas the presented lexicon-based approach can reach an accuracy of 31 per cent.

Originality/value - Although some existing tools can be customized to address the MTC problem for Arabic text, their accuracies are very low when applied to Arabic articles. This paper presents a novel method for MTC.
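
The problem transformation (PT) idea can be illustrated with a minimal Python sketch. The abstract does not say which PT methods were evaluated; binary relevance (one yes/no dataset per label) and label powerset (each distinct label combination becomes one class) are two widely used PT methods chosen here purely for illustration, and the function names and toy data are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

# A multi-label example: document text paired with the set of labels it carries.
Example = Tuple[str, FrozenSet[str]]


def binary_relevance(data: List[Example]) -> Dict[str, List[Tuple[str, bool]]]:
    """Binary relevance: one single-label (yes/no) dataset per label."""
    all_labels = sorted(set().union(*(labels for _, labels in data)))
    return {label: [(doc, label in labels) for doc, labels in data] for label in all_labels}


def label_powerset(data: List[Example]) -> List[Tuple[str, str]]:
    """Label powerset: each distinct label combination becomes one single-label class."""
    return [(doc, "+".join(sorted(labels))) for doc, labels in data]


data = [
    ("doc1", frozenset({"sports", "politics"})),
    ("doc2", frozenset({"economy"})),
]
print(binary_relevance(data)["economy"])  # [('doc1', False), ('doc2', True)]
print(label_powerset(data))               # [('doc1', 'politics+sports'), ('doc2', 'economy')]
```

Any off-the-shelf single-label classifier could then be trained on each binary dataset (binary relevance) or on the combined classes (label powerset). Binary relevance ignores correlations between labels, while label powerset keeps them at the cost of many rare classes.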
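
The lexicon-based method described in the abstract can be sketched as follows. This is not the authors' implementation: the abstract does not specify how keyword association is measured, how many keywords are kept, or how thresholds are chosen. As assumptions, this sketch scores a word by its document frequency inside a label minus its document frequency outside it, keeps the top_k positive-scoring words as that label's lexicon, scores a document by the fraction of lexicon words it contains, and picks each label's threshold to maximize correct yes/no decisions on the training data. Tokenization is a plain whitespace split, and the English placeholder tokens stand in for Arabic words; real Arabic text would normally need normalization and possibly stemming first. All function names are hypothetical.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Set, Tuple

# A training example: document text paired with its set of labels.
Example = Tuple[str, FrozenSet[str]]


def tokenize(text: str) -> Set[str]:
    # Assumed tokenizer: whitespace split only (no Arabic normalization or stemming).
    return set(text.split())


def association(word: str, df_in: Counter, n_in: int, df_out: Counter, n_out: int) -> float:
    # Assumed association score: document frequency inside the label
    # minus document frequency outside it.
    outside = df_out[word] / n_out if n_out else 0.0
    return df_in[word] / n_in - outside


def build_lexicons(data: List[Example], top_k: int = 50) -> Dict[str, Set[str]]:
    """Extract, for each label, up to top_k words most associated with it."""
    docs = [(tokenize(text), labels) for text, labels in data]
    all_labels = set().union(*(labels for _, labels in docs))
    lexicons: Dict[str, Set[str]] = {}
    for label in all_labels:
        inside = [toks for toks, labels in docs if label in labels]
        outside = [toks for toks, labels in docs if label not in labels]
        df_in = Counter(w for toks in inside for w in toks)
        df_out = Counter(w for toks in outside for w in toks)
        scores = {w: association(w, df_in, len(inside), df_out, len(outside)) for w in df_in}
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        lexicons[label] = {w for w in ranked[:top_k] if scores[w] > 0}
    return lexicons


def lexicon_score(toks: Set[str], lexicon: Set[str]) -> float:
    # Fraction of the label's lexicon that appears in the document.
    return len(toks & lexicon) / len(lexicon) if lexicon else 0.0


def fit_thresholds(data: List[Example], lexicons: Dict[str, Set[str]]) -> Dict[str, float]:
    """Choose, per label, the threshold that maximizes correct yes/no decisions."""
    docs = [(tokenize(text), labels) for text, labels in data]
    thresholds: Dict[str, float] = {}
    for label, lexicon in lexicons.items():
        scored = [(lexicon_score(toks, lexicon), label in labels) for toks, labels in docs]
        # Candidate thresholds: every observed score, plus +inf ("never assign").
        candidates = sorted({s for s, _ in scored}) + [float("inf")]

        def n_correct(th: float) -> int:
            return sum((s >= th) == is_pos for s, is_pos in scored)

        # Ties go to the higher (more conservative) threshold.
        thresholds[label] = max(candidates, key=lambda th: (n_correct(th), th))
    return thresholds


def predict(text: str, lexicons: Dict[str, Set[str]], thresholds: Dict[str, float]) -> Set[str]:
    """Assign every label whose lexicon score reaches that label's threshold."""
    toks = tokenize(text)
    return {label for label, lexicon in lexicons.items()
            if lexicon_score(toks, lexicon) >= thresholds[label]}


train = [
    ("goal match team", frozenset({"sports"})),
    ("goal team coach", frozenset({"sports"})),
    ("vote minister election", frozenset({"politics"})),
    ("minister budget market", frozenset({"politics", "economy"})),
    ("market price budget", frozenset({"economy"})),
]
lexicons = build_lexicons(train)
thresholds = fit_thresholds(train, lexicons)
print(predict("goal team match tonight", lexicons, thresholds))  # {'sports'}
print(sorted(predict("minister announces budget and market reform", lexicons, thresholds)))  # ['economy', 'politics']
```

Because each label has its own lexicon and threshold, a document can receive several labels (or none), which is what makes the approach multi-label. In practice the thresholds would be tuned on held-out data rather than on the training set itself.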
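
The accuracy figures in the abstract (18 versus 31 per cent) are reported without a definition of the measure, and in multi-label classification "accuracy" can mean different things. Two common definitions are subset (exact-match) accuracy, where a document counts only if its whole predicted label set is correct, and example-based (Jaccard) accuracy, which averages |true ∩ predicted| / |true ∪ predicted| over documents. The Python sketch below computes both for illustration; which definition the paper uses is not stated in this abstract, so it should not be read as the authors' evaluation code, and the toy label sets are hypothetical.

```python
from typing import List, Set


def subset_accuracy(true: List[Set[str]], pred: List[Set[str]]) -> float:
    """Exact match: a document counts only if its predicted label set is exactly right."""
    if len(true) != len(pred) or not true:
        raise ValueError("need equal-length, non-empty lists")
    return sum(t == p for t, p in zip(true, pred)) / len(true)


def jaccard_accuracy(true: List[Set[str]], pred: List[Set[str]]) -> float:
    """Example-based accuracy: mean of |true & pred| / |true | pred| per document."""
    if len(true) != len(pred) or not true:
        raise ValueError("need equal-length, non-empty lists")
    total = 0.0
    for t, p in zip(true, pred):
        union = t | p
        # Convention assumed here: two empty label sets count as a perfect match.
        total += len(t & p) / len(union) if union else 1.0
    return total / len(true)


true = [{"sports"}, {"politics", "economy"}, {"economy"}]
pred = [{"sports"}, {"politics"}, {"sports"}]
print(subset_accuracy(true, pred))   # 0.3333333333333333
print(jaccard_accuracy(true, pred))  # 0.5
```

Subset accuracy is never higher than Jaccard accuracy, because a document scores 1 under exact match only when its Jaccard score is also 1; low absolute figures are therefore common when the stricter measure is used.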