Conference: 4th International Colloquium on Information Science and Technology

Arabic text classification methods: Systematic literature review of primary studies


Abstract

Recent research on Big Data has proposed and evaluated a number of advanced techniques for extracting meaningful information from the large and complex volume of data available on the World Wide Web. Accurate text analysis usually begins with a Text Classification (TC) method. A review of the most recent literature in this area shows that most studies focus on English (and other scripts), while attempts to classify Arabic text remain relatively limited. Hence, we intend to contribute the first Systematic Literature Review (SLR) that strictly follows a search protocol to summarize the key characteristics of the different TC techniques and methods used to classify Arabic text. This work also aims to identify and share scientific evidence of the gap in the current literature, in order to help suggest areas for further research. Our SLR explicitly uses empirical evidence as the deciding factor for including studies, and then concludes which classifier produced more accurate results. Further, our findings identify a lack of standardized corpora for Arabic text: authors compile their own. Most of the work focuses on Modern Arabic, with very little done on Colloquial Arabic despite its wide use in social media networks such as Twitter. In total, 1464 papers were surveyed, from which 48 primary studies were included and analyzed.
