
Discovering the Thematic Structure of the Quran using Probabilistic Topic Model



Abstract

Topic modeling refers to extracting topics from text. A topic model is a statistical model that aims to discover topics in a large collection of documents. A topic is a collection of words that are likely to occur together in the context of a given theme. This paper applies a topic model to discover the thematic structure of the Quran. For centuries, the Quran has been widely studied for the topics it contains and the relationships among them. The Holy Quran holds a vast amount of information addressing many aspects of human life, both social and individual. This information is conceptually related, although individual passages may appear unstructured and scattered. This paper uses a computational method to identify this hidden thematic structure automatically. We treated each surah of the Quran as a document and used Latent Dirichlet Allocation (LDA), a probabilistic topic modeling algorithm, to discover the topics. The original Arabic text of the Quran was used as the corpus, rather than a transliteration or translation. The results are promising: we were able to discover the major themes in the surahs, along with the most important terms that describe these themes.
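
The document-per-surah setup described in the abstract can be illustrated with a minimal sketch of Latent Dirichlet Allocation. The abstract does not say which inference method or software the authors used, so everything here is an assumption: the collapsed Gibbs sampler, the function names `lda_gibbs` and `top_terms`, the hyperparameters `alpha` and `beta`, and the toy English documents, which are placeholders and not Quranic text. In the setting the abstract describes, each entry of `docs` would instead be the tokenized Arabic text of one surah.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling (an assumed inference method)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    vocab_size = len(vocab)
    # Count tables: topics per document, words per topic, tokens per topic.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics
    # Give every token a random initial topic.
    assign = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(z_doc)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assign[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Unnormalized p(topic = k | all other assignments).
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word, vocab


def top_terms(topic_word, n=3):
    """Return the n most frequent words assigned to each topic."""
    return [[w for w, c in tw.most_common(n) if c > 0] for tw in topic_word]


# Toy placeholder corpus: one token list per "document".
docs = [
    "river water rain river cloud water".split(),
    "market trade gold market coin trade".split(),
    "rain cloud water river rain".split(),
    "gold coin trade market gold".split(),
]
doc_topic, topic_word, vocab = lda_gibbs(docs, num_topics=2)
for k, terms in enumerate(top_terms(topic_word)):
    print(f"topic {k}: {terms}")
for d, counts in enumerate(doc_topic):
    print(f"document {d}: tokens per topic = {counts}")
```

The per-topic word lists correspond to the "most important terms" mentioned in the abstract, and the per-document topic counts indicate which themes dominate each document. A real run on Arabic text would also need preprocessing choices (tokenization, stop-word removal, diacritic handling) that the abstract does not specify.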
