
Discovering the Thematic Structure of the Quran using Probabilistic Topic Model



Abstract

Topic modeling refers to extracting topics from text. A topic model is a statistical model whose aim is to discover topics in a large collection of documents. A topic consists of a set of words that are more likely to occur together in the context of that topic or theme. This paper applies a topic model to discover the thematic structure of the Quran. For centuries, the Quran has been widely studied for the topics it contains and the relationships among them. The Holy Quran is a vast store of information that addresses various aspects of human life, both social and individual. Its content is conceptually connected, although individual passages may appear unstructured and scattered. This paper attempts to use a computational method to identify this hidden thematic structure automatically. We treated each surah in the Quran as a document and used Latent Dirichlet Allocation (LDA), a probabilistic topic modeling algorithm, to discover the topics/themes. The Arabic text of the Quran, rather than a transliteration or translation, was used as the corpus. Our results are very promising: we were able to discover the major themes in the surahs, along with the most important terms that describe these themes.
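To make the "each surah is a document, LDA finds the topics" setup concrete, here is a minimal sketch of LDA trained with collapsed Gibbs sampling. The abstract does not say which inference method, hyperparameters, or preprocessing the authors used, so everything below is an assumption: the function names `lda_gibbs` and `top_words`, the values of `alpha` and `beta`, and the `toy_surahs` corpus are hypothetical. The English placeholder tokens stand in for tokenized Arabic surahs. The sketch is illustrative only and is not the paper's implementation.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling (illustrative sketch).

    docs: list of token lists (here, one list per surah-like document).
    Returns (vocab, phi, theta), where phi[k][v] ~ p(word v | topic k)
    and theta[d][k] ~ p(topic k | document d).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    ndk = [[0] * K for _ in docs]      # tokens in doc d assigned to topic k
    nkw = [[0] * V for _ in range(K)]  # times word v is assigned to topic k
    nk = [0] * K                       # total tokens assigned to topic k
    z = []                             # current topic of every token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][word_id[w]] += 1
            nk[k] += 1
        z.append(zd)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove this token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Full conditional: p(z=t | rest) is proportional to
                # (n_dt + alpha) * (n_tv + beta) / (n_t + V * beta).
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1

    # Point estimates of topic-word and document-topic distributions.
    phi = [[(nkw[k][v] + beta) / (nk[k] + V * beta) for v in range(V)] for k in range(K)]
    theta = [
        [(ndk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for d, doc in enumerate(docs)
    ]
    return vocab, phi, theta


def top_words(vocab, phi, n=3):
    """Return the n highest-probability words for each topic."""
    return [
        [vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
        for row in phi
    ]


# Hypothetical toy corpus: each list stands in for one tokenized surah.
toy_surahs = [
    "mercy forgive lord mercy prayer forgive".split(),
    "prayer lord mercy worship forgive".split(),
    "earth sky rain creation earth night".split(),
    "night sky creation rain sea earth".split(),
]

vocab, phi, theta = lda_gibbs(toy_surahs, num_topics=2, iterations=100, seed=1)
for k, words in enumerate(top_words(vocab, phi, n=3)):
    print(f"topic {k}: {words}")
```

Listing each topic's highest-probability words parallels the abstract's "most important terms that describe these themes", and each row of `theta` shows how strongly a document draws on each topic. A real pipeline over the Arabic text would also need tokenization and normalization choices that the abstract does not describe. For corpora larger than a toy example, an established LDA library would be a more practical choice than this pure-Python loop.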
