A novel ensemble statistical topic extraction method for scientific publications based on optimization clustering

Ammar Kamal Abasi; Ahamad Tajudin Khader; Mohammed Azmi Al-Betar; Syibrah Naim; Sharif Naser Makhadmeh; Zaid Abdi Alkareem Alyasseri

Abstract

Automatic topic extraction (TE) from scientific publications provides a compact summary of the contents of document clusters, which often makes information easier to locate and helps define the boundaries of scientific fields. Text document clustering (TDC) is generally the first step of topic identification: it groups the documents that address a related subject matter. Metaheuristics are typically used as efficient approaches for TDC. The multi-verse optimizer (MVO) is a recently proposed stochastic population-based algorithm that has been successfully applied to many hard optimization problems. In the TE process, each statistical TE method focuses on different aspects of the language feature space. The aim of this paper is to design a novel ensemble method for automatic TE from a collection of scientific publications, with MVO as the clustering algorithm. The automatic TE methods used in our approach are term frequency-inverse document frequency (TF-IDF), most-frequent-based keyword extraction (TF), co-occurrence statistical information-based keyword extraction (CSI), TextRank (TR), and mutual information (MI). Each automatic TE method provides a group of candidate topics to the proposed ensemble method. Next, the ensemble approach prunes the set of candidate topics by applying a specific filtering heuristic. Then, their scores are recalculated based on the prescribed metrics. After that, dynamic threshold functions are applied to select a set of topics for given scientific publications. The findings highlight the efficiency and effectiveness of the refined candidate set. The results also show that the new topics improve the quality of the system. The proposed method achieved better precision and recall on a similar dataset than state-of-the-art TE methods.
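
The ensemble stage described in the abstract (candidate generation by several extractors, pruning, rescoring, and dynamic thresholding) can be illustrated with a minimal Python sketch. The abstract does not specify the filtering heuristic, the rescoring metrics, or the threshold functions, so everything here is an assumption made for illustration: only two of the five extractors (TF and TF-IDF) are included, pruning is a simple vote count, rescoring is an average of max-normalized scores, and the threshold is the mean rescored value. The function names, the stopword list, and the toy corpus are hypothetical, and the cluster is fixed by hand rather than produced by MVO.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "for", "on",
             "with", "are", "by", "from"}

def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def tf_scores(cluster_docs):
    """Most-frequent-term extractor: raw term counts over the cluster."""
    return dict(Counter(t for doc in cluster_docs for t in doc))

def tfidf_scores(cluster_docs, all_docs):
    """TF-IDF extractor: cluster term count times a smoothed IDF over the corpus."""
    n = len(all_docs)
    df = Counter(t for doc in all_docs for t in set(doc))
    tf = Counter(t for doc in cluster_docs for t in doc)
    return {t: c * math.log((1 + n) / (1 + df[t])) for t, c in tf.items()}

def top_k(scores, k):
    """Candidate topics from one extractor, max-normalized to [0, 1]."""
    best = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
    peak = max((s for _, s in best), default=0.0) or 1.0
    return {t: s / peak for t, s in best}

def ensemble_topics(candidate_sets, min_votes=2):
    """Pool candidates, prune, rescore, then apply a data-dependent threshold."""
    votes = Counter(t for cands in candidate_sets for t in cands)
    # Pruning (assumed heuristic): keep terms proposed by >= min_votes extractors.
    kept = [t for t, v in votes.items() if v >= min_votes]
    # Rescoring (assumed metric): average normalized score across all extractors.
    rescored = {t: sum(c.get(t, 0.0) for c in candidate_sets) / len(candidate_sets)
                for t in kept}
    if not rescored:
        return []
    # Dynamic threshold (assumed): the mean of the rescored values.
    threshold = sum(rescored.values()) / len(rescored)
    selected = [t for t, s in rescored.items() if s >= threshold]
    return sorted(selected, key=lambda t: (-rescored[t], t))

# Toy usage: the first two documents stand in for one cluster.
corpus = [
    "clustering of text documents with metaheuristic optimization",
    "topic extraction from clustered scientific documents",
    "protein folding simulation with molecular dynamics",
]
docs = [tokenize(d) for d in corpus]
cluster = docs[:2]
candidates = [top_k(tf_scores(cluster), 5), top_k(tfidf_scores(cluster, docs), 5)]
print(ensemble_topics(candidates))
```

Normalizing each extractor's scores before pooling keeps one method's scale from dominating the average; a real system would replace the vote-count filter and mean threshold with the heuristics defined in the full paper.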


Bibliographic details

Source
Multimedia Tools and Applications | 2021, Issue 1 | pp. 37-82 | 46 pages
Author affiliations

School of Computer Sciences Universiti Sains Malaysia 11800 Penang Malaysia;

School of Computer Sciences Universiti Sains Malaysia 11800 Penang Malaysia;

Department of Information Technology - MSAI College of Engineering and Information Technology Ajman University Ajman United Arab Emirates Department of Information Technology Al-Huson University College Al-Balqa Applied University P.O. Box 50 Al-Huson Irbid Jordan;

Technology Department Endicott College of International Studies (ECIS) Woosong University Daejeon Korea;

School of Computer Sciences Universiti Sains Malaysia 11800 Penang Malaysia;

Center for Artificial Intelligence Faculty of Information Science and Technology Universiti Kebangsaan Malaysia 43600 Bangi Selangor Malaysia ECE Department-Faculty of Engineering University of Kufa Najaf Iraq;

Original format: PDF
Language: English
Keywords
Topic extraction; Ensemble methods; Multi-Verse optimizer; Scientific text clustering; Metaheuristic algorithm;


Similar literature

1. An ensemble topic extraction approach based on optimization clusters using hybrid multi-verse optimizer for scientific publications [J]. Abasi Ammar Kamal, Khader Ahamad Tajudin, Al-Betar Mohammed Azmi. Journal of Ambient Intelligence and Humanized Computing. 2021, Issue 2
2. News Text Topic Clustering Optimized Method Based on TF-IDF Algorithm on Spark [J]. Computers, Materials & Continua. 2020, Issue 1
3. Global optimization method using ensemble of metamodels based on fuzzy clustering for design space reduction [J]. Pengcheng Ye, Guang Pan. Engineering with Computers. 2017, Issue 3
4. PTR: Phrase-Based Topical Ranking for Automatic Keyphrase Extraction in Scientific Publications [C]. Minmei Wang, Bo Zhao, Yihua Huang. International Conference on Neural Information Processing. 2016
5. Clustering-based multiresolution methods for scientific visualization [D]. Heckel, Bjoern. 2000
6. Clustering Scientific Publications Based on Citation Relations: A Systematic Comparison of Different Methods [O]. Lovro Šubelj, Nees Jan van Eck, Ludo Waltman
7. DEVELOPMENT OF METHODS FOR AUTOMATIC EXTRACTION OF KNOWLEDGE FROM TEXTS OF SCIENTIFIC PUBLICATIONS FOR THE CREATION OF A KNOWLEDGE BASE SOLANUM TUBEROSUM [O]. O.V. Saik, N.A. Kolchanov, T.V. Ivanisenko. 2017
