IAENG International Journal of Computer Science

Social Media Mining: A Genetic Based Multiobjective Clustering Approach to Topic Modelling



Abstract

Social media mining is the process of collecting large datasets from user-generated content and extracting and analyzing social media interactions to recognize meaningful patterns in individual and social behavior. Every day, users of social media platforms (e.g., Facebook, Twitter) generate more content. As the components of big data continue to expand, the task of extracting useful information becomes critical. Topic extraction refers to the process of extracting the main topics from a pool of news feeds, and a typical way to perform it is through clustering. Clustering organizes a group of patterns or objects into clusters, allowing high-dimensional data to be presented to humans in a comprehensible fashion. Although effective, the performance of the k-means clustering algorithm depends heavily on the initial centroids and the number of clusters, k. Recently, several effective supervised and unsupervised machine learning methods have been developed in the domain of topic extraction. However, fewer works have applied multiobjective algorithms to topic extraction. Most of these algorithms are not optimized; even when they are, they are optimized using only a single-objective method and may underperform on real-world problems, which are typically multiobjective in nature. This paper investigates the effects of using a multiobjective genetic algorithm (MOGA) based clustering technique to cluster texts for topic extraction. The technique is designed around the structure and purity of the clusters in order to determine the optimal initial centroids and the number of clusters, k. The mapping percentages between the predefined and produced clusters are then used to assess the performance of the proposed algorithm. The best mapping percentage of 62.7, obtained with the proposed algorithm when k = 15, outperforms the generic k-means. The top five most representative words from each cluster are also extracted and validated by computing the number of tweets related to the predefined tags.
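The abstract does not define how the mapping percentage or the representative words are computed, so the following is only a minimal sketch under stated assumptions: each produced cluster is mapped to the predefined tag that occurs most often inside it (a purity-style score), and "representative" words are simply the most frequent whitespace-separated tokens. The function names `mapping_percentage` and `top_words`, and the toy data, are hypothetical and not taken from the paper.

```python
from collections import Counter, defaultdict


def mapping_percentage(predefined, produced):
    """Percentage of items whose produced cluster maps to their predefined tag.

    Assumption: a produced cluster is mapped to its majority predefined tag.
    The paper may define the mapping differently.
    """
    if len(predefined) != len(produced):
        raise ValueError("label lists must have the same length")
    if not predefined:
        raise ValueError("at least one item is required")
    tags_per_cluster = defaultdict(Counter)
    for tag, cluster in zip(predefined, produced):
        tags_per_cluster[cluster][tag] += 1
    # Count the items that carry the majority tag of their cluster.
    matched = sum(c.most_common(1)[0][1] for c in tags_per_cluster.values())
    return 100.0 * matched / len(predefined)


def top_words(texts, produced, n=5):
    """Return the n most frequent tokens per produced cluster.

    Assumption: raw term frequency stands in for "most representative".
    """
    words_per_cluster = defaultdict(Counter)
    for text, cluster in zip(texts, produced):
        words_per_cluster[cluster].update(text.lower().split())
    return {
        cluster: [word for word, _ in counts.most_common(n)]
        for cluster, counts in words_per_cluster.items()
    }


# Toy data: five short texts, their predefined tags, and produced cluster ids.
texts = ["goal goal match", "match goal", "vote vote election",
         "election vote", "goal vote"]
predefined = ["sports", "sports", "politics", "politics", "sports"]
produced = [0, 0, 1, 1, 1]

print(mapping_percentage(predefined, produced))
print(top_words(texts, produced, n=1))
```

In this toy example the last text is tagged "sports" but falls into the cluster dominated by "politics", so four of the five items match their cluster's majority tag. A real pipeline would likely remove stop words or use a weighting such as TF-IDF before ranking words; that choice is left open here.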

Bibliographic details

  • Author affiliations

    Knowledge Technology Research Unit, Faculty of Computing and Informatics, Universiti Malaysia Sabah, Jalan UMS, 88400 Kota Kinabalu, Sabah, Malaysia;

    Knowledge Technology Research Unit, Faculty of Computing and Informatics, Universiti Malaysia Sabah, Jalan UMS, 88400 Kota Kinabalu, Sabah, Malaysia;

    Knowledge Technology Research Unit, Faculty of Computing and Informatics, Universiti Malaysia Sabah, Jalan UMS, 88400 Kota Kinabalu, Sabah, Malaysia;

    School of Information Science, Security and Networks Area, Japan Advanced Institute of Science and Technology, Access 1-1 Asahidai, Nomi, Ishikawa 923-1292, Japan;

    Department of Informatics, Universitas Mulawarman, Samarinda, Indonesia;

    Department of Multimedia, Faculty of Computer Science and Information Technology, UPM, Malaysia;

  • Original format: PDF
  • Language: English
  • Keywords

    Multi-Objectives; Genetic Algorithm; Clustering; Social Media Mining; Topics Extraction;

