
Automatic content extraction and time-aware topic clustering for large-scale social network on cloud platform


Abstract

In recent years, as the number of social network users has grown, social network data has taken on the characteristics of big data, and large-scale social networks have become an indispensable part of people's lives. However, traditional data mining techniques are not well suited to large-scale social networks, so more suitable mining techniques are urgently needed. In this paper, a crawler model based on semantic analysis and spatial clustering is first proposed. Then, a content extraction model based on the Document Object Model (DOM) tree is built to extract the target text from the links fetched by the proposed crawler. The similarities between text in different regions of a page are computed to select the important information. Moreover, a two-stage topic clustering model based on time information is presented: time information is incorporated into the similarity computation between two posts or two clusters, and an improved single-pass algorithm is applied in each clustering stage to improve clustering accuracy. Finally, the proposed algorithms are evaluated on the Hadoop platform. Hadoop effectively reduces computing time and improves service quality for users in large-scale social networks, and the experiments demonstrate that the proposed algorithms are suitable for data processing in large-scale social networks.
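The abstract does not state the paper's similarity formula, so the following is only a minimal sketch of the general idea of time-aware, two-stage single-pass clustering. It assumes posts are already vectorized as term-weight dictionaries (e.g., TF-IDF), timestamps are in hours, and time is introduced by multiplying cosine similarity by an exponential half-life decay. The decay form, the thresholds `t1`/`t2`, `half_life`, and all names (`Post`, `Cluster`, `single_pass`, `two_stage`) are hypothetical illustrations, not the authors' method.

```python
import math
from dataclasses import dataclass

@dataclass
class Post:
    vector: dict[str, float]  # assumed precomputed term weights (e.g., TF-IDF)
    timestamp: float          # assumed unit: hours

@dataclass
class Cluster:
    centroid: dict[str, float]
    timestamp: float          # weighted mean timestamp of members
    weight: float             # number of posts represented
    members: list

def cosine(a: dict, b: dict) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def time_aware_sim(va, ta, vb, tb, half_life):
    # Hypothetical decay: similarity halves for every `half_life` hours apart.
    return cosine(va, vb) * 0.5 ** (abs(ta - tb) / half_life)

def merge_into(c: Cluster, vec, ts, w, members):
    # Update centroid and timestamp as weight-averaged values.
    total = c.weight + w
    keys = set(c.centroid) | set(vec)
    c.centroid = {k: (c.centroid.get(k, 0.0) * c.weight + vec.get(k, 0.0) * w) / total
                  for k in keys}
    c.timestamp = (c.timestamp * c.weight + ts * w) / total
    c.weight = total
    c.members.extend(members)

def single_pass(items, threshold, half_life):
    """items: iterable of (vector, timestamp, weight, members) tuples."""
    clusters: list[Cluster] = []
    for vec, ts, w, mem in items:
        best, best_sim = None, threshold
        for c in clusters:
            s = time_aware_sim(c.centroid, c.timestamp, vec, ts, half_life)
            if s >= best_sim:
                best, best_sim = c, s
        if best is None:  # no cluster is similar enough: start a new one
            clusters.append(Cluster(dict(vec), ts, w, list(mem)))
        else:
            merge_into(best, vec, ts, w, mem)
    return clusters

def two_stage(posts, t1=0.5, t2=0.3, half_life=24.0):
    # Stage 1: strict single pass over posts in time order.
    ordered = sorted(posts, key=lambda p: p.timestamp)
    stage1 = single_pass(((p.vector, p.timestamp, 1, [p]) for p in ordered), t1, half_life)
    # Stage 2: looser single pass over stage-1 clusters, merging related topics.
    return single_pass(((c.centroid, c.timestamp, c.weight, c.members) for c in stage1),
                       t2, half_life)
```

The second stage reuses the same single-pass routine on cluster centroids with a lower threshold, so fragments split by the strict first pass can still merge, while the decay factor keeps textually similar posts far apart in time in separate topics.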

Bibliographic record

  • Source
    Journal of supercomputing | 2019, Issue 5 | pp. 2890-2924 | 35 pages
  • Authors

    Li Chunlin; Bai Jingpan;

  • Affiliations

    Minist Land & Resources, Key Lab Urban Land Resources Monitoring & Simulat, Shenzhen, Peoples R China|Wuhan Univ Technol, Dept Comp Sci, Wuhan 430063, Hubei, Peoples R China|Guangzhou Inst Geog, Key Lab Guangdong Utilizat Remote Sensing & Geog, Guangzhou, Guangdong, Peoples R China|Nanjing Univ Informat Sci & Technol, Collaborat Innovat Ctr Atmospher Environm & Equip, Jiangsu Key Lab Meteorol Observat & Informat Proc, Nanjing, Jiangsu, Peoples R China;

    Wuhan Univ Technol, Dept Comp Sci, Wuhan 430063, Hubei, Peoples R China;

  • Indexed in: Science Citation Index (SCI, US); Engineering Index (EI, US)
  • Original format: PDF
  • Language: English
  • Keywords

    Content extraction; Topic clustering; Network community; Large-scale social network; Cloud platform;

