Automatic content extraction and time-aware topic clustering for large-scale social network on cloud platform

Li Chunlin; Bai Jingpan






Abstract

In recent years, as the number of social network users has grown, social networks have taken on the characteristics of big data, and large-scale social networks have become an indispensable part of people's lives. However, traditional data mining techniques are ill-suited to large-scale social networks, so more suitable mining techniques are urgently needed. In this paper, a crawler model based on semantic analysis and spatial clustering is first proposed. Then, a content extraction model based on the document object model (DOM) tree is built to extract the target text from the links fetched by the proposed crawler. The similarities between text in different regions of a page are computed to select the important information. Moreover, a two-stage topic clustering model based on time information is presented: time information is introduced into the similarity computation between two posts or clusters, and an improved single-pass algorithm is applied in each clustering stage to improve clustering accuracy. Finally, the proposed algorithms are evaluated on the Hadoop platform. The Hadoop platform can effectively reduce computing time and improve the quality of service for users of large-scale social networks. The experiments also demonstrate that the proposed algorithms are suitable for data processing in large-scale social networks.
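The abstract says that time information enters the similarity between posts or clusters and that an improved single-pass algorithm runs in each of two clustering stages, but it does not give the formulas. As a rough illustration only, the Python sketch below assumes bag-of-words term-frequency vectors, cosine similarity multiplied by an exponential time-decay factor `exp(-decay * |t_a - t_b|)`, timestamps in hours, and arbitrary thresholds. The names `Post`, `Cluster`, `time_aware_similarity`, `single_pass` and `two_stage_cluster` are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Post:
    text: str
    timestamp: float  # assumed unit: hours


@dataclass
class Cluster:
    centroid: Counter  # summed term counts (cosine ignores overall scale)
    time: float        # member-weighted mean timestamp
    members: list = field(default_factory=list)

    def absorb(self, vec, t, ids):
        """Merge a post (stage 1) or a sub-cluster (stage 2) into this cluster."""
        n_old, n_new = len(self.members), len(ids)
        self.centroid.update(vec)
        self.time = (self.time * n_old + t * n_new) / (n_old + n_new)
        self.members.extend(ids)


def tf_vector(text):
    """Bag-of-words term frequencies; whitespace tokenisation is an assumption."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def time_aware_similarity(vec_a, t_a, vec_b, t_b, decay=0.1):
    """Hypothetical weighting: content similarity damped by the time gap."""
    return cosine(vec_a, vec_b) * math.exp(-decay * abs(t_a - t_b))


def single_pass(items, threshold, decay):
    """items: (vector, timestamp, member_ids) tuples, already in time order."""
    clusters = []
    for vec, t, ids in items:
        best, best_sim = None, 0.0
        for c in clusters:
            s = time_aware_similarity(vec, t, c.centroid, c.time, decay)
            if s > best_sim:
                best, best_sim = c, s
        if best is not None and best_sim >= threshold:
            best.absorb(vec, t, ids)
        else:
            # Copy the vector so later merges never mutate the input items.
            clusters.append(Cluster(Counter(vec), t, list(ids)))
    return clusters


def two_stage_cluster(posts, t1=0.3, t2=0.5, decay=0.1):
    """Stage 1 groups posts; stage 2 re-runs single-pass over stage-1 clusters."""
    order = sorted(range(len(posts)), key=lambda i: posts[i].timestamp)
    stage1 = single_pass(
        [(tf_vector(posts[i].text), posts[i].timestamp, [i]) for i in order],
        t1, decay)
    stage1.sort(key=lambda c: c.time)
    stage2 = single_pass(
        [(c.centroid, c.time, c.members) for c in stage1], t2, decay)
    return [c.members for c in stage2]


if __name__ == "__main__":
    posts = [
        Post("hadoop cluster setup guide", 0.0),
        Post("hadoop cluster setup tips", 1.0),
        Post("world cup final tonight", 2.0),
        Post("world cup final score", 3.0),
        Post("hadoop cluster setup guide", 500.0),
    ]
    print(two_stage_cluster(posts))  # [[0, 1], [2, 3], [4]]
```

In this toy run the fifth post repeats the first post's wording but arrives about 500 hours later, so the decay factor drives its similarity to nearly zero and it starts a new topic. The second stage here simply reuses the same single-pass loop over first-stage clusters with its own threshold; the paper's actual second-stage criterion and time weighting may differ.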


Bibliographic details

Source
Journal of supercomputing | 2019, no. 5 | pp. 2890-2924 | 35 pages
Authors
Li Chunlin; Bai Jingpan;
Affiliations

Minist Land & Resources, Key Lab Urban Land Resources Monitoring & Simulat, Shenzhen, Peoples R China|Wuhan Univ Technol, Dept Comp Sci, Wuhan 430063, Hubei, Peoples R China|Guangzhou Inst Geog, Key Lab Guangdong Utilizat Remote Sensing & Geog, Guangzhou, Guangdong, Peoples R China|Nanjing Univ Informat Sci & Technol, Collaborat Innovat Ctr Atmospher Environm & Equip, Jiangsu Key Lab Meteorol Observat & Informat Proc, Nanjing, Jiangsu, Peoples R China;

Wuhan Univ Technol, Dept Comp Sci, Wuhan 430063, Hubei, Peoples R China;

Indexed in: Science Citation Index (SCI); Engineering Index (EI)
Original format: PDF
Text language: English
Keywords
Content extraction; Topic clustering; Network community; Large-scale social network; Cloud platform;


Similar literature


1. Automatic content extraction and time-aware topic clustering for large-scale social network on cloud platform [J]. Li Chunlin, Bai Jingpan. Journal of supercomputing, 2019, no. 5.

2. Automatic extraction of social networks by topics of interest [J]. Fernando de la Rosa T., Rafael M. Gasca. International Journal of Computer Applications in Technology, 2008, no. 4.

3. A time-aware hyperlink-induced topic search-based reputation evaluation method for optimal manufacturing service recommendation in distributed peer-to-peer networks [J]. Shuai Zhang, Song Xu, Wenyu Zhang, Journal of algorithms & computational technology, 2017, no. 1.

4. Enhancing Interdisciplinary Cooperation by Social Platforms Assessing the Usefulness of Bibliometric Social Network Visualization in Large-Scale Research Clusters [C]. Andre Calero Valdez, Anne Kathrin Schaar, Martina Ziefle, International conference on human-computer interaction, 2014.

5. Automatic Identification of Topic Tags from Texts Based on Expansion-Extraction Approach [D]. Yang, Seungwon. 2013.

6. Inference of Large-scale Time-delayed Gene Regulatory Network with Parallel MapReduce Cloud Platform [O]. Bin Yang, Wenzheng Bao, De-Shuang Huang.

7. A time-aware hyperlink-induced topic search-based reputation evaluation method for optimal manufacturing service recommendation in distributed peer-to-peer networks [O]. Shuai Zhang, Song Xu, Wenyu Zhang, 2017.

