

A Multiple-Phase Stratification-Based Hierarchical Clustering Over a Deep Web Data Source




Abstract

Compared with the surface web, the deep web stores more high-quality data, so mining deep web data is more valuable. In the deep web, however, entire data sets are stored in back-end databases that cannot be accessed directly; data can only be retrieved over the Internet through query forms. The only way to mine a deep web data source is to sample the data set, which raises several unique challenges. In this paper, drawing on active learning, we replace the traditional one-time sample allocation with multiple phases of sample allocation, which improves the representativeness of the samples obtained. In the stratified sampling step of each phase, we draw a portion of representative samples for initial clustering. From the resulting clusters, we identify boundary points. A boundary point carries more uncertainty than other points; in other words, it contains more information. Sampling at boundary points therefore helps obtain more representative samples. In our experiments, at the same sampling cost, our method performs better than both random sampling and the two-phase sampling of Liu and Agrawal (Int Conf Data Mining 70-81, 2012).
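The abstract describes the procedure only at a high level. The Python sketch below illustrates one possible reading of it and is not the authors' algorithm. Assumptions: strata are the values of a single query-form field; the hidden source is simulated in memory; clustering is naive average-linkage agglomerative clustering; a "boundary point" is one whose distance to its own cluster centroid is at least `threshold` times its distance to the nearest other centroid; and each later phase splits its budget across strata in proportion to 1 plus the number of boundary points each stratum produced. All names and parameters (`make_hidden_source`, `multi_phase_sample`, `threshold`, `per_phase`, and so on) are hypothetical.

```python
import math
import random


def make_hidden_source(seed=0, per_stratum=200):
    """Hypothetical stand-in for a deep web source: three strata (e.g. values of
    one query-form field), each holding 2-D records reachable only by querying."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
    return {s: [(rng.gauss(cx, 1.5), rng.gauss(cy, 1.5)) for _ in range(per_stratum)]
            for s, (cx, cy) in enumerate(centers)}


def centroid(points, idx):
    """Mean of the points whose indices are in idx."""
    dims = len(points[idx[0]])
    return tuple(sum(points[i][d] for i in idx) / len(idx) for d in range(dims))


def agglomerative(points, n_clusters):
    """Naive average-linkage hierarchical clustering; returns index lists."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = sum(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters.pop(b))  # b > a, so index a stays valid
    return clusters


def boundary_points(points, clusters, threshold=0.7):
    """Hypothetical boundary test: indices whose own-centroid distance is at least
    `threshold` times their nearest-other-centroid distance (high uncertainty)."""
    if len(clusters) < 2:
        return []
    cents = [centroid(points, c) for c in clusters]
    found = []
    for ci, members in enumerate(clusters):
        for i in members:
            own = math.dist(points[i], cents[ci])
            other = min(math.dist(points[i], cents[cj])
                        for cj in range(len(cents)) if cj != ci)
            if other > 0 and own / other >= threshold:
                found.append(i)
    return found


def allocate(weights, budget):
    """Split an integer budget across strata in proportion to weights
    (largest-remainder rounding, so the parts always sum to budget)."""
    total = sum(weights.values())
    raw = {s: budget * w / total for s, w in weights.items()}
    alloc = {s: math.floor(r) for s, r in raw.items()}
    leftover = budget - sum(alloc.values())
    for s in sorted(raw, key=lambda s: raw[s] - alloc[s], reverse=True)[:leftover]:
        alloc[s] += 1
    return alloc


def multi_phase_sample(source, phases=3, per_phase=20, n_clusters=3,
                       threshold=0.7, seed=0):
    """Multi-phase stratified sampling that steers later phases toward strata
    producing boundary points of the current hierarchical clustering."""
    rng = random.Random(seed)
    sample, strata = [], []                # drawn records and their strata
    weights = {s: 1.0 for s in source}     # phase 1: equal allocation
    clusters = []
    for _ in range(phases):
        taken = set(sample)
        for s, k in allocate(weights, per_phase).items():
            pool = [p for p in source[s] if p not in taken]
            for p in rng.sample(pool, min(k, len(pool))):
                sample.append(p)
                strata.append(s)
        clusters = agglomerative(sample, n_clusters)
        bounds = boundary_points(sample, clusters, threshold)
        # Next phase (assumed rule): weight each stratum by 1 + its number of
        # boundary points; the +1 keeps every stratum eligible for sampling.
        weights = {s: 1.0 + sum(1 for i in bounds if strata[i] == s)
                   for s in source}
    return sample, clusters, weights


if __name__ == "__main__":
    source = make_hidden_source()
    sample, clusters, weights = multi_phase_sample(source)
    print(len(sample), [len(c) for c in clusters], weights)
```

Running the module prints the sample size, the cluster sizes, and the stratum weights a further phase would use; the exact numbers depend on the random seeds. The naive clustering costs roughly cubic time in the sample size, so it only suits small samples like these.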


Bibliographic details

Source: ISKE 2013 | 2014 | 10 pages
Authors: Yuanliu Liu; Pengpeng Zhao; Xu Zhou; Zhiming Cui
Original format: PDF
Chinese Library Classification: TP18-532
Keywords: Deep web; Active learning; Stratified sampling; Hierarchical clustering


Similar documents


1. Stratified K-means Clustering Over A Deep Web Data Source [J]. Tantan Liu, Gagan Agrawal. SIGKDD Explorations, 2012, issue CDaROM.
2. Optimal sentence clustering for web database using hierarchical fuzzy relational clustering integrated with artificial bee colony algorithm [J]. Santhi Venkatraman, R. Prasanthini. International Journal of Business Information Systems, 2018, no. 3.
3. A unified framework for the integration of multiple hierarchical clusterings or networks from multi-source data [J]. Audrey Hulot, Denis Laloë, Florence Jaffrézic. BMC Bioinformatics, 2021, no. 1.
4. A Multiple-Phase Stratification-Based Hierarchical Clustering Over a Deep Web Data Source [C]. Yuanliu Liu, Pengpeng Zhao, Xu Zhou. ISKE 2013, 2014.
5. SEEDEEP: A system for exploring and querying deep web data sources [D]. Fan Wang. 2010.
6. A unified framework for the integration of multiple hierarchical clusterings or networks from multi-source data [O]. Audrey Hulot, Denis Laloë, Florence Jaffrézic. 2021.
7. Stratification Based Hierarchical Clustering Over a Deep Web Data Source [O]. Tantan Liu, Gagan Agrawal. 2013.
