An improved focused crawler based on Semantic Similarity Vector Space Model

Du Yajun; Liu Wenjun; Lv Xianjing; Peng Guoli

Abstract

A focused crawler is topic-specific: it aims to selectively collect web pages from the Internet that are relevant to a given topic. In many studies, the Vector Space Model (VSM) and the Semantic Similarity Retrieval Model (SSRM) use cosine similarity and semantic similarity, respectively, to compute the similarity between web pages and the given topic. However, if a web page and the given topic share no common terms, the VSM will not obtain the proper topical similarity of the web page. In addition, if all of the terms between them are synonyms, the SSRM will also not obtain the proper topical similarity. To address these problems, this paper proposes an improved retrieval model, the Semantic Similarity Vector Space Model (SSVSM). It integrates the TF*IDF values of the terms and the semantic similarities among the terms to construct topic and document semantic vectors that are mapped to the same double-term set, and it computes the cosine similarities between these semantic vectors as the topic-relevant similarities of documents, including the full texts and anchor texts of unvisited hyperlinks. Next, the proposed model predicts the priorities of the unvisited hyperlinks by integrating the full-text and anchor-text topic-relevant similarities. The experimental results demonstrate that this approach improves the performance of focused crawlers and outperforms other focused crawlers based on Breadth-First, VSM and SSRM. In conclusion, the method is effective for focused crawlers. (C) 2015 Elsevier B.V. All rights reserved.

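The abstract does not give the SSVSM formulas, so the following Python sketch is only one plausible reading of the idea, not the authors' implementation. It assumes that the "double-term set" is the union of the topic and document terms, that each semantic-vector component is the largest TF*IDF weight scaled by a term-to-term semantic similarity, and that the link priority is a weighted average controlled by a hypothetical parameter `alpha`. The lookup table `TOY_SEM_SIM` and all weights are invented toy values.

```python
import math

# Hypothetical term-to-term semantic similarities in [0, 1] (toy values).
TOY_SEM_SIM = {
    frozenset(("car", "automobile")): 0.9,
    frozenset(("engine", "motor")): 0.8,
}


def sem_sim(a, b):
    """Toy semantic similarity: 1.0 for identical terms, table lookup otherwise."""
    if a == b:
        return 1.0
    return TOY_SEM_SIM.get(frozenset((a, b)), 0.0)


def cosine(u, v):
    """Plain VSM cosine similarity between sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def semantic_vector(weights, term_set):
    """Map TF*IDF weights onto term_set (assumed construction).

    Each component is the best match between that term and the vector's
    own terms: max over own terms of weight * semantic similarity.
    """
    return {
        t: max((w * sem_sim(t, s) for s, w in weights.items()), default=0.0)
        for t in term_set
    }


def semantic_cosine(topic, doc):
    """Cosine similarity of semantic vectors built over the shared term set."""
    terms = set(topic) | set(doc)
    return cosine(semantic_vector(topic, terms), semantic_vector(doc, terms))


def link_priority(topic, full_text, anchor_text, alpha=0.5):
    """Combine full-text and anchor-text similarities (alpha is hypothetical)."""
    return (alpha * semantic_cosine(topic, full_text)
            + (1.0 - alpha) * semantic_cosine(topic, anchor_text))


if __name__ == "__main__":
    topic = {"car": 1.0, "engine": 0.5}          # toy TF*IDF weights
    page = {"automobile": 0.8, "motor": 0.6}     # synonyms only, no shared terms
    anchor = {"car": 1.0}
    print("VSM cosine:     ", cosine(topic, page))
    print("semantic cosine:", semantic_cosine(topic, page))
    print("link priority:  ", link_priority(topic, page, anchor))
```

With these toy inputs the plain cosine is zero because the topic and the page share no terms, while the semantic variant is positive because the synonyms contribute through `sem_sim`. A real crawler would replace `TOY_SEM_SIM` with a lexical resource and compute TF*IDF from a corpus.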

Bibliographic record

Source
Applied Soft Computing | 2015 (issue not given) | 16 pages
Authors
Du Yajun; Liu Wenjun; Lv Xianjing; Peng Guoli
Original format: PDF
Language: English
Chinese Library Classification: Computer software
Keywords
Focused crawler; Semantic similarity; VSM; SSRM

Similar literature

1. An improved focused crawler based on Semantic Similarity Vector Space Model [J]. Du Yajun, Liu Wenjun, Lv Xianjing. Applied Soft Computing. 2015 (issue not given)
2. An extension to association rules using a similarity-based approach in semantic vector spaces [J]. Keith Norambuena Brian, Meneses Villegas Claudio. Intelligent Data Analysis. 2019, No. 3
3. Semantic Web Service Similarity Ranking Proposal Based on Semantic Space Vector Model [C]. Zeng ZhiHao, Hu JiPing, Dong Ting. Intelligent System Design and Engineering Application (ISDEA), 2012 Second International Conference on. 2012
4. Predictive Modeling of Complex Graphs as Context and Semantics Preserving Vector Spaces [D]. Moon, Changsung. 2018
5. IDSSIM: an lncRNA functional similarity calculation model based on an improved disease semantic similarity method [O]. Wenwen Fan, Junliang Shang, Feng Li. 2020
6. An Improved Focused Web Crawler based on Hybrid Similarity [O]. Shang Songtao, Wu Huaiguang, Ma Jiangtao. 2019
