Discovering URLs through User Feedback

Abstract

Search engines rely upon crawling to build their Web page collections. A Web crawler typically discovers new URLs by following the link structure induced by links on Web pages. As the number of documents on the Web is large, discovering newly created URLs may take arbitrarily long, and depending on how a given page is connected to others, such a crawler may miss the page altogether. In this paper, we evaluate the benefits of integrating a passive URL discovery mechanism into a Web crawler. This mechanism is passive in the sense that it does not require the crawler to actively fetch documents from the Web to discover URLs. We focus here on a mechanism that uses toolbar data as a representative source for new URL discovery. We use the toolbar logs of Yahoo! to characterize the URLs that are accessed by users via their browsers, but not discovered by the Yahoo! Web crawler. We show that a high fraction of URLs that appear in toolbar logs are not discovered by the crawler. We also reveal that a certain fraction of URLs are discovered by the crawler later than the time they are first accessed by users. One important conclusion of our work is that Web search engines can benefit greatly from user feedback in the form of toolbar logs for passive URL discovery.
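
The two measurements named in the abstract (URLs seen in toolbar logs but never discovered by the crawler, and URLs discovered by the crawler only after users first accessed them) can be illustrated with a minimal sketch. This is not the authors' method or data: the function name `discovery_stats`, the dictionary layout (URL mapped to a first-seen timestamp), and the example URLs and dates are all hypothetical, chosen only to show the comparison.

```python
from datetime import datetime


def discovery_stats(toolbar_first_seen, crawler_first_seen):
    """Compare first user access times with crawler discovery times.

    Both arguments map URL -> datetime. The layout is an assumption made
    for this sketch, not a format described in the paper.
    """
    total = len(toolbar_first_seen)
    if total == 0:
        return {"missed": 0.0, "late": 0.0}
    missed = 0  # URLs in the toolbar log that the crawler never discovered
    late = 0    # URLs the crawler discovered after the first user access
    for url, accessed in toolbar_first_seen.items():
        discovered = crawler_first_seen.get(url)
        if discovered is None:
            missed += 1
        elif discovered > accessed:
            late += 1
    return {"missed": missed / total, "late": late / total}


# Hypothetical toy data, for illustration only.
toolbar = {
    "http://example.com/a": datetime(2011, 1, 1),
    "http://example.com/b": datetime(2011, 1, 2),
    "http://example.com/c": datetime(2011, 1, 3),
    "http://example.com/d": datetime(2011, 1, 4),
}
crawler = {
    "http://example.com/a": datetime(2010, 12, 30),  # crawled before first access
    "http://example.com/b": datetime(2011, 1, 5),    # crawled after first access
}

stats = discovery_stats(toolbar, crawler)
print(stats)
```

In this toy input, two of the four toolbar URLs are absent from the crawler's set and one is discovered late; real logs would also need URL normalization and deduplication, which the sketch leaves out.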

Bibliographic details

Source
ACM International Conference on Information and Knowledge Management, 2011, 10 pages
Authors
Xiao Bai; B. Barla Cambazoglu; Flavio P. Junqueira
Original format: PDF
Chinese Library Classification: Information processing
Keywords
Search engines; Web crawling; URL discovery; toolbar

Similar literature

1. Anti-Phishing Game Framework to Educate Arabic Users: Avoidance of URLs Phishing Attacks [J]. Ahmed Baiomy, Mahmoud Mostafa, Alyaa Youssif. Indian Journal of Science and Technology, 2019, No. 44
2. LINK RECOMMENDER: Collaborative-Filtering for Recommending URLs to Twitter Users [J]. Nazpar Yazdanfar, Alex Thomo. Procedia Computer Science, 2013, No. 1
3. Hybrid Data Aggregation Technique to Categorize the Web Users to Discover Knowledge About the Web Users [J]. Manohar E., Punithavathani D. Shalini. Wireless Personal Communications: An International Journal, 2017, No. 4
4. Discovering URLs through User Feedback [C]. Xiao Bai, B. Barla Cambazoglu, Flavio P. Junqueira. ACM International Conference on Information and Knowledge Management, 2011
5. Helping Users Learn About Social Processes While Learning from Users: Developing a Positive Feedback in Social Computing [D]. Pillutla, Venkata Sai Sriram. 2017
6. Discovering Black Soap: A Survey on the Attitudes and Practices of Black Soap Users [O]. Ann Lin, Adam Nabatian, Caroline P. Halverstam. 2017
7. Behavioral and Coinductive Rewriting (invited talk) [O]. Goguen Joseph, Lin Kai, Roşu Grigore. 2000
8. User Requirements Language (URL) Users Manual [R]. Eiden, H. J., Moore, C. R. 1975
