CiteData: A New Multi-Faceted Dataset for Evaluating Personalized Search Performance

Abstract

Personalized search systems have evolved to use heterogeneous features, including document hyperlinks, category labels from various taxonomies, and social tags, in addition to the free text of the documents. Consequently, classifiers, PageRank algorithms, and collaborative filtering methods are often used as intermediate steps in such personalized retrieval systems. Thorough comparative evaluation of these complex systems has been difficult due to the lack of suitable publicly available datasets that provide such diverse feature sets. To remedy the situation, we have created CiteData, a new dataset for benchmark evaluation of personalized search performance, which will be made publicly accessible. CiteData is a collection of academic articles extracted from the CiteULike and CiteSeer repositories, with rich features such as authors, author affiliations, topic labels, social tags, and citation information. We further supplement it with personalized queries and relevance judgments obtained from volunteer users. This paper first discusses the design criteria and characteristics of CiteData in comparison with current benchmark datasets, and then presents a set of task-oriented empirical evaluations of popular algorithms in statistical classification, collaborative filtering, and link analysis as intermediate steps for personalized search. Our results show that personalized approaches significantly outperform unpersonalized approaches. We also observe that a meta personalized search engine that leverages multiple feature sources performs better than algorithms that use only one of its constituent feature sources.
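The abstract does not say how the meta personalized search engine combines its feature sources. Purely as an illustrative sketch, assume each source (for example, free-text retrieval, a topic classifier, collaborative filtering over social tags, or PageRank over citation links) assigns a score to each document. One standard way to merge such lists is weighted CombSUM over min-max-normalized scores, shown below. The function names, weights, and toy scores are hypothetical and are not taken from the paper.

```python
from typing import Dict, List

# Hypothetical representation: document id -> score from ONE feature source.
Scores = Dict[str, float]


def min_max_normalize(scores: Scores) -> Scores:
    """Rescale one source's scores to [0, 1] so different sources are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All documents tied within this source: give each the same score.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def comb_sum(sources: List[Scores], weights: List[float]) -> List[str]:
    """Weighted CombSUM (an assumed fusion scheme, not the paper's confirmed method):
    rank documents by the weighted sum of their normalized scores across sources.
    A document missing from a source contributes 0 for that source."""
    if len(sources) != len(weights):
        raise ValueError("need one weight per source")
    fused: Scores = {}
    for scores, w in zip(sources, weights):
        for doc, s in min_max_normalize(scores).items():
            fused[doc] = fused.get(doc, 0.0) + w * s
    # Highest fused score first; ties broken by document id for determinism.
    return sorted(fused, key=lambda doc: (-fused[doc], doc))


if __name__ == "__main__":
    text = {"d1": 2.0, "d2": 1.0, "d3": 0.5}  # toy free-text match scores
    tags = {"d2": 5.0, "d3": 1.0}             # toy tag / collaborative-filtering scores
    links = {"d1": 0.1, "d3": 0.4}            # toy link-analysis scores
    print(comb_sum([text, tags, links], [1.0, 1.0, 1.0]))  # → ['d2', 'd1', 'd3']
```

Normalizing each source before summing keeps a source with a large raw score range (here, the tag scores) from dominating the fused ranking; the per-source weights are where a real system would tune how much each feature source counts.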

Bibliographic details

Source
ACM Conference on Information and Knowledge Management (CIKM), 2010, 9 pages
Authors
Abhay Harpale; Yiming Yang; Siddharth Gopal; Daqing He; Zhen Yue
Original format: PDF
Chinese Library Classification: Information processing (information handling)
Keywords
personalization; search; evaluation; dataset; social data

Similar literature

1. Datasets on statistical analysis and performance evaluation of backtracking search optimisation algorithm compared with its counterpart algorithms [J]. Bryar A. Hassan, Tarik A. Rashid. Data in Brief, 2020, No. 2.
2. Evaluation of an Application Search Interface Providing Multi-faceted Navigation [J]. Tomoko Kajiyama. IEICE Technical Report (Multimedia Information Hiding and Enrichment), 2013, No. 291.
3. Evaluation of an Application Search Interface Providing Multi-faceted Navigation [J]. Tomoko Kajiyama. IEICE Technical Report (Engineering Acoustics), 2013, No. 290.
4. CiteData: A New Multi-Faceted Dataset for Evaluating Personalized Search Performance [C]. Abhay Harpale, Yiming Yang, Siddharth Gopal, et al. CIKM '10: ACM Conference on Information and Knowledge Management, 2011.
5. A framework for personalizing web search with multi-faceted user profiles [D]. Wai-Ting Leung. 2010.
6. Datasets on statistical analysis and performance evaluation of backtracking search optimisation algorithm compared with its counterpart algorithms [O]. Bryar A. Hassan, Tarik A. Rashid. 2020.
7. Performance Evaluation of a Proposed Machine Learning Model for Chronic Disease Datasets Using an Integrated Attribute Evaluator and an Improved Decision Tree Classifier [O]. Sushruta Mishra, Pradeep Kumar Mallick, Hrudaya Kumar Tripathy. 2020.