Journal: Information Sciences: An International Journal

On the selection of the correct number of terms for profile construction: Theoretical and empirical analysis


Abstract

In this paper, we examine the problem of building a user profile from a set of documents. The profile consists of the subset of terms in the documents that best represents the user's preferences or interests. Inspired by discrete concentration theory, we conduct an axiomatic study of seven properties that a selection function should satisfy: the minimum uncertainty principle, the maximum uncertainty principle, invariance to adding zeros, invariance to scale transformations, the principle of nominal increase, the transfer principle, and the richest-get-richer inequality. We also present a novel selection function based on similarity metrics, specifically the cosine measure commonly used in information retrieval, and show that it satisfies six of these properties as well as a weaker variant of the transfer principle, which makes it a good selection approach. The theoretical study is complemented by an empirical study that compares the performance of different selection criteria (weight-based and unweighted) on real data from a parliamentary setting. In this study, we analyze the performance of the different functions with respect to the two main factors affecting the selection process: profile size (number of terms) and weight distribution. The resulting profiles are then used in a document filtering task, which shows that our similarity-based approach performs well not only in recommendation accuracy but also in efficiency (it produces smaller profiles and, consequently, faster recommendations).
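The abstract does not give the exact definition of the cosine-based selection function, so the following is only a hypothetical sketch under our own assumption: terms are ranked by decreasing weight, and the profile is the shortest top-k prefix whose truncated weight vector reaches a cosine similarity of at least `threshold` with the full weight vector. The names `cosine` and `select_terms`, the `threshold` parameter, and the example terms are illustrative assumptions, not the authors' method.

```python
import math
from typing import Dict, List


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_terms(weights: Dict[str, float], threshold: float = 0.9) -> List[str]:
    """Hypothetical cosine-based term selection (an assumption, not the paper's definition).

    Ranks terms by decreasing weight (ties broken alphabetically), ignores zero
    weights, and returns the smallest top-k prefix whose truncated weight vector
    has cosine >= threshold with the full weight vector.
    """
    ranked = sorted(
        ((term, w) for term, w in weights.items() if w > 0),
        key=lambda tw: (-tw[1], tw[0]),
    )
    full = [w for _, w in ranked]
    n = len(full)
    for k in range(1, n + 1):
        # Keep the k heaviest weights and zero out the rest.
        truncated = full[:k] + [0.0] * (n - k)
        if cosine(truncated, full) >= threshold:
            return [term for term, _ in ranked[:k]]
    # Only reached when there are no positive weights.
    return [term for term, _ in ranked]


if __name__ == "__main__":
    profile_weights = {"budget": 5.0, "tax": 4.0, "health": 1.0, "road": 0.5}
    print(select_terms(profile_weights, threshold=0.9))
```

Under this assumption the cosine reduces to the square root of the fraction of the total squared weight captured by the top k terms, so multiplying every weight by a positive constant, or adding terms with zero weight, leaves the selection unchanged; these correspond to two of the listed properties. The sketch makes no claim about the remaining properties.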