Journal: Information Systems

Modeling user interests by conceptual clustering


Abstract

As more information becomes available on the Web, there has been growing interest in effective personalization techniques. Personal agents that provide assistance based on the content of Web documents and on user interests have emerged as a viable approach to this problem. Because these agents rely on knowledge about users contained in user profiles, i.e., models of user preferences and interests gathered by observing user behavior, the capacity to acquire and model user interest categories has become a critical component of personal agent design. User profiles have to summarize categories corresponding to diverse user information interests at different levels of abstraction so that agents can decide on the relevance of new pieces of information. For this goal, document clustering offers the advantage that no a priori knowledge of categories is needed, so the categorization is completely unsupervised. In this paper we present a document clustering algorithm, named WebDCC (Web Document Conceptual Clustering), that carries out incremental, unsupervised concept learning over Web documents in order to acquire user profiles. Unlike most user profiling approaches, this algorithm offers comprehensible clustering solutions that can be easily interpreted and explored by both users and other agents. By extracting semantics from Web pages, the algorithm also produces intermediate results that can ultimately be integrated into a machine-understandable format such as an ontology. Empirical results from using this algorithm in an intelligent Web search agent show that it can reach high levels of accuracy in suggesting Web pages.
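
To make the idea of incremental, unsupervised document clustering concrete, the following is a minimal Python sketch. It is not WebDCC and does not reproduce the paper's method: it is a generic single-pass scheme that assigns each new document to the most similar existing cluster or opens a new one. The class name `IncrementalClusterer`, the cosine similarity measure, the crude tokenizer, and the `threshold` value are all assumptions made for illustration only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphabetic tokens of length >= 3 (a deliberately crude choice)."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if len(w) >= 3]


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class IncrementalClusterer:
    """Hypothetical single-pass clusterer; an illustration, not WebDCC."""

    def __init__(self, threshold=0.3):
        self.threshold = threshold  # assumed similarity cut-off
        self.clusters = []          # each: {"terms": Counter, "size": int}

    def add(self, text):
        """Place one document and return the index of its cluster."""
        vec = Counter(tokenize(text))
        best, best_sim = None, 0.0
        for i, cluster in enumerate(self.clusters):
            sim = cosine(vec, cluster["terms"])
            if sim > best_sim:
                best, best_sim = i, sim
        if best is not None and best_sim >= self.threshold:
            # Similar enough: fold the document into the existing cluster.
            self.clusters[best]["terms"].update(vec)
            self.clusters[best]["size"] += 1
            return best
        # Otherwise the document starts a new interest category.
        self.clusters.append({"terms": vec, "size": 1})
        return len(self.clusters) - 1

    def describe(self, index, k=3):
        """Most frequent terms of a cluster, as a human-readable summary."""
        return [t for t, _ in self.clusters[index]["terms"].most_common(k)]
```

Documents are fed one at a time through `add`, so no category labels or fixed cluster count are needed in advance, and `describe` returns the dominant terms of a cluster as a rough, readable stand-in for the comprehensible category descriptions the abstract mentions. A hierarchical, concept-based method such as the one the paper describes would go well beyond this flat scheme.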
