Multi-instance learning based web mining

Abstract

In multi-instance learning, the training set comprises labeled bags composed of unlabeled instances, and the task is to predict the labels of unseen bags. In this paper, a web mining problem, namely web index recommendation, is investigated from a multi-instance view. Each web index page is regarded as a bag, and each of its linked pages is regarded as an instance. A user favoring an index page means that the user is interested in at least one page linked from that index. Based on the user's browsing history, recommendations can be provided for unseen index pages. An algorithm named Fretcit-kNN is proposed to solve the problem; it employs the minimal Hausdorff distance between frequent term sets and uses both the references and the citers of an unseen bag to determine its label. Experiments show that, on average, Fretcit-kNN achieves a recommendation accuracy of 81.0% with 71.7% recall and 70.9% precision. This is significantly better than the best algorithm that does not consider the specific characteristics of multi-instance learning, which achieves 76.3% accuracy with 63.4% recall and 66.1% precision.
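
The bag-level distance and the reference/citer vote described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how frequent term sets are extracted or how two term sets are compared, so this sketch assumes each instance is already a set of terms and uses the Jaccard distance as a stand-in. The function names, the parameters `n_refs` and `n_citers`, the tie-breaking rule, and the toy data are all hypothetical. Bags are assumed to be non-empty.

```python
from collections import Counter

def jaccard_distance(a, b):
    """Distance between two term sets (assumed stand-in, not the paper's measure)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def min_hausdorff(bag_a, bag_b):
    """Minimal Hausdorff distance: the smallest distance over all instance pairs."""
    return min(jaccard_distance(x, y) for x in bag_a for y in bag_b)

def predict(train_bags, train_labels, new_bag, n_refs=2, n_citers=2):
    """Label a new bag by a vote among its references and citers."""
    dist_to_new = [min_hausdorff(new_bag, bag) for bag in train_bags]

    # References: the n_refs training bags nearest to the new bag.
    order = sorted(range(len(train_bags)), key=lambda i: dist_to_new[i])
    voters = list(order[:n_refs])

    # Citers: training bags that would rank the new bag among
    # their own n_citers nearest neighbours.
    for i, bag in enumerate(train_bags):
        closer = sum(
            1
            for j, other in enumerate(train_bags)
            if j != i and min_hausdorff(bag, other) < dist_to_new[i]
        )
        if closer < n_citers:
            voters.append(i)

    votes = Counter(train_labels[i] for i in voters)
    # Assumed tie-breaking for this sketch: a tie is labeled negative.
    return votes[True] > votes[False]

# Toy data (made up): each bag is an index page, each instance is the
# term set of one linked page; True means the user liked the index page.
train_bags = [
    [{"python", "code"}, {"sports", "news"}],
    [{"python", "tutorial"}, {"cooking"}],
    [{"celebrity", "gossip"}, {"fashion"}],
    [{"fashion", "shopping"}, {"gossip"}],
]
train_labels = [True, True, False, False]

liked_guess = predict(train_bags, train_labels,
                      [{"python", "code", "examples"}, {"weather"}])
disliked_guess = predict(train_bags, train_labels, [{"gossip", "fashion"}])
```

Taking the minimum over instance pairs matches the multi-instance assumption stated above: one linked page that resembles a page from a liked index is enough to bring two bags close together, whatever the other pages contain.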