Conference: Asia Information Retrieval Societies Conference

PLIDMiner: A Quality Based Approach for Researcher's Homepage Discovery

Abstract

High-quality researcher homepages are important resources in academic search because they provide comprehensive and up-to-date information about researchers. However, low-quality homepages are widespread. A case study shows that 57.8% of the homepages retrieved among the top 10 Google results are low quality, and 95% of top researchers have out-of-date homepages. In addition, some academic portals generate dynamic homepages that introduce researchers. These homepages are not maintained by the researchers themselves and may contain incorrect information. Existing work cannot ensure the quality of discovered homepages, which reduces the efficiency of academic search. Because it is difficult to define a high-quality homepage quantitatively, we instead analyze labeled high-quality homepages and propose the "informative researcher's homepage" as an approximation of a high-quality homepage: one that contains at least identifiable information (a researcher's basic information) and a publication list (the researcher's corresponding publications). Based on the observation that informative researchers' homepages are organized in two ways, integrated and scattered, we propose an effective discovery model, PLIDMiner, which achieves F1 scores over 0.9 on labeled data. Our model can also be applied to verify homepage quality. We crawl thousands of homepage resources from popular academic portals and assess their overall quality. Nearly 25% of the homepage resources in these portals turn out not to be informative, which strengthens our motivation.
