ACM/IEEE-CS Joint Conference on Digital Libraries

Finding authoritative people from the web

Abstract

Today's web is so huge and diverse that it arguably reflects the real world. For this reason, searching the web is a promising way to find things in the real world. This paper presents NEXAS, an extension to web search engines that attempts to find real-world entities relevant to a topic. Its basic idea is to extract proper names from the web pages retrieved for the topic. A main advantage of this approach is that users can query any topic and learn about relevant real-world entities without a dedicated database for that topic. In particular, we focus on an application for finding authoritative people on the web. This application is practically important because, once personal names are obtained, they can lead users from the web to managed information stored in digital libraries. To explore effective ways of finding people, we first examine the distribution of Japanese personal names by analyzing about 50 million Japanese web pages. We observe that personal names appear frequently on the web, but that their distribution is strongly influenced by automatically generated text. To remedy this bias and accurately find widely acknowledged people, we use the number of web servers containing a name instead of the number of web pages. We show the effectiveness of this measure through an experiment covering a wide range of topics. Finally, we demonstrate several examples and suggest possible applications.
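The server-count idea in the abstract can be illustrated with a minimal sketch. This is not the NEXAS implementation, which the abstract does not describe in detail. It assumes that proper-name extraction has already been done and that each retrieved page is available as a `(url, names)` pair. The function name `rank_names_by_server_count` and the input layout are hypothetical, and the URL's host name is used here as a stand-in for "web server".

```python
from collections import defaultdict
from urllib.parse import urlsplit


def rank_names_by_server_count(pages):
    """Rank names by the number of distinct hosts whose pages mention them.

    `pages` is an iterable of (url, names) pairs, where `names` lists the
    personal names already extracted from that page (assumed input format).
    Returns a list of (name, server_count, page_count) tuples, sorted by
    server_count in descending order, with ties broken alphabetically.
    """
    servers_by_name = defaultdict(set)  # name -> set of host names
    pages_by_name = defaultdict(int)    # name -> number of pages mentioning it

    for url, names in pages:
        host = urlsplit(url).hostname   # lower-cased host, or None if absent
        if host is None:
            continue                    # skip URLs without a host part
        for name in set(names):         # count each name once per page
            servers_by_name[name].add(host)
            pages_by_name[name] += 1

    ranked = sorted(servers_by_name,
                    key=lambda n: (-len(servers_by_name[n]), n))
    return [(n, len(servers_by_name[n]), pages_by_name[n]) for n in ranked]


if __name__ == "__main__":
    # Hypothetical data: "X" appears on many pages of a single site (as
    # automatically generated text might), while "Y" appears on fewer pages
    # spread over two sites.
    sample = [
        ("http://a.example/1", ["X"]),
        ("http://a.example/2", ["X"]),
        ("http://a.example/3", ["X"]),
        ("http://b.example/", ["Y"]),
        ("http://c.example/", ["Y"]),
    ]
    for name, n_servers, n_pages in rank_names_by_server_count(sample):
        print(name, n_servers, n_pages)
```

In this toy input, a page-count ranking would put "X" first, whereas the server-count ranking puts "Y" first, which is the kind of bias correction the abstract motivates.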