
Entity Information Management in Complex Networks

Abstract

Entity information management (EIM) deals with organizing, processing, and delivering information about entities. It emerged to satisfy more sophisticated information needs that go beyond document search. In recent years, entity retrieval has attracted much attention in the IR community: INEX has run the XML Entity Ranking track since 2007, and TREC has run the Entity track since 2009 to investigate the problem of related entity finding. Some EIM problems go beyond retrieval and ranking, such as 1) entity profiling, which characterizes a specific entity, and 2) entity distillation, which discovers trends concerning an entity. These problems have received less attention even though they have many important applications. Moreover, entities in the real world or on the Web are usually not isolated; they are connected or related to each other in one way or another. For example, coauthorship connects authors with similar research interests. The emergence of social media such as Facebook, Twitter, and YouTube has further interwoven related entities on a much larger scale. Millions of users of these sites can become friends, fans, or followers of others, or taggers or commenters of different types of entities (e.g., bookmarks, photos, and videos). These networks are complex in the sense that they are heterogeneous, with multiple types of entities and interactions; large-scale; multilingual; and dynamic. These features go beyond traditional social network analysis and require further research. In this proposed research, I investigate entity information management in the environment of complex networks. The main research question is: how can EIM tasks be facilitated by modeling the content and structure of complex networks?
The research lies at the intersection of content-based information retrieval and complex network analysis, and deals with both unstructured text data and structured networks. The specific EIM tasks targeted are entity retrieval, entity profiling, and entity distillation. In addition to the main research question, the following questions are considered: How can we accomplish an EIM task involving diverse entity and interaction types? How can we model the evolution of entity profiles as well as of the underlying complex networks? How can existing cross-language IR work be leveraged to build entity profiles with multilingual evidence? I propose to use probabilistic models, and discriminative models in particular, to address these research questions. In my research, I have developed discriminative models for expert search that integrate arbitrary document features [3] and that learn flexible combination strategies to rank experts in heterogeneous information sources [1]. Discriminative graphical models were proposed to jointly discover homepages by inference on the homepage dependence network [2]. The dependence among table elements was exploited to perform the entity retrieval task collectively [4]. This work has shown the power of discriminative models for entity search and the benefits of utilizing the dependencies among related entities. My next step is to develop a unified probabilistic framework to investigate the research questions raised in this proposal.
