Semantic Search

Abstract

Semantic search lies at the crossroads of information retrieval and natural language processing and is the current frontier of search technology. The first part consists of building a semantically annotated index with the help of a knowledge base. For this, we first need to predict the language of each document and parse it according to that language. Second, we need to extract all entities and concepts mentioned in the document with the help of the knowledge base. The entire knowledge base infrastructure needs to be language independent, and we instantiate each language in the lexicon of the knowledge base. The second part is predicting the intention behind the query, which implies semantic query understanding. This process applies the same semantic processing to the query as to the documents. Then, based on all this information, we have to predict one or more possible intentions, each with a certain probability, which is particularly important for ambiguous queries. These scores will be one of the inputs for the final semantic ranking. For example, given the query "bond", possible results of query understanding are a financial instrument, the movie character, a chemical reaction, or a term of endearment. Semantic ranking refers to ranking search results using semantic information. In a standard search engine, a rank is computed using signals or features coming from the search query, from the documents in the collection being searched, and from the search context, such as the language and device being used. In our case, we add semantic relations between the entities and concepts found in the query and the same objects in the documents, which will come from different data sources. For this we use machine learning in several stages. The first stage selects the data sources that we should use to answer the query. In the second stage, each data source generates a set of answers using "learning to rank." The third and final stage ranks these data sources, selecting and ordering the intentions as well as the answers inside each intention (e.g., news) that will appear in the final composite answer. All these stages are language independent, but may use language-dependent features. We will cover the process above with a services-based approach in mind, including the data science needed to use the usage log stream of the semantic search engine as relevance feedback.
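
The flow described in the abstract (intent prediction with probabilities, followed by three ranking stages) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy KNOWLEDGE_BASE and DATA_SOURCES tables, their hand-picked weights, and the function names predict_intents, select_sources, rank_within_source, and compose do not come from the abstract. A simple product of intent probability and document relevance stands in for the learned models that the abstract refers to.

```python
from dataclasses import dataclass

# Hypothetical toy knowledge base: surface form -> (concept id, prior weight).
# The weights are invented illustration values.
KNOWLEDGE_BASE = {
    "bond": [
        ("finance/bond", 4.0),
        ("film/james_bond", 3.0),
        ("chemistry/bond", 2.0),
        ("language/term_of_endearment", 1.0),
    ],
}

# Hypothetical data sources: concept id -> (document id, relevance) pairs.
DATA_SOURCES = {
    "news": {
        "finance/bond": [("n1", 0.9), ("n2", 0.4)],
        "film/james_bond": [("n3", 0.7)],
    },
    "encyclopedia": {
        "chemistry/bond": [("e1", 0.8)],
        "finance/bond": [("e2", 0.6)],
    },
}


@dataclass
class Answer:
    source: str
    doc_id: str
    score: float


def predict_intents(query):
    """Query understanding: return (concept, probability) pairs, most likely first."""
    candidates = []
    for token in query.lower().split():
        candidates.extend(KNOWLEDGE_BASE.get(token, []))
    total = sum(weight for _, weight in candidates)
    if total == 0:
        return []
    scored = [(concept, weight / total) for concept, weight in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def select_sources(intents):
    """Stage 1: keep the sources that index at least one predicted concept."""
    concepts = [concept for concept, _ in intents]
    return [
        name
        for name, index in DATA_SOURCES.items()
        if any(concept in index for concept in concepts)
    ]


def rank_within_source(source, intents):
    """Stage 2: score one source's documents (stand-in for a learning-to-rank model)."""
    index = DATA_SOURCES[source]
    answers = [
        Answer(source, doc_id, probability * relevance)
        for concept, probability in intents
        for doc_id, relevance in index.get(concept, [])
    ]
    return sorted(answers, key=lambda answer: answer.score, reverse=True)


def compose(query):
    """Stage 3: order the sources by their best answer to form the composite result."""
    intents = predict_intents(query)
    per_source = {}
    for source in select_sources(intents):
        answers = rank_within_source(source, intents)
        if answers:
            per_source[source] = answers
    return sorted(per_source.items(), key=lambda item: item[1][0].score, reverse=True)


if __name__ == "__main__":
    for source, answers in compose("bond"):
        print(source, [(a.doc_id, round(a.score, 2)) for a in answers])
```

In this sketch the ambiguous query "bond" keeps all four candidate intentions alive, and their probabilities feed directly into the per-source scores, which mirrors the abstract's remark that the intent scores are one of the inputs to the final ranking. A real system would replace each stage with a trained model and add context and usage-log features.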
