Method and system for enhanced data searching by parsing data into syntactic units

Abstract

A syntactic query engine for transforming at least one sentence of a document or query into a canonical representation using entity tags comprises a memory medium containing a parser. The parser is configured to: receive a designation of a plurality of entity tags; decompose the at least one sentence to generate a parse structure for the sentence having a plurality of syntactic elements, each of which corresponds to a part of speech; determine from the parse structure a set of meaningful terms that correspond to one or more of the designated entity tags; and, for each of one or more of the meaningful terms, store a representation in an enhanced data representation data structure. The representation includes the term and the corresponding entity tag type, such that the at least one sentence is represented in the data structure by at least one entity tag. Each entity tag has a type and a value, and the type of each entity tag indicates a possible attribute of a sentence that does not represent a part of speech and does not represent a grammatical role. A query engine for searching a corpus of documents, comprising a parser and a postprocessor, is also disclosed. Each document has a plurality of sentences, and the corpus has an index of the plurality of sentences for the documents. The parser is structured to receive an indication of a plurality of consecutive sentences and to decompose the indicated sentences to generate a plurality of search terms for searching the document corpus. The postprocessor is structured to determine a plurality of result sentences in the corpus that correlate to the search terms, using latent semantic regression techniques to determine the similarity of the search terms to the sentences in the corpus, and to return indications of the determined result sentences.
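The entity-tag representation described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the abstract does not specify the parser, the tag inventory, or the storage format, so the toy part-of-speech lookup, the `GAZETTEER` table, and the names `SyntacticElement`, `EntityTag`, `EnhancedRepresentation`, `parse`, and `represent` are all hypothetical and stand in for a real natural language parser.

```python
from dataclasses import dataclass, field

# Hypothetical gazetteer mapping lowercase terms to entity tag types.
# The abstract does not say how terms are matched to tags; this is an assumption.
GAZETTEER = {
    "argentina": "country",
    "brazil": "country",
    "beef": "product",
}

# Hypothetical word lists standing in for a real parser's part-of-speech output.
FUNCTION_WORDS = {"the", "a", "an", "to", "of", "in", "and"}
VERBS = {"exports", "imports"}


@dataclass
class SyntacticElement:
    text: str
    part_of_speech: str


@dataclass
class EntityTag:
    type: str   # e.g. "country"; neither a part of speech nor a grammatical role
    value: str  # the term found in the sentence


@dataclass
class EnhancedRepresentation:
    sentence: str
    tags: list = field(default_factory=list)


def parse(sentence):
    """Build a toy parse structure: one element per token with a coarse POS label."""
    elements = []
    for token in sentence.rstrip(".?!").split():
        lower = token.lower()
        if lower in FUNCTION_WORDS:
            pos = "function"
        elif lower in VERBS:
            pos = "verb"
        else:
            pos = "noun"
        elements.append(SyntacticElement(token, pos))
    return elements


def represent(sentence, designated_tags):
    """Store (entity tag type, term) pairs for terms matching a designated tag."""
    rep = EnhancedRepresentation(sentence)
    for element in parse(sentence):
        if element.part_of_speech != "noun":
            continue  # in this sketch only nouns count as meaningful terms
        tag_type = GAZETTEER.get(element.text.lower())
        if tag_type in designated_tags:
            rep.tags.append(EntityTag(tag_type, element.text))
    return rep


rep = represent("Argentina exports beef to Brazil.", {"country"})
print([(t.type, t.value) for t in rep.tags])
# → [('country', 'Argentina'), ('country', 'Brazil')]
```

Because only the `country` tag is designated in this call, `beef` is parsed but not stored, which mirrors the abstract's point that the sentence is represented by the designated entity tags rather than by every term.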