Method and system for indexing and searching contents of extensible markup language (XML) documents

Abstract

A method and a computer system are described for indexing and searching the data content of nested field records, such as those in Extensible Markup Language (XML). The system includes an indexing and searching engine that constructs an improved full-text search index on the input XML data and then performs searches using the index. The system supports exact matches, and partial matches using a wildcard character. The method transforms the input XML data into a form that encodes the data's structural information by suffixing each word with its corresponding field qualifiers or an equivalent numerical pattern. The resulting encoded words are then stored in a full-text index structure. Various types of full-index search may be performed. An alternative embodiment combines string matching with numeric or integer pattern matching to identify a particular word in a particular field. The portion of the word without field qualifiers is matched against the words in the index, and the pattern of numerals representing the word's field qualifiers is matched against the numeral patterns that represent the field qualifiers of the words in the index. Evaluation of complex field criteria is thereby reduced to simpler and faster numeric matching.
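
The encoding described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes that a word's field qualifiers are the slash-separated path of enclosing element names, that each distinct path is given an integer ID in order of first appearance, and that wildcards follow `fnmatch` conventions. The class `FieldQualifiedIndex` and its methods are hypothetical names introduced only for this illustration.

```python
import fnmatch
import xml.etree.ElementTree as ET
from collections import defaultdict


class FieldQualifiedIndex:
    """Toy index: each posting pairs a word with the ID of its field path."""

    def __init__(self):
        self.path_ids = {}                # e.g. "book/title" -> 0 (assumed scheme)
        self.postings = defaultdict(set)  # word -> {(path_id, doc_id), ...}

    def _path_id(self, path):
        # Assign the next free integer to a path seen for the first time.
        return self.path_ids.setdefault(path, len(self.path_ids))

    def add_document(self, doc_id, xml_text):
        root = ET.fromstring(xml_text)
        self._walk(root, root.tag, doc_id)

    def _add_words(self, text, path_id, doc_id):
        for word in (text or "").lower().split():
            self.postings[word].add((path_id, doc_id))

    def _walk(self, elem, path, doc_id):
        pid = self._path_id(path)
        self._add_words(elem.text, pid, doc_id)
        for child in elem:
            self._walk(child, f"{path}/{child.tag}", doc_id)
            # Text after a child's closing tag belongs to the parent field.
            self._add_words(child.tail, pid, doc_id)

    def search(self, word_pattern, field_pattern="*"):
        # Step 1: resolve the field criterion to a set of integer path IDs.
        wanted = {pid for path, pid in self.path_ids.items()
                  if fnmatch.fnmatchcase(path, field_pattern)}
        # Step 2: string-match the bare word, then compare integers only.
        hits = set()
        for word, entries in self.postings.items():
            if fnmatch.fnmatchcase(word, word_pattern.lower()):
                hits.update(doc for pid, doc in entries if pid in wanted)
        return sorted(hits)


index = FieldQualifiedIndex()
index.add_document(1, "<book><title>XML Indexing</title><author>Smith</author></book>")
index.add_document(2, "<book><title>Smith Charts</title><author>Jones</author></book>")

print(index.search("smith"))                 # any field
print(index.search("smith", "book/author"))  # exact word in one field
print(index.search("index*", "*/title"))     # wildcard word and wildcard field
```

In this sketch the field criterion is resolved to integer IDs once per query, so the per-posting check is a set-membership test on an integer, which is one plausible reading of the "simpler and faster numeric matching" the abstract mentions.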

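Bibliographic details

  • Publication number: US2005004935A1

  • Publication date: 2005-01-06

  • Original format: PDF

  • Applicant/assignee: DAVID VICTOR THEDE

  • Application number: US20040902144

  • Inventor: DAVID VICTOR THEDE

  • Filing date: 2004-07-30

  • Classification: G06F17/30

  • Country: US

  • Date added to database: 2022-08-21 22:19:42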
