
Full-text search apparatus utilizing two-stage index file to achieve high speed and reliability of searching a text which is a continuous sequence of characters


Abstract

A text search apparatus finds every occurrence position of a search string, which may be an arbitrary character string, within a text written as a continuous sequence of characters. For position reference in its index file, the apparatus uses extension words. An extension word is the maximum-length word, among a set of arbitrarily predefined dictionary words, that extends from a specific character position, and it occurs as such at least once within the text. Each occurrence of a word as an extension word defines one text position element, and the set of text position elements covers all character positions of the text. The index file also includes a table that relates each extension word to the positions at which each of its partial character strings occurs within the word. Each occurrence of an arbitrary search string within the text can therefore be expressed either as a partial character string within a single text position element or as a sequence of partial character strings within a set of sequentially occurring text position elements, so all such occurrences can be found by using the index file.
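The two-stage lookup described in the abstract can be illustrated with a small in-memory sketch. This is not the patent's file format or its actual procedure: the names `build_index`, `search`, `positions`, and `substrings` are hypothetical, the fallback to a single character at positions where no dictionary word starts is an assumption made so that every position is covered, and the sample text and dictionary are invented for illustration.

```python
from collections import defaultdict


def build_index(text, dictionary):
    """Build a two-stage index (hypothetical in-memory layout)."""
    max_len = max((len(w) for w in dictionary), default=1)
    # Stage 1: extension word -> start positions in the text.
    positions = defaultdict(list)
    for p in range(len(text)):
        # Longest dictionary word starting at p; fall back to the single
        # character so every position is covered (an assumption).
        word = text[p]
        for n in range(min(max_len, len(text) - p), 1, -1):
            if text[p:p + n] in dictionary:
                word = text[p:p + n]
                break
        positions[word].append(p)
    # Stage 2: partial string -> {(extension word, offset within the word)}.
    substrings = defaultdict(set)
    for word in positions:
        for i in range(len(word)):
            for j in range(i + 1, len(word) + 1):
                substrings[word[i:j]].add((word, i))
    return positions, substrings


def search(query, positions, substrings):
    """Return sorted start positions of every occurrence of query."""
    m = len(query)
    # occ[i] holds the text positions at which the suffix query[i:] occurs.
    occ = [set() for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        suffix = query[i:]
        # Case 1: the suffix lies inside a single text position element.
        for word, offset in substrings.get(suffix, ()):
            occ[i].update(p + offset for p in positions[word])
        # Case 2: the suffix starts with a whole extension word and
        # continues in the elements that follow it.
        for n in range(1, m - i):
            head = suffix[:n]
            if head in positions:
                occ[i].update(p for p in positions[head] if p + n in occ[i + n])
    return sorted(occ[0]) if m else []


text = "fulltextsearchtext"
positions, substrings = build_index(text, {"full", "text", "tex", "search"})
print(search("text", positions, substrings))  # [4, 14]
print(search("xtse", positions, substrings))  # [6]
```

The first query is answered from a single text position element (the extension word "text"), while the second spans several consecutive elements. Note that "tex" never becomes an extension word here, because the longer dictionary word "text" always starts at the same position.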
