INDEXING AND SEARCHING IDEOGRAPHIC CHARACTERS ON A NETWORKED SYSTEM OF COMPUTERS

Abstract

The system and method allow the retrieval, indexing and searching of information stored on computers connected by a communications network, where that information comprises ideographic, logographic or pictographic characters that are encoded using two bytes per character. The binary value that encodes a particular character in a given document is converted into hexadecimal text format and prefixed with a predetermined marker character, which indicates that the text is the hexadecimal value of a double-byte character. That value is then appended to a sequential string of such values, one for each such character in the document. The marker characters are then removed from this string, leaving a series of alphanumeric characters separated at set intervals by blank spaces. Each set of characters demarcated by a blank space is then indexed as if it were a standard word, such as an English word, albeit a meaningless one. A unique index entry is created for each such word and phrase (up to a predetermined combination of such words) that the search engine encounters. Each entry incorporates positional data pointing to the location, on a networked system of computers, of each occurrence of that particular word or phrase that the search engine has encountered. Search queries are then met by retrieving the positional data associated with each character or sequence of characters contained in the query, to determine whether any occurrence of those characters encountered by the search engine meets the user's criteria.
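The pipeline the abstract describes (hexadecimal conversion, marker prefixing, marker removal, word-style indexing with positions, and query lookup) can be illustrated with a minimal Python sketch. It is not the patented implementation. The abstract names neither the marker character nor the character encoding nor the phrase-length limit, so `MARKER`, the UTF-16-BE encoding, `MAX_PHRASE`, and all function names below are assumptions chosen for illustration only.

```python
from collections import defaultdict

MARKER = "_"      # hypothetical marker character; the abstract does not name one
MAX_PHRASE = 3    # hypothetical cap on phrase length, counted in tokens


def to_marked_hex(text, encoding="utf-16-be"):
    """Encode each character as two bytes and emit MARKER + 4 hex digits.

    UTF-16-BE is an assumption; the abstract only says "two bytes per character".
    """
    parts = []
    for ch in text:
        raw = ch.encode(encoding)
        if len(raw) != 2:
            raise ValueError(f"{ch!r} is not a two-byte character in {encoding}")
        parts.append(MARKER + raw.hex().upper())
    return "".join(parts)


def to_tokens(text):
    """Swap markers for blanks, then split into word-like hex tokens."""
    return to_marked_hex(text).replace(MARKER, " ").split()


def build_index(docs):
    """Map each token sequence (1..MAX_PHRASE tokens) to (doc_id, position) pairs."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        tokens = to_tokens(text)
        for start in range(len(tokens)):
            for n in range(1, MAX_PHRASE + 1):
                if start + n > len(tokens):
                    break
                key = " ".join(tokens[start:start + n])
                index[key].append((doc_id, start))
    return index


def search(index, query):
    """Return (doc_id, position) pairs where the query characters occur in order."""
    tokens = to_tokens(query)
    if not 1 <= len(tokens) <= MAX_PHRASE:
        raise ValueError("query must contain between 1 and MAX_PHRASE characters")
    return index.get(" ".join(tokens), [])
```

In this sketch a query is converted with the same `to_tokens` function used at indexing time, so a two-character query becomes a two-token "phrase" that is looked up directly. A real search engine would more likely store single-token postings and intersect adjacent positions, rather than materialising every phrase up to the limit.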
