INDEXING AND SEARCHING IDEOGRAPHIC CHARACTERS ON A NETWORKED SYSTEM OF COMPUTERS
Abstract
The system and method allow the retrieval, indexing and searching of information stored on computers connected by a communications network, where that information comprises ideographic, logographic or pictographic characters, which are encoded using two bytes per character. The binary value which encodes a particular character contained in a given document is converted into hexadecimal text format, which is then prefixed with a predetermined marker character to indicate that it is the hexadecimal value of a double-byte character. That value is then added to a sequential string of such values for each of such characters in that document. The marker characters are then removed from this string, leaving a series of alphanumeric characters separated at set intervals by blank spaces. Each set of characters demarcated by a blank space is then indexed as if it were a standard word such as an English word, albeit a meaningless one. A unique index entry is created for each such word and phrase (up to a predetermined combination of such words) which the search engine encounters, and incorporates positional data which points to the location on a networked system of computers of each occurrence of that particular word or phrase which the search engine has encountered. Search queries are then met by retrieving the positional data associated with each character or sequence of characters contained in the search query to determine whether any occurrence of those characters which has been encountered by the search engine meets the criteria of the user.
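The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The marker character `#`, the use of UTF-16-BE as the two-byte encoding, the phrase limit `MAX_PHRASE`, and the function names `to_hex_words`, `build_index` and `search` are all assumptions made for illustration; the abstract names none of them. An in-memory dictionary stands in for the networked index.

```python
from collections import defaultdict

MARKER = "#"            # hypothetical marker character; the abstract does not name one
ENCODING = "utf-16-be"  # assumption: stands in for any two-byte character encoding
MAX_PHRASE = 3          # hypothetical limit on the number of words per indexed phrase


def to_hex_words(text):
    """Turn each double-byte character into a 4-digit hexadecimal 'word'."""
    marked = ""
    for ch in text:
        raw = ch.encode(ENCODING)
        if len(raw) == 2:  # skip anything not encoded in exactly two bytes
            marked += MARKER + raw.hex().upper()
    # Swapping the markers for blanks leaves hex words separated by spaces.
    return marked.replace(MARKER, " ").split()


def build_index(documents):
    """Map every word and phrase (up to MAX_PHRASE words) to (doc_id, position) pairs."""
    index = defaultdict(list)
    for doc_id, text in documents.items():
        words = to_hex_words(text)
        for start in range(len(words)):
            for size in range(1, MAX_PHRASE + 1):
                if start + size > len(words):
                    break
                key = " ".join(words[start:start + size])
                index[key].append((doc_id, start))
    return index


def search(index, query):
    """Return sorted (doc_id, position) pairs where the query characters occur in sequence."""
    words = to_hex_words(query)
    if not words:
        return []
    if len(words) <= MAX_PHRASE:
        # Short queries match an indexed word or phrase directly.
        return sorted(index.get(" ".join(words), []))
    # Longer queries: chain the positional data of the individual words.
    hits = []
    for doc_id, pos in index.get(words[0], []):
        if all((doc_id, pos + i) in index.get(w, []) for i, w in enumerate(words)):
            hits.append((doc_id, pos))
    return sorted(hits)


# Example with two small documents (illustrative data only).
docs = {"doc1": "中文索引", "doc2": "索引中文搜索"}
index = build_index(docs)
print(to_hex_words("中文"))   # ['4E2D', '6587']
print(search(index, "中文"))  # [('doc1', 0), ('doc2', 2)]
```

Because each character becomes an ordinary alphanumeric token, a search engine built for space-delimited languages could in principle index such text without modification, which appears to be the point of the conversion step.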