Hybrid comparison for unicode text strings consisting primarily of ASCII characters
Abstract
A method compares Unicode-encoded text strings. The method receives a first string S = s_1 s_2 ... s_n and a second string T = t_1 t_2 ... t_m, where s_1, s_2, ..., s_n and t_1, t_2, ..., t_m are Unicode characters. The method computes a first string weight for S according to a weight function ƒ. When S consists entirely of ASCII characters, ƒ(S) = S. When S consists of ASCII characters and some accented ASCII characters (accented characters that are replaceable by ASCII characters), ƒ(S) = g(s_1) g(s_2) ... g(s_n), where g(s_i) = s_i when s_i is an ASCII character and g(s_i) = s_i′ when s_i is an accented ASCII character replaceable by the corresponding ASCII character s_i′. When S includes one or more non-replaceable non-ASCII characters, the first string weight is the concatenation of an ASCII weight prefix ƒ_A(S) and a Unicode weight suffix ƒ_U(S). The method likewise computes a second string weight for T. The strings are tested for equality using their string weights.
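The three cases of the weight function can be illustrated with a minimal Python sketch. The abstract does not define how ƒ_A and ƒ_U are built, nor which accented characters count as replaceable, so everything below is an assumption made for illustration: `fold_char` (standing in for g) treats a character as replaceable when its NFD decomposition reduces to a single ASCII base letter; the prefix uses a hypothetical `PLACEHOLDER` for each non-replaceable character; the suffix lists the code points of those characters in hex after a hypothetical `SEPARATOR`. It is not the patented implementation.

```python
import unicodedata

# Hypothetical markers, not taken from the abstract. The sketch assumes
# input strings do not themselves contain these control characters.
PLACEHOLDER = "\x7f"  # stands in for a non-replaceable character in the prefix
SEPARATOR = "\x00"    # separates the ASCII prefix from the Unicode suffix


def fold_char(ch):
    """Assumed version of g: return ch if ASCII, its ASCII base letter if it
    is an accented character that decomposes to one, otherwise None."""
    if ch.isascii():
        return ch
    decomposed = unicodedata.normalize("NFD", ch)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    if len(base) == 1 and base.isascii():
        return base
    return None


def weight(s):
    """Assumed version of f: pure-ASCII and replaceable-only strings map to
    an ASCII string; anything else gets an ASCII prefix plus a suffix."""
    folded = [fold_char(ch) for ch in s]
    if all(f is not None for f in folded):
        # Cases 1 and 2: f(S) = g(s_1) g(s_2) ... g(s_n); equals S for pure ASCII.
        return "".join(folded)
    # Case 3: concatenate an ASCII weight prefix and a Unicode weight suffix.
    prefix = "".join(f if f is not None else PLACEHOLDER for f in folded)
    suffix = "".join(
        f"{ord(ch):06x}" for ch, f in zip(s, folded) if f is None
    )
    return prefix + SEPARATOR + suffix


def strings_equal(s, t):
    """Test equality by comparing string weights."""
    return weight(s) == weight(t)
```

Under these assumptions, `strings_equal("café", "cafe")` holds because both weights are the ASCII string `cafe`, while two strings that differ only in a non-replaceable character share a prefix and are told apart by the suffix. For text that is mostly ASCII, most comparisons never leave the first two cases, which is presumably the point of the hybrid scheme.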