METHOD FOR INTENSIVELY COMPRESSING KOREAN CHARACTERS THROUGH 3-5 BIT COMPRESSION FOR BYTE 1, METHOD FOR INTENSIVELY COMPRESSING 2-4 BITS FOR BYTE 2, AND DEVICE THEREOF, IN UTF-8 CODE CHARACTER SYSTEM
In Korean social network services (SNS), Korean text falls in the Unicode range U+AC00 to U+D7AF, for which the UTF-8 header of byte 1 is "1110". Because Korean characters appear so frequently in Korean-language social media, the present invention maps this header to a shorter compression header, "10". Characters whose byte 1 header does not occur with high frequency gain nothing from this mapping (though they also lose nothing). For the range containing Korean characters, the invention replaces the header of the leading byte with "10" for a gain of 2 bits, then compresses a further 1-3 bits while combining the remaining 4 bits of byte 1, for an overall gain of 3-5 bits. Byte 2 is designed to gain 2-4 bits, and each of bytes 3-6 always gains 2 bits. In addition, among the English characters whose encoding starts with 0, the invention compresses the space character by an additional 1 bit, which raises the UTF-8 compression efficiency of Korean-character documents, where word spacing is actively used. COPYRIGHT KIPO 2018
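The UTF-8 facts the abstract relies on can be checked with a short Python sketch. It only inspects standard UTF-8 output and counts the 2-bit saving from swapping the 4-bit header "1110" for the 2-bit header "10". It does not implement the patented scheme, whose further 1-3 bit step is not specified in the abstract. The function names (`utf8_bits`, `in_korean_range`, `lead_bytes_in_range`, `header_gain_bits`) are hypothetical and chosen for illustration.

```python
def utf8_bits(ch: str) -> list[str]:
    """Return each UTF-8 byte of a single character as an 8-bit string."""
    return [format(b, "08b") for b in ch.encode("utf-8")]


def in_korean_range(ch: str) -> bool:
    """True if ch lies in U+AC00..U+D7AF, the range named in the abstract."""
    return 0xAC00 <= ord(ch) <= 0xD7AF


def lead_bytes_in_range() -> list[int]:
    """Collect the distinct byte 1 values used across U+AC00..U+D7AF."""
    return sorted({chr(cp).encode("utf-8")[0] for cp in range(0xAC00, 0xD7B0)})


def header_gain_bits(text: str) -> int:
    """Bits saved if only the byte 1 header '1110' became '10' (2 bits per
    character in the range). Illustrative count, not the full scheme."""
    return sum(2 for ch in text if in_korean_range(ch))


if __name__ == "__main__":
    # Byte 1 starts with the header 1110; bytes 2 and 3 start with 10.
    print(utf8_bits("한"))  # ['11101101', '10010101', '10011100']
    # Every character in the range uses one of four byte 1 values.
    print([hex(b) for b in lead_bytes_in_range()])  # ['0xea', '0xeb', '0xec', '0xed']
    print(header_gain_bits("한글 ok"))  # 4
```

Across the whole range, byte 1 takes only the four values 0xEA to 0xED, so the 4 bits that follow the header carry less than 4 bits of information. That observation is consistent with the abstract's claim of further savings in byte 1, although the abstract does not say how those bits are actually combined.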