METHOD, DEVICE AND COMPUTER SOFTWARE PRODUCT FOR FLEXIBLE LANGUAGE IDENTIFICATION ON THE TEXT BASIS
Abstract
1. A method for text-based language identification, comprising:
   receiving an entry in a computer-readable text format;
   determining an alphabet indicator of the entry for each of a plurality of languages;
   determining an n-gram frequency indicator of the entry for each of the plurality of languages; and
   determining, by means of a processor, a language associated with the entry based on a combination of the alphabet indicator and the n-gram frequency indicator.

2. The method according to claim 1, in which determining the alphabet indicator includes comparing the characters of the entry with the alphabet of each of the plurality of languages and generating an indicator for each language, the indicator for each language being based at least in part on the absence of one or more characters of the entry from the alphabet of the language for which the indicator is determined.

3. The method according to claim 1 or 2, in which determining the n-gram frequency indicator for each of the plurality of languages includes comparing the entry with n-gram statistics for each of the plurality of languages.

4. The method according to claim 3, in which the entry includes n characters, and comparing the entry with the n-gram statistics includes determining the conditional probability of the nth character of the entry given the preceding n-1 characters.

5. The method according to claim 3, further comprising assigning a start character and an end character to the first and last characters of the entry, respectively, for use in matching against the corresponding start and end characters associated with the probability of each n-gram in the n-gram statistics.

6. The method according to claim 1, also comprising comparing indicator a
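
The steps of claims 1 to 5 can be illustrated with a minimal Python sketch. It is not the patented implementation: the claims do not say how the two indicators are combined, so the product used here is an assumption. The function names, the `^` and `$` boundary markers, the add-one smoothing, the `VOCAB` constant, and the tiny English/Italian word lists are likewise hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

START, END = "^", "$"   # hypothetical start/end markers (claim 5)
VOCAB = 64              # assumed symbol-inventory size, used only for smoothing

def alphabet_score(entry, alphabet):
    """Fraction of the entry's characters that occur in the alphabet (claim 2)."""
    chars = [c for c in entry.lower() if not c.isspace()]
    if not chars:
        return 0.0
    return sum(c in alphabet for c in chars) / len(chars)

def train(words, n=3):
    """Collect n-gram statistics: context of n-1 characters -> next-character counts."""
    counts = {}
    for word in words:
        padded = START * (n - 1) + word.lower() + END
        for i in range(n - 1, len(padded)):
            context, char = padded[i - n + 1:i], padded[i]
            counts.setdefault(context, Counter())[char] += 1
    return counts

def ngram_score(entry, counts, n=3):
    """Average log of P(character | preceding n-1 characters) (claims 3 and 4)."""
    padded = START * (n - 1) + entry.lower() + END
    log_prob, steps = 0.0, 0
    for i in range(n - 1, len(padded)):
        followers = counts.get(padded[i - n + 1:i], Counter())
        total = sum(followers.values())
        # Add-one smoothing so unseen n-grams keep a small non-zero probability.
        log_prob += math.log((followers[padded[i]] + 1) / (total + VOCAB))
        steps += 1
    return log_prob / steps

# Toy data, for illustration only. The Italian alphabet lacks j, k, w, x, y.
ALPHABETS = {
    "en": set("abcdefghijklmnopqrstuvwxyz"),
    "it": set("abcdefghilmnopqrstuvz"),
}
MODELS = {
    "en": train(["the", "this", "that", "why", "when", "what", "with"]),
    "it": train(["che", "chi", "casa", "cosa", "come", "ciao", "questo"]),
}

def identify(entry):
    """Pick the language with the highest combined indicator (claim 1).

    Assumption: the combination is alphabet score times the geometric-mean
    n-gram probability; the claims only require some combination.
    """
    def combined(lang):
        return alphabet_score(entry, ALPHABETS[lang]) * math.exp(
            ngram_score(entry, MODELS[lang]))
    return max(ALPHABETS, key=combined)
```

For example, `identify("why")` compares an entry whose letters `w` and `y` are missing from the Italian alphabet, so the Italian alphabet indicator drops to one third before the n-gram statistics are even considered. Multiplying the two indicators is one simple way to let either one veto a language; a weighted sum of logarithms would be an equally plausible reading of "combination".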