Identifying cultural background from text
Abstract
The diaculture of a text can be determined or analyzed by tokenizing the words of the text according to a rule set to generate tokenized text. The rule set defines two sets of grammatical types of words: a first set, whose words are replaced with tokens that indicate the grammatical type of the respective word, and a second set, whose words are passed through unchanged as tokens. N-grams can then be constructed from the tokenized text, each n-gram comprising one or more consecutive tokens. The n-grams can be compared to a training data set that corresponds to a known diaculture, yielding a comparison result that indicates how well the text matches the training data set for that diaculture.
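The three steps described in the abstract (rule-based tokenization, n-gram construction, comparison against training data) can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the patent: the toy lexicon `TOY_LEXICON` stands in for a real part-of-speech tagger, `REPLACED_TYPES` is a hypothetical choice of the first set of grammatical types, and the score (the fraction of the text's n-grams that also occur in the training data) is only one possible comparison result.

```python
from collections import Counter

# Hypothetical rule set (an assumption, not the patent's actual rules):
# words of these grammatical types are replaced by a type token;
# words of every other type are passed through unchanged.
REPLACED_TYPES = {"NOUN", "VERB", "ADJ"}

# Toy lexicon standing in for a real part-of-speech tagger.
TOY_LEXICON = {
    "cat": "NOUN", "dog": "NOUN", "mat": "NOUN",
    "sat": "VERB", "ran": "VERB",
    "big": "ADJ",
    "the": "DET", "on": "PREP",
}


def tokenize(text):
    """Replace words of the first set with type tokens; pass the rest through."""
    tokens = []
    for raw in text.lower().split():
        word = raw.strip(".,;:!?")
        if not word:
            continue
        gram_type = TOY_LEXICON.get(word)
        if gram_type in REPLACED_TYPES:
            tokens.append(f"<{gram_type}>")
        else:
            # Includes words missing from the toy lexicon (a simplification).
            tokens.append(word)
    return tokens


def ngrams(tokens, max_n=3):
    """Return all n-grams of 1..max_n consecutive tokens as tuples."""
    grams = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(tuple(tokens[i:i + n]))
    return grams


def train(texts, max_n=3):
    """Count the n-grams of texts that belong to one known diaculture."""
    counts = Counter()
    for text in texts:
        counts.update(ngrams(tokenize(text), max_n))
    return counts


def match_score(text, training_counts, max_n=3):
    """Fraction of the text's n-grams that also occur in the training data."""
    grams = ngrams(tokenize(text), max_n)
    if not grams:
        return 0.0
    hits = sum(1 for gram in grams if gram in training_counts)
    return hits / len(grams)
```

Because nouns and verbs are collapsed into type tokens while function words are kept, two sentences with different vocabulary but the same grammatical pattern produce the same n-grams. With this toy lexicon, a model trained on "The cat sat on the mat." would fully match "The dog ran on the mat.".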