Identifying cultural background from text
Abstract
The diaculture of a text can be determined or analyzed by tokenizing the words of the text according to a rule set to generate tokenized text. The rule set defines two sets of grammatical types: words whose type is in the first set are replaced with tokens that indicate the grammatical type of the respective word, and words whose type is in the second set are passed through unchanged as tokens. Grams can be constructed from the tokenized text, each gram including one or more consecutive tokens from the tokenized text. The grams can be compared to a training data set that corresponds to a known diaculture to obtain a comparison result that indicates how well the text matches the training data set for that diaculture.
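The three steps in the abstract (rule-based tokenization, gram construction, comparison against training data) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: the toy `LEXICON` stands in for a real part-of-speech tagger, the choice of which grammatical types go in `REPLACED_TYPES` and `PASSED_TYPES` is arbitrary, the `<UNK>` placeholder for words in neither set is hypothetical, and the fraction-of-matching-grams score is just one simple way to produce a comparison result.

```python
# Hypothetical toy lexicon mapping words to grammatical types.
# A real system would likely use a part-of-speech tagger instead.
LEXICON = {
    "dog": "NOUN", "cat": "NOUN", "ball": "NOUN",
    "chased": "VERB", "saw": "VERB",
    "the": "DET", "a": "DET",
    "i": "PRON", "we": "PRON",
}

# Assumed rule set: which types are replaced by a type token (first set)
# and which are passed through unchanged (second set).
REPLACED_TYPES = {"NOUN", "VERB"}
PASSED_TYPES = {"DET", "PRON"}


def tokenize(text):
    """Turn text into tokens according to the rule set above."""
    tokens = []
    for word in text.lower().split():
        gram_type = LEXICON.get(word)
        if gram_type in REPLACED_TYPES:
            # First set: replace the word with a token naming its type.
            tokens.append(f"<{gram_type}>")
        elif gram_type in PASSED_TYPES:
            # Second set: the word itself becomes the token.
            tokens.append(word)
        else:
            # Words in neither set: handling here is an assumption.
            tokens.append("<UNK>")
    return tokens


def build_grams(tokens, max_n=2):
    """Build all grams of 1..max_n consecutive tokens."""
    grams = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(tuple(tokens[i:i + n]))
    return grams


def match_score(text, training_grams, max_n=2):
    """Fraction of the text's grams that also occur in the training grams."""
    grams = build_grams(tokenize(text), max_n)
    if not grams:
        return 0.0
    hits = sum(1 for g in grams if g in training_grams)
    return hits / len(grams)


# Training data for one hypothetical known diaculture.
training = set(build_grams(tokenize("the dog chased a cat"), max_n=2))
print(match_score("a cat saw the ball", training))
```

Replacing content words with type tokens while keeping function words verbatim means the grams capture phrasing patterns rather than topic vocabulary, which is presumably why the rule set distinguishes the two sets. A higher score suggests a closer match to the training data for that diaculture; comparing scores across several training sets would be one way to rank candidate diacultures.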