Identifying cultural background from text

Abstract

The diaculture of a text can be determined or analyzed by tokenizing the words of the text according to a rule set to generate tokenized text. The rule set defines (a) a first set of grammatical types, whose words are replaced with tokens that each indicate the grammatical type of the respective word, and (b) a second set of grammatical types, whose words are passed through unchanged as tokens. N-grams can then be constructed from the tokenized text, each n-gram consisting of one or more consecutive tokens. The n-grams can be compared to a training data set corresponding to a known diaculture to obtain a comparison result indicating how well the text matches the training data set for that diaculture.
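The abstract describes three steps: rule-based tokenization, n-gram construction, and comparison against a training set for a known diaculture. The Python sketch below shows one way those steps could fit together; it is not the patented implementation. The abstract does not say which grammatical types belong to each set, how n-grams are weighted, or how the comparison is scored, so the following are all assumptions: the `REPLACE_TYPES`/`PASS_TYPES` split (Universal Dependencies-style tags, chosen for illustration), the `<TYPE>` token format, `max_n=3`, and cosine similarity over n-gram counts. The hypothetical helpers `tokenize`, `ngrams`, `build_profile`, and `compare` take words that are already tagged with their grammatical type (for example by a part-of-speech tagger), which keeps the example dependency-free.

```python
from collections import Counter
from math import sqrt

# Hypothetical rule set (assumption: the abstract does not list the types).
# First set: words of these grammatical types are replaced by a type token.
REPLACE_TYPES = {"NOUN", "VERB", "ADJ", "ADV", "NUM", "PROPN"}
# Second set: words of these types are passed through unchanged as tokens.
PASS_TYPES = {"DET", "ADP", "PRON", "CCONJ", "SCONJ", "AUX", "PART"}


def tokenize(tagged_words):
    """Apply the rule set to a list of (word, grammatical_type) pairs."""
    tokens = []
    for word, gtype in tagged_words:
        if gtype in PASS_TYPES:
            tokens.append(word)  # second set: kept exactly as written
        elif gtype in REPLACE_TYPES:
            tokens.append(f"<{gtype}>")  # first set: replaced by its type
        else:
            tokens.append("<OTHER>")  # assumption: types in neither set
    return tokens


def ngrams(tokens, max_n=3):
    """Count every run of 1..max_n consecutive tokens."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def build_profile(tagged_texts, max_n=3):
    """Pool n-gram counts over the training texts of one known diaculture."""
    profile = Counter()
    for tagged_words in tagged_texts:
        profile.update(ngrams(tokenize(tagged_words), max_n))
    return profile


def compare(tagged_words, profile, max_n=3):
    """Cosine similarity between a text's n-grams and a training profile.

    Assumed metric: returns a score in [0, 1]; higher means a closer match.
    """
    grams = ngrams(tokenize(tagged_words), max_n)
    # Counter returns 0 for n-grams missing from the profile.
    dot = sum(count * profile[g] for g, count in grams.items())
    norm_text = sqrt(sum(c * c for c in grams.values()))
    norm_profile = sqrt(sum(c * c for c in profile.values()))
    if norm_text == 0 or norm_profile == 0:
        return 0.0
    return dot / (norm_text * norm_profile)


if __name__ == "__main__":
    training = [[("the", "DET"), ("dog", "NOUN"), ("ran", "VERB"),
                 ("to", "ADP"), ("the", "DET"), ("park", "NOUN")]]
    profile = build_profile(training)
    text = [("the", "DET"), ("cat", "NOUN"), ("sat", "VERB"),
            ("on", "ADP"), ("the", "DET"), ("mat", "NOUN")]
    print(tokenize(text))  # ['the', '<NOUN>', '<VERB>', 'on', 'the', '<NOUN>']
    print(f"match score: {compare(text, profile):.3f}")
```

Because words in the first set collapse to their grammatical type, the resulting n-grams reflect how function words and grammatical categories are sequenced rather than which specific content words appear. Comparing one text against profiles built for several known diacultures would give one score per profile.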

Bibliographic data

Publication number: EP2645272A1
Publication date: 2013-10-02
Original format: PDF
Applicant/Assignee: LOCKHEED MARTIN CORPORATION
Application number: EP20130161708
Inventors: TAYLOR SARAH M.; DAVENPORT DANIEL; MENAKER DAVID M.; PARADIS ROSEMARY D.
Application date: 2013-03-28
Classification: G06F17/27
Country: EP
Database entry time: 2022-08-21 16:28:47
