CROSSLINGUAL TEXT CLASSIFICATION METHOD USING EXPECTED FREQUENCIES

Abstract

A method that, given a bag-of-words representation of a text snippet written in a source language, calculates an expected bag-of-words representation in a target language. The method includes: a step in which, for a source word in the input bag-of-words, the probability that the source word is translated into a target word is calculated using given probabilities that the target word is translated into the source word, together with co-occurrence probabilities of two or more target words calculated from a corpus written in the target language; and a step in which the probabilities that the target word is a translation of the source word are summed to give an expected count of the target word, and a feature vector is created from the expected counts. The resulting feature vector in the target language is taken as the expected bag-of-words representation of the input bag-of-words.
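
The two steps can be illustrated with a minimal Python sketch. The abstract does not say how the translation probabilities and the co-occurrence probabilities are combined, so the combination used here (a product of P(source | target) and an averaged co-occurrence "context support", normalized over the candidate target words) is an assumption made for illustration, not the patented formula. The function name `expected_bag_of_words`, its parameters, the `smoothing` constant, and the toy German-to-English data are all hypothetical.

```python
from collections import defaultdict


def expected_bag_of_words(source_bow, p_src_given_tgt, cooc, smoothing=1e-6):
    """Sketch: map a source-language bag-of-words to expected target counts.

    source_bow:      dict, source word -> count in the snippet
    p_src_given_tgt: dict, (source, target) -> P(source | target), given
    cooc:            dict, frozenset({t1, t2}) -> co-occurrence probability
                     estimated from a target-language corpus
    smoothing:       hypothetical constant that avoids all-zero scores
    """
    # Candidate target words for every source word in the snippet.
    candidates = {
        s: [t for (s2, t) in p_src_given_tgt if s2 == s] for s in source_bow
    }

    expected = defaultdict(float)
    for s, count in source_bow.items():
        scores = {}
        for t in candidates[s]:
            # ASSUMPTION: context support is the mean co-occurrence of t with
            # the candidate translations of the other source words.
            context = [
                cooc.get(frozenset((t, t2)), 0.0)
                for s2 in source_bow if s2 != s
                for t2 in candidates[s2]
            ]
            support = sum(context) / len(context) if context else 1.0
            # ASSUMPTION: combine by multiplying, then normalize (step 1).
            scores[t] = p_src_given_tgt[(s, t)] * (support + smoothing)

        total = sum(scores.values())
        if total == 0.0:
            continue  # no usable translation candidate for this source word

        # Step 2: sum translation probabilities into expected target counts.
        for t, score in scores.items():
            expected[t] += count * score / total

    return dict(expected)


# Toy data (invented): "Bank" is ambiguous, "Geld" is not.
source_bow = {"Bank": 1, "Geld": 2}
p_src_given_tgt = {
    ("Bank", "bank"): 0.5,
    ("Bank", "bench"): 0.5,
    ("Geld", "money"): 1.0,
}
cooc = {frozenset(("bank", "money")): 0.2}

vec = expected_bag_of_words(source_bow, p_src_given_tgt, cooc)
print(vec)
```

In this toy example, "bank" receives almost all of the expected count for "Bank" because it co-occurs with "money" in the target corpus while "bench" does not. The expected counts sum to the number of source tokens, so the result can be used as a feature vector for a classifier trained on target-language text.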
