Automatic Genre Classification Determination of Web Content to which the Web Content Belongs Together with a Corresponding Genre Probability

Abstract

A mechanism is provided for automatic genre determination of web content. For each type of web content genre, a set of relevant feature types is extracted from collected training material, where genre features and non-genre features are represented by tokens and an integer count represents the frequency of appearance of each token in both a first type of training material and a second type of training material. In a classification process, fixed-length tokens are extracted for the relevant feature types from different text and structural elements of the web content. For each relevant feature type, a corresponding feature probability is calculated. The feature probabilities are combined into an overall genre probability that the web content belongs to a specific trained web content genre. A genre classification result is then output, comprising at least one specific trained web content genre to which the web content belongs together with a corresponding genre probability.
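The pipeline described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the patented method: the abstract does not give the token length, the smoothing, or the rule for combining probabilities, so the 4-character sliding window, the Laplace smoothing, and the log-odds combination below are all assumptions, as are the function names and the "title"/"body" feature types.

```python
import math
from collections import Counter

TOKEN_LEN = 4  # assumed value; the abstract only says "fixed length"


def tokens(text, n=TOKEN_LEN):
    """Split text into overlapping fixed-length character tokens."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def train(genre_docs, other_docs):
    """Count token occurrences in genre and non-genre training material.

    Each document is a dict mapping a feature type (e.g. "title") to text.
    Returns {feature_type: (genre_counts, other_counts)}.
    """
    model = {}
    for idx, docs in enumerate((genre_docs, other_docs)):
        for doc in docs:
            for ftype, text in doc.items():
                counts = model.setdefault(ftype, (Counter(), Counter()))
                counts[idx].update(tokens(text))
    return model


def combine(probs):
    """Combine probabilities by summing their log-odds (assumed rule)."""
    if not probs:
        return 0.5  # no evidence either way
    eps = 1e-9
    clamped = [min(max(p, eps), 1 - eps) for p in probs]
    log_odds = sum(math.log(p / (1 - p)) for p in clamped)
    log_odds = max(-30.0, min(30.0, log_odds))  # keep exp() in range
    return 1 / (1 + math.exp(-log_odds))


def feature_probability(text, genre_counts, other_counts):
    """Probability for one feature type, from its token counts."""
    probs = []
    for tok in tokens(text):
        g, o = genre_counts[tok], other_counts[tok]
        probs.append((g + 1) / (g + o + 2))  # Laplace smoothing (assumed)
    return combine(probs)


def genre_probability(doc, model):
    """Overall probability that doc belongs to the trained genre."""
    feats = [feature_probability(doc[ft], *model[ft])
             for ft in model if ft in doc]
    return combine(feats)


# Hypothetical training material for a "blog" genre versus everything else.
blog_model = train(
    genre_docs=[{"title": "my travel blog", "body": "today i wrote about my trip"}],
    other_docs=[{"title": "buy cheap shoes", "body": "add to cart and checkout now"}],
)
page = {"title": "my food blog", "body": "today i wrote about lunch"}
result = {"blog": genre_probability(page, blog_model)}
```

A full classifier along these lines would train one such model per genre and report every genre whose probability exceeds some threshold, together with that probability.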
