METHOD AND DEVICE FOR EXTRACTING CORE WORDS FROM COMMODITY SHORT TEXT

Abstract

The present invention discloses a method and a device for extracting core words from commodity short text, and relates to the field of big data processing. The method comprises: acquiring commodity short texts in a data set; performing word segmentation on each commodity short text; obtaining a document vector of the commodity short text from the context information of its word segments; clustering the commodity short texts in the data set on the basis of their document vectors; determining the clustering hierarchy weight of each word segment of a commodity short text within the category to which that text belongs; and determining the core words of the commodity short text according to the clustering hierarchy weight of each word segment. Because the document vector is derived from the context information of the word segments, it compensates for the small amount of information carried by a short text and makes the clustering result more accurate, so that core words can be extracted more accurately according to the weight of each word segment within the cluster category to which its text belongs.
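
The steps listed in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation: the abstract does not specify the segmenter, the vector model, the clustering algorithm, or the weight formula. Every function name here is hypothetical. Whitespace splitting stands in for word segmentation, co-occurrence counts stand in for context information, a greedy one-pass clustering stands in for the clustering step, and the weight (share of texts in the cluster containing the word, scaled by how few clusters contain it) is an assumed stand-in for the clustering hierarchy weight.

```python
import math
from collections import Counter, defaultdict

def segment(text):
    """Stand-in segmenter (assumption): lowercase whitespace split."""
    return text.lower().split()

def context_vectors(docs, window=2):
    """Map each word to counts of the words seen within `window` positions of it."""
    vecs = defaultdict(Counter)
    for words in docs:
        for i, w in enumerate(words):
            for j in range(max(0, i - window), min(len(words), i + window + 1)):
                if j != i:
                    vecs[w][words[j]] += 1
    return vecs

def doc_vector(words, vecs):
    """Sum the context vectors of a text's words, plus one count per word occurrence."""
    total = Counter(words)
    for w in words:
        total.update(vecs[w])
    return total

def cosine(a, b):
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def cluster(vectors, threshold=0.5):
    """Greedy one-pass clustering (assumption): join the most similar
    centroid if it is similar enough, otherwise start a new cluster."""
    centroids, labels = [], []
    for vec in vectors:
        sims = [cosine(vec, c) for c in centroids]
        best = max(range(len(sims)), key=sims.__getitem__, default=None)
        if best is not None and sims[best] >= threshold:
            centroids[best].update(vec)   # centroid kept as a running sum
            labels.append(best)
        else:
            centroids.append(Counter(vec))
            labels.append(len(centroids) - 1)
    return labels

def core_words(texts, top_n=1, threshold=0.5):
    """Return the top_n highest-weighted words of each text within its cluster."""
    docs = [segment(t) for t in texts]
    vecs = context_vectors(docs)
    labels = cluster([doc_vector(d, vecs) for d in docs], threshold)
    n_clusters = max(labels) + 1 if labels else 0
    size = Counter(labels)
    df = defaultdict(Counter)  # cluster -> word -> number of member texts containing it
    for words, lab in zip(docs, labels):
        df[lab].update(set(words))
    spread = Counter(w for c in df.values() for w in c)  # word -> clusters containing it
    result = []
    for words, lab in zip(docs, labels):
        def weight(w):
            # Assumed weight: in-cluster document share, damped for words
            # that appear in many clusters.
            return df[lab][w] / size[lab] * math.log(1 + n_clusters / spread[w])
        ranked = sorted(set(words), key=lambda w: (-weight(w), w))
        result.append(ranked[:top_n])
    return result

texts = [
    "red cotton shirt men",
    "blue cotton shirt women",
    "white linen shirt men",
    "wireless bluetooth headphones black",
    "sport bluetooth headphones silver",
    "wired studio headphones black",
]
for text, words in zip(texts, core_words(texts)):
    print(text, "->", words)
```

On this toy data the two product groups share no vocabulary, so they fall into separate clusters, and the word present in every text of a cluster ("shirt" or "headphones") is ranked first for each of its texts. A real system would presumably replace each stand-in with a proper segmenter, a learned embedding, and a hierarchical clustering method.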
