Journal: International Journal of Parallel Programming

A Text Clustering Approach of Chinese News Based on Neural Network Language Model



Abstract

Text clustering plays an important role in data mining and machine learning. After years of development, clustering research has produced a range of theories and methods. However, for text clustering of Chinese news, the mainstream LDA method suffers from high time complexity. To improve speed, this paper puts forward a new method in which a neural network language model is first applied to text clustering. Text clustering is first converted into its dual problem, word clustering. The neural network language model yields word vectors, which are used in fuzzy k-means clustering of the Chinese news keyword set. From the keyword clustering result, the text clustering result for Chinese news is obtained with a single transformation. Experiments show that this method runs five times faster than LDA. The method is currently used successfully in the Sohu news recommendation system.
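The pipeline described in the abstract (word vectors, fuzzy k-means over keywords, then a mapping from keyword clusters to document clusters) can be illustrated with a minimal sketch. Everything concrete here is an assumption rather than the authors' implementation: the toy vocabulary and 2-D vectors stand in for vectors learned by a neural network language model, the function names `fuzzy_kmeans` and `cluster_documents` are hypothetical, and summing keyword memberships per document is only one plausible reading of the "single transformation" step.

```python
import numpy as np

def fuzzy_kmeans(X, k, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Fuzzy k-means (fuzzy c-means) on the rows of X.

    Returns (centers, U), where U[i, j] is the membership of point i in cluster j.
    """
    rng = np.random.default_rng(seed)
    # Random initial memberships; each row is normalised to sum to 1.
    U = rng.random((X.shape[0], k))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Cluster centers are membership-weighted means of the points.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Distance from every point to every center; epsilon avoids division by zero.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U

def cluster_documents(doc_keywords, vocab, U):
    """Assumed mapping step: each document takes the cluster with the
    largest summed membership over its keywords."""
    index = {w: i for i, w in enumerate(vocab)}
    labels = []
    for keywords in doc_keywords:
        rows = [index[w] for w in keywords if w in index]
        score = U[rows].sum(axis=0) if rows else np.zeros(U.shape[1])
        labels.append(int(score.argmax()))
    return labels

# Toy stand-ins for learned word vectors (hypothetical values).
vocab = ["football", "league", "goal", "stock", "market", "bank"]
vectors = np.array([
    [1.0, 0.1], [0.9, 0.2], [1.1, 0.0],   # sports-like keywords
    [0.0, 1.0], [0.1, 0.9], [0.2, 1.1],   # finance-like keywords
])
docs = [["football", "goal"], ["stock", "bank"], ["league", "goal", "market"]]

centers, U = fuzzy_kmeans(vectors, k=2)
labels = cluster_documents(docs, vocab, U)
print(labels)
```

Because the expensive clustering runs over the keyword set rather than over full documents, assigning a document reduces to a lookup and a sum, which is consistent with the speed motivation stated in the abstract.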
