Journal: Expert Systems with Applications

Document clustering method using dimension reduction and support vector clustering to overcome sparseness


Abstract

Many studies on developing technologies are published as articles, papers, or patents, and analyzing these documents reveals scientific and technological trends. In this paper, we consider document clustering as a method of document data analysis. Documents are difficult to analyze directly because raw document data are not suitable for statistical and machine learning methods of analysis. We therefore use text mining techniques to transform document data into structured data. The resulting structured data are very sparse, which makes them difficult to analyze. This study proposes a new method to overcome the sparsity problem in document clustering: a combined clustering method that uses dimension reduction and K-means clustering based on support vector clustering and the Silhouette measure. In particular, we attempt to overcome the sparseness in patent document clustering. To verify the efficacy of our work, we first conduct an experiment using news data from the machine learning repository of the University of California at Irvine. Second, using patent documents retrieved from the United States Patent and Trademark Office, we carry out patent clustering for technology forecasting.
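
The pipeline outlined in the abstract (documents to a structured term matrix, dimension reduction, then K-means scored with the Silhouette measure) can be illustrated with a small, dependency-free Python sketch. This is not the authors' implementation. The abstract does not name the dimension reduction technique, so PCA by power iteration is an assumption here. The support vector clustering stage is omitted, and the Silhouette measure is used only to pick the number of clusters. The toy documents, the TF-IDF weighting, and all function names are hypothetical.

```python
import math
import random
from collections import Counter

def tfidf(docs):
    """Dense TF-IDF rows (one per document) over a whitespace-token vocabulary."""
    tokens = [d.lower().split() for d in docs]
    vocab = sorted({t for doc in tokens for t in doc})
    df = Counter(t for doc in tokens for t in set(doc))
    rows = []
    for doc in tokens:
        tf = Counter(doc)
        rows.append([tf[t] * (math.log(len(docs) / df[t]) + 1.0) for t in vocab])
    return rows

def sparsity(rows):
    """Fraction of zero cells in the document-term matrix."""
    cells = [v for row in rows for v in row]
    return sum(1 for v in cells if v == 0) / len(cells)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def pca(rows, n_components, n_iter=100, seed=0):
    """Project rows onto leading principal components (power iteration + deflation).
    Assumption: the abstract only says "dimension reduction"; PCA is one choice."""
    rng = random.Random(seed)
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    X = [[r[j] - means[j] for j in range(m)] for r in rows]
    scores = [[] for _ in rows]
    for _ in range(n_components):
        v = [rng.gauss(0, 1) for _ in range(m)]
        for _ in range(n_iter):
            proj = [dot(x, v) for x in X]                                 # X v
            v = [sum(p * x[j] for p, x in zip(proj, X)) for j in range(m)]  # X^T X v
            norm = math.sqrt(dot(v, v)) or 1.0
            v = [c / norm for c in v]
        proj = [dot(x, v) for x in X]
        for s, p in zip(scores, proj):
            s.append(p)
        X = [[a - p * b for a, b in zip(x, v)] for x, p in zip(X, proj)]  # deflate
    return scores

def kmeans(points, k, n_iter=50):
    """Lloyd's algorithm with deterministic farthest-first initialization."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = []
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def silhouette(points, labels):
    """Mean silhouette width; points in singleton clusters score 0."""
    total = 0.0
    for i, p in enumerate(points):
        dists = {}
        for j, q in enumerate(points):
            if i != j:
                dists.setdefault(labels[j], []).append(math.dist(p, q))
        own = dists.pop(labels[i], None)
        if not own or not dists:
            continue
        a = sum(own) / len(own)                            # mean distance within own cluster
        b = min(sum(d) / len(d) for d in dists.values())   # nearest other cluster
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

def choose_k(points, k_values):
    """Return (k, score) with the highest mean silhouette; first best wins ties."""
    scored = [(silhouette(points, kmeans(points, k)), k) for k in k_values]
    best_score, best_k = max(scored, key=lambda s: s[0])
    return best_k, best_score

# Hypothetical toy corpus: two topics with disjoint vocabularies.
docs = [
    "battery electrode lithium cell",
    "lithium battery cell charging",
    "electrode battery charging lithium",
    "neural network image training",
    "image classification neural network",
    "training neural network classification",
]
matrix = tfidf(docs)
reduced = pca(matrix, n_components=2)
best_k, best_score = choose_k(reduced, range(2, 5))
print(f"sparsity={sparsity(matrix):.2f}, best k={best_k}, silhouette={best_score:.2f}")
```

Even on this tiny corpus, most cells of the term matrix are zero, which is the sparseness the abstract refers to. Clustering the low-dimensional projection instead of the raw matrix is the step intended to mitigate it. On real collections a sparse matrix library and a tested K-means implementation would be the more practical choice.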
