Journal: Turkish Journal of Electrical Engineering and Computer Sciences

Key word extraction for short text via word2vec, doc2vec, and textrank


Abstract

Huge amounts of data are produced every day, and evaluating these data becomes increasingly difficult. The data obtained should provide meaningful, correct, and accurate information. Therefore, all data must be separated into clusters correctly, and the right information must be obtained from these clusters. Obtaining the correct clusters depends on the clustering algorithm that is used. There are many clustering algorithms. Among the groups of clustering methods, density-based methods are particularly important because they can find clusters of arbitrary shape. This study proposes an advanced model of the density-based spatial clustering of applications with noise (DBSCAN) algorithm, called fuzzy neighborhood DBSCAN Gaussian means (FN-DBSCAN-GM). The main contribution of FN-DBSCAN-GM is to find the parameters automatically and to divide the data into clusters robustly. The effectiveness of FN-DBSCAN-GM has been demonstrated on overlapping datasets (six artificial and two real-life datasets). Performance on these datasets is compared using the percentage of correct classification and a validity index. Our experiments showed that the new algorithm is a preferable and robust algorithm.
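
For readers unfamiliar with the baseline that the abstract builds on, the following is a minimal sketch of classic DBSCAN in plain Python. It is not FN-DBSCAN-GM: there is no fuzzy neighborhood and no automatic parameter selection, and the values `eps=0.5` and `min_pts=3`, the helper names (`region_query`, `dbscan`, `ring`), and the two-ring toy data are all assumptions chosen only for illustration. The sketch shows why density-based methods can recover non-convex shapes such as concentric rings, and how an isolated point ends up labeled as noise.

```python
import math

NOISE = -1        # label for points that belong to no cluster
UNVISITED = None  # label for points not yet processed


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Classic DBSCAN; eps and min_pts are chosen by hand (hypothetical values)."""
    labels = [UNVISITED] * len(points)
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        # points[i] is a core point: start a new cluster and grow it.
        cluster_id += 1
        labels[i] = cluster_id
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point: keep expanding
    return labels


def ring(radius, n):
    """n evenly spaced points on a circle of the given radius."""
    return [(radius * math.cos(2 * math.pi * k / n),
             radius * math.sin(2 * math.pi * k / n)) for k in range(n)]


# Toy data (assumed): two concentric rings plus one far-away outlier.
points = ring(1.0, 30) + ring(3.0, 60) + [(10.0, 10.0)]
labels = dbscan(points, eps=0.5, min_pts=3)

print("inner ring labels:", set(labels[:30]))
print("outer ring labels:", set(labels[30:90]))
print("outlier label:", labels[90])
```

In this sketch the result depends entirely on the hand-picked `eps` and `min_pts`; removing that manual tuning is the problem the abstract says FN-DBSCAN-GM addresses.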
