Journal: Complexity

Using the Ship-Gram Model for Japanese Keyword Extraction Based on News Reports


Abstract

This paper studies Japanese keyword extraction from news reports. After text preprocessing, the word sets of external computer documents are trained into word vectors using the Skip-gram model of the deep learning tool Word2Vec, and the cosine distance between word vectors is calculated. The sliding window in TextRank is used to connect information within a document and improve in-text semantic coherence. The main idea is multifeature fusion: the method combines the statistical and structural features of words with the semantic features extracted through word-embedding techniques to obtain the importance weights of the words themselves and the attraction weights between words, and then iteratively calculates the final weight of each word with a graph model algorithm to determine the extracted keywords. To verify the performance of the algorithm, extensive simulation experiments were conducted on three different types of datasets. The experimental results show that the proposed keyword extraction algorithm improves performance by up to 6.45% and 20.36% compared with existing word frequency statistics and graph model methods, respectively; MF-Rank achieves a maximum performance improvement of 1.76% compared with PW-TF.
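The pipeline described in the abstract (word vectors, cosine similarity, a sliding co-occurrence window, and iterative graph ranking) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not give the weighting formulas, so the edge weight used here, co-occurrence count times (1 + cosine similarity), is an assumption, as are the function names, the toy token list, and the hand-written 2-D vectors. In practice the vectors might come from a Skip-gram model, for example gensim's `Word2Vec` with `sg=1`.

```python
import math
from collections import defaultdict


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_graph(tokens, vectors, window=3):
    """Link words that co-occur inside a sliding window of `window` tokens.

    Assumed edge weight: co-occurrence count * (1 + cosine similarity),
    with negative similarities clipped to 0.
    """
    counts = defaultdict(float)
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            other = tokens[j]
            if other != word:
                counts[tuple(sorted((word, other)))] += 1.0

    graph = defaultdict(dict)
    for (a, b), count in counts.items():
        sim = max(cosine_similarity(vectors[a], vectors[b]), 0.0)
        weight = count * (1.0 + sim)
        graph[a][b] = weight
        graph[b][a] = weight
    return graph


def rank_words(graph, damping=0.85, iterations=50):
    """Weighted TextRank-style iteration over an undirected word graph."""
    scores = {word: 1.0 for word in graph}
    for _ in range(iterations):
        new_scores = {}
        for word in graph:
            incoming = 0.0
            for neighbor, weight in graph[word].items():
                # Each neighbor shares its score in proportion to edge weight.
                incoming += weight / sum(graph[neighbor].values()) * scores[neighbor]
            new_scores[word] = (1.0 - damping) + damping * incoming
        scores = new_scores
    return scores


# Hypothetical pre-tokenized Japanese headline and toy 2-D word vectors.
tokens = ["地震", "津波", "警報", "地震", "被害", "津波", "警報"]
vectors = {
    "地震": [0.9, 0.1],
    "津波": [0.8, 0.3],
    "警報": [0.2, 0.9],
    "被害": [0.6, 0.5],
}

graph = build_graph(tokens, vectors, window=3)
scores = rank_words(graph)
top_keywords = sorted(scores, key=scores.get, reverse=True)[:3]
print(top_keywords)
```

Mixing similarity into the edge weight lets semantically close words reinforce each other beyond raw co-occurrence, which is one simple way to realize the "attraction weight" idea. Statistical and structural features such as term frequency or position would need additional terms that this sketch leaves out.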
