Conference: International Conference on Big Data, Small Data, Linked Data and Open Data

Research of Topics Discovery and Tech Evolution Based on Text Preprocessed Latent Dirichlet Allocation Model: Research Topic Analysis in GaN Tech Field

Abstract

Computational science and data science are driving today's intelligent analysis and information services, and machine-learning text analysis is changing traditional analysis methods. This article discusses the benefits of unsupervised learning approaches in patent text mining. Patent data from the GaN industry were preprocessed with a filter model based on the NLTK toolkit to identify technical terms, which were then clustered with a Latent Dirichlet Allocation (LDA) model to find latent topics; the topics were then visualized. Using group operations, newly emerging terms ranked by TF-IDF for each year were used to reveal how the research and development focus evolved. The proposed method is demonstrated on 26,854 GaN patents. The results show 20 research and development topics with their technical terms in the GaN industry and present the evolution of the research and development focus based on each year's newly emerging terms, which provides a starting point for more detailed analyses later. Our results show an efficient way to trace the evolution of technology focus in large-scale text data.
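The yearly "new emerging terms" step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses only the Python standard library, a small hard-coded stopword list stands in for the NLTK-based filter, the LDA step is omitted, and the smoothed TF-IDF formula is an assumption because the abstract does not state one. The toy records and the names `PATENTS`, `tokenize`, and `emerging_terms_by_year` are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy records: (filing year, patent text). Not real data.
PATENTS = [
    (2001, "gallium nitride layer grown on sapphire substrate"),
    (2001, "sapphire substrate with buffer layer"),
    (2002, "gallium nitride transistor with gate electrode"),
    (2002, "buffer layer under gate electrode"),
    (2003, "gallium nitride transistor on silicon substrate"),
]

# Assumption: a tiny stopword set replaces the paper's NLTK-based term filter.
STOPWORDS = {"on", "with", "under", "the", "a", "of"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def emerging_terms_by_year(patents, top_n=3):
    """Return, per year, terms never seen in earlier years, ranked by TF-IDF."""
    # Group operation: merge all patents of a year into one term-count bag.
    counts_by_year = defaultdict(Counter)
    for year, text in patents:
        counts_by_year[year].update(tokenize(text))

    # Document frequency, treating each year as one document.
    n_years = len(counts_by_year)
    doc_freq = Counter()
    for counts in counts_by_year.values():
        doc_freq.update(counts.keys())

    seen = set()  # terms observed in any earlier year
    result = {}
    for year in sorted(counts_by_year):
        counts = counts_by_year[year]
        total = sum(counts.values())
        scores = {
            term: (counts[term] / total) * (math.log(n_years / doc_freq[term]) + 1.0)
            for term in counts
            if term not in seen
        }
        # Highest score first; ties broken alphabetically for stable output.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        result[year] = ranked[:top_n]
        seen.update(counts)
    return result


if __name__ == "__main__":
    for year, terms in emerging_terms_by_year(PATENTS).items():
        print(year, terms)
```

Treating each year as a single document keeps the IDF component comparable across years; a per-patent document frequency would be an equally plausible reading of the abstract.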
