Methods and systems for the analysis of large text corpora

Abstract

Computerized methods and systems for the analysis of textual data, including: receiving, from one or more memories at one or more processors, textual data; using the processors, formatting the textual data for analysis and applying a probabilistic topic model to the textual data to extract semantically meaningful topics that collectively describe it; using a keyword weighting module, generating a topic cloud view representing the topics as a tag cloud, with each topic associated with a plurality of keywords; using a topic ordering module, generating a document distribution view representing a distribution of the textual data across multiple topics; using a document entropy calculation module, generating a document scatterplot view representing how many topics are attributable to the textual data; using a temporal topic trend calculation module, generating a temporal view representing changes in the occurrence of topics over time; and displaying one or more of the views to a user.
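The abstract names a document entropy calculation and a temporal topic trend calculation but gives no formulas for either. The sketch below shows one plausible reading, assuming each document already has a vector of topic proportions from the topic model. It assumes Shannon entropy as the entropy measure and a per-period average of topic proportions as the trend. The function names `document_entropy` and `temporal_topic_trend` are hypothetical and do not come from the patent.

```python
import math
from collections import defaultdict


def document_entropy(topic_probs):
    """Shannon entropy, in bits, of one document's topic proportions.

    Assumption: low entropy means the document is dominated by few
    topics; high entropy means it is spread across many.
    """
    total = sum(topic_probs)
    if total <= 0:
        raise ValueError("topic_probs must have positive total mass")
    entropy = 0.0
    for p in topic_probs:
        q = p / total  # normalize in case the weights do not sum to 1
        if q > 0:
            entropy -= q * math.log2(q)
    return entropy


def temporal_topic_trend(docs):
    """Average topic proportions per time period.

    docs: iterable of (period, topic_probs) pairs, where every
    topic_probs list has the same length. Returns a dict mapping each
    period to the mean topic proportions of its documents.
    """
    sums = {}
    counts = defaultdict(int)
    for period, probs in docs:
        if period not in sums:
            sums[period] = [0.0] * len(probs)
        for k, p in enumerate(probs):
            sums[period][k] += p
        counts[period] += 1
    return {
        period: [s / counts[period] for s in sums[period]]
        for period in sorted(sums)
    }
```

Under these assumptions, a document assigned entirely to one topic has entropy 0, and a document split evenly across four topics has entropy 2 bits, so the entropy value can serve as one axis of a scatterplot showing how many topics each document draws on.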
