Published in Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery

Self-organizing maps for latent semantic analysis of free-form text in support of public policy analysis

Abstract

The huge amount of free-form unstructured text in the blogosphere, its increasing rate of production, and its shrinking window of relevance present serious challenges to the public policy analyst who seeks to take public opinion into account. Most of the tools that address this problem use XML tagging and other Web 3.0 approaches, which do not address the actual content of blog posts and the associated commentary. We give a tutorial review of latent semantic analysis and self-organizing maps as they apply in this context, and show how to apply the self-organizing map over a probabilistic latent semantic space to the completely unsupervised clustering of unstructured text, in a way that is entirely independent of spelling, grammar, and even source language. This provides an algorithm suitable for clustering free-form commentary, with a well-structured test environment. For evaluation, the algorithm is applied to academic paper abstracts rather than blog posts, treating them as unstructured text, because this set of documents has a known ground truth. The algorithm constructs a word category map and a document map in which words with similar meaning and documents with similar content are clustered together. (C) 2013 John Wiley & Sons, Ltd.
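A minimal sketch of the document-map step, in plain Python: it trains a small self-organizing map on documents that are assumed to be already represented as topic-mixture vectors P(z|d) from a probabilistic latent semantic model. The grid size, the linearly decaying learning rate and neighborhood radius, the Gaussian neighborhood, the toy vectors, and the helper names train_som and best_matching_unit are all illustrative assumptions, not the authors' implementation.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return (row, col) of the prototype closest to x in Euclidean distance."""
    best, best_d = None, float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = sum((a - b) ** 2 for a, b in zip(w, x))
            if d < best_d:
                best, best_d = (r, c), d
    return best


def train_som(vectors, rows=3, cols=3, epochs=200, lr0=0.5, seed=0):
    """Train a rectangular self-organizing map on equal-length vectors.

    Returns the grid of prototype (weight) vectors, indexed [row][col].
    Assumption: learning rate and neighborhood radius decay linearly,
    an illustrative schedule rather than the one used in the paper.
    """
    rng = random.Random(seed)
    dim = len(vectors[0])
    # Initialize each node's prototype with random values in [0, 1).
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    radius0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)
        radius = max(radius0 * (1.0 - frac), 0.5)
        # Present the inputs in a fresh random order each epoch.
        for x in rng.sample(vectors, len(vectors)):
            br, bc = best_matching_unit(weights, x)
            for r in range(rows):
                for c in range(cols):
                    # Gaussian neighborhood on the 2-D grid around the BMU.
                    d2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-d2 / (2.0 * radius ** 2))
                    w = weights[r][c]
                    # Pull the prototype toward the input vector.
                    for k in range(dim):
                        w[k] += lr * h * (x[k] - w[k])
    return weights


# Hypothetical topic-mixture vectors P(z|d) over three latent topics.
docs = {
    "d1": [0.90, 0.05, 0.05],
    "d2": [0.85, 0.10, 0.05],
    "d3": [0.05, 0.90, 0.05],
    "d4": [0.10, 0.85, 0.05],
    "d5": [0.05, 0.05, 0.90],
}
names = list(docs)
som = train_som([docs[n] for n in names])
# Document map: each document is placed on the node of its best matching unit.
doc_map = {n: best_matching_unit(som, docs[n]) for n in names}
```

Each document is assigned to the grid node whose prototype is closest to its topic vector, so documents with similar topic mixtures tend to land on the same or neighboring nodes. Under the same assumptions, training the routine on word-level topic vectors instead would give a word category map.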
