Published in: International Conference on Artificial Neural Networks
Self-Organization of Very Large Document Collections: State of the Art

Abstract

The Self-Organizing Map (SOM) forms a nonlinear projection from a high-dimensional data manifold onto a low-dimensional grid. A representative model of some subset of data is associated with each grid point. The SOM algorithm computes an optimal collection of models that approximates the data in the sense of some error criterion and also takes into account the similarity relations of the models. The models then become ordered on the grid according to their similarity. When the SOM is used for the exploration of statistical data, the data vectors can be approximated by models of the same dimensionality. When mapping documents, one can represent them statistically by their word frequency histograms or some reduced representations of the histograms that can be regarded as data vectors. We have made SOMs of collections of over one million documents. Each document is mapped onto some grid point, with a link from this point to the document database. The documents are ordered on the grid according to their contents and neighboring documents can be browsed readily. Keywords or key texts can be used to search for the most relevant documents first. New effective coding and computing schemes of the mapping are described.
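The procedure summarized above (word-frequency histograms as data vectors, one model per grid point, documents mapped to their closest model) can be illustrated with a small sketch. This is only a minimal online SOM under stated assumptions, not the coding and computing schemes the paper itself describes: the function names `word_histogram`, `best_matching_unit`, and `train_som`, the toy vocabulary and documents, the Gaussian neighborhood, and the linear decay schedules are all hypothetical choices made for illustration.

```python
import numpy as np


def word_histogram(text, vocabulary):
    """Return a normalized word-frequency histogram over a fixed vocabulary."""
    words = text.lower().split()
    counts = np.array([words.count(term) for term in vocabulary], dtype=float)
    total = counts.sum()
    return counts / total if total > 0 else counts


def best_matching_unit(models, x):
    """Return the (row, col) grid point whose model is closest to x."""
    dist2 = ((models - x) ** 2).sum(axis=-1)
    return np.unravel_index(np.argmin(dist2), dist2.shape)


def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols SOM with a Gaussian neighborhood (illustrative only)."""
    rng = np.random.default_rng(seed)
    models = rng.random((rows, cols, data.shape[1]))
    # Grid coordinates of every node, shape (rows, cols, 2).
    grid = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    ).astype(float)
    sigma0 = max(rows, cols) / 2.0
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # assumed linear decay
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # assumed shrinking radius
            bmu = np.array(best_matching_unit(models, x))
            grid_dist2 = ((grid - bmu) ** 2).sum(axis=-1)
            h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
            # Pull the winner and its grid neighbors toward the data vector.
            models += lr * h[..., None] * (x - models)
            step += 1
    return models


# Toy collection; vocabulary and texts are made up for this example.
vocabulary = ["neural", "network", "map", "market", "stock", "price"]
documents = [
    "neural network map neural map",
    "network map neural network",
    "stock market price stock",
    "market price stock market price",
]
data = np.array([word_histogram(d, vocabulary) for d in documents])
models = train_som(data, rows=3, cols=3)

# Each document is linked to the grid point of its best-matching model.
locations = [best_matching_unit(models, x) for x in data]
for doc, loc in zip(documents, locations):
    print(tuple(int(i) for i in loc), doc)
```

Because neighboring grid points are updated together, their models tend to end up similar, which is what makes browsing adjacent grid points meaningful. A collection of a million documents would need the reduced histogram representations and faster computing schemes the abstract refers to; this sketch does not attempt them.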
