Journal: Decision Support Systems

Unsupervised clustering for nontextual web document classification

Abstract

While the breadth of vocabulary used in long documents may mislead traditional keyword-based retrieval systems, demand is mounting for techniques that classify and retrieve nontextual Web content from large document collections. Only a few prototype systems have attempted to classify hypertext on the basis of nontextual elements in order to locate unfamiliar documents. As a result, a large portion of Web documents that are pictorial in nature is beyond the reach of most current search engines. In this research, we devise a novel quantitative model of nontextual World Wide Web (WWW) classification based on image information. An intelligent, content-sensitive, attribute-rich image classifier is presented. An image similarity measure is used to deduce the likeness between images. Different image feature vectors have been constructed and evaluated. Evaluation shows that images judged to be similar by humans form interesting clusters in our unsupervised learning. Comparison with other clustering techniques, such as Hierarchical Agglomerative Clustering (HAC), demonstrates that our approach is useful in content-based image information retrieval.
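
The abstract does not specify which feature vectors or similarity measure the authors use, so the following is only a minimal sketch of the general idea under stated assumptions: each image is reduced to a coarse color histogram (a hypothetical feature choice), likeness is scored with cosine similarity (also an assumption), and the images are grouped with average-linkage HAC, the baseline technique the abstract names for comparison. The function names, the `bins` parameter, and the `threshold` value are all illustrative and are not taken from the paper.

```python
from itertools import combinations
from math import sqrt


def color_histogram(pixels, bins=4):
    """Hypothetical feature vector: a normalized joint RGB histogram.

    `pixels` is a list of (r, g, b) tuples with channel values in 0..255.
    """
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        # Map each channel value to a bin index in 0..bins-1.
        ri, gi, bi = r * bins // 256, g * bins // 256, b * bins // 256
        hist[ri * bins * bins + gi * bins + bi] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def cosine_similarity(u, v):
    """Assumed similarity measure; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def hac(vectors, threshold):
    """Average-linkage HAC over feature vectors.

    Repeatedly merges the two most similar clusters and stops when the
    best average pairwise similarity falls below `threshold`.
    Returns a list of clusters, each a list of indices into `vectors`.
    """
    clusters = [[i] for i in range(len(vectors))]

    def linkage(a, b):
        sims = [cosine_similarity(vectors[i], vectors[j]) for i in a for j in b]
        return sum(sims) / len(sims)

    while len(clusters) > 1:
        (i, j), best = max(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Toy "images": two mostly red, two blue (made-up pixel data).
images = [
    [(250, 10, 10)] * 4,
    [(240, 20, 5)] * 3 + [(10, 10, 250)],
    [(10, 10, 250)] * 4,
    [(5, 20, 240)] * 4,
]
features = [color_histogram(img) for img in images]
clusters = hac(features, threshold=0.5)
print(clusters)
```

On this toy data the two reddish images end up in one cluster and the two bluish images in another. A real system would substitute whatever feature vectors and similarity measure the full paper defines.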