

Privileged information for hierarchical document clustering: a metric learning approach



Abstract

Traditional hierarchical text clustering methods assume that documents are represented only by “technical information”, i.e., keywords, phrases, expressions and named entities that can be extracted directly from the texts. However, in many scenarios there is additional, valuable information about the documents that is usually disregarded during the clustering task, such as user-validated tags, annotations and comments from experts, dictionaries and domain ontologies. Recently, Vapnik introduced a new learning paradigm called LUPI (Learning Using Privileged Information), which allows this additional (privileged) information to be incorporated in a supervised learning setting. We investigated the incorporation of privileged information in an unsupervised setting. The key idea of our proposed approach is to extract important relationships among documents represented in the privileged information space and use them to learn a more accurate metric for text clustering in the technical information space. A thorough experimental evaluation indicates that incorporating privileged information through metric learning significantly improves hierarchical clustering accuracy.
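The key idea in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not say how relationships are extracted or which metric learner is used. The sketch assumes that each document's nearest and farthest neighbours in the privileged space act as must-link and cannot-link pairs, that the learned metric is a simple per-feature weighting, and that clustering is naive average linkage. All function names and the toy data are hypothetical.

```python
import numpy as np

def pairwise_sq_diffs(X):
    # (n, n, d) array of squared per-feature differences between rows of X.
    return (X[:, None, :] - X[None, :, :]) ** 2

def constraints_from_privileged(P):
    # Assumption: nearest neighbour in privileged space = must-link,
    # farthest neighbour = cannot-link.
    D = pairwise_sq_diffs(P).sum(axis=2)
    np.fill_diagonal(D, np.inf)
    nearest = D.argmin(axis=1)
    np.fill_diagonal(D, -np.inf)
    farthest = D.argmax(axis=1)
    n = len(P)
    must = [(i, int(nearest[i])) for i in range(n)]
    cannot = [(i, int(farthest[i])) for i in range(n)]
    return must, cannot

def learn_diagonal_metric(T, must, cannot, eps=1e-9):
    # Weight a technical feature highly when cannot-link pairs differ on it
    # more than must-link pairs do. Weights are normalised to sum to 1.
    sq = pairwise_sq_diffs(T)
    m = np.mean([sq[i, j] for i, j in must], axis=0)
    c = np.mean([sq[i, j] for i, j in cannot], axis=0)
    w = (c + eps) / (m + eps)
    return w / w.sum()

def weighted_distances(T, w):
    # Pairwise distances in technical space under the diagonal metric w.
    return np.sqrt((pairwise_sq_diffs(T) * w).sum(axis=2))

def average_linkage(D):
    # Naive agglomerative clustering on a distance matrix.
    # Returns the merge history as (members_a, members_b, distance).
    clusters = {i: [i] for i in range(len(D))}
    merges = []
    while len(clusters) > 1:
        keys = list(clusters)
        best = None
        for pos, a in enumerate(keys):
            for b in keys[pos + 1:]:
                d = D[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), float(d)))
        clusters[a] = clusters[a] + clusters.pop(b)
    return merges

# Toy data: technical feature 0 is informative, feature 1 is noise.
T = np.array([[0.0, 0.0], [0.1, 5.0], [1.0, 0.2], [1.1, 5.1]])
# Privileged view (e.g. expert tags) separating documents {0, 1} from {2, 3}.
P = np.array([[0.0], [0.1], [1.0], [1.1]])

must, cannot = constraints_from_privileged(P)
w = learn_diagonal_metric(T, must, cannot)

merges_plain = average_linkage(weighted_distances(T, np.ones(T.shape[1])))
merges_learned = average_linkage(weighted_distances(T, w))

print("top-level split, plain Euclidean:", merges_plain[-1][:2])
print("top-level split, learned metric: ", merges_learned[-1][:2])
```

On this toy data the plain Euclidean hierarchy is dominated by the noisy feature, while the learned weighting recovers the grouping suggested by the privileged view. Note that the privileged data is used only to learn the weights; the clustering itself runs on the technical features alone.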


Bibliographic details

Authors: Marcacini Ricardo M.; Domingues Marcos Aurelio; Hruschka Eduardo Raul; Rezende Solange Oliveira
Year: 2014
Original format: PDF
Language: English

Similar literature


1. Semi-Supervised Nonlinear Distance Metric Learning via Forests of Max-Margin Cluster Hierarchies [J]. David M. Johnson, Caiming Xiong, Jason J. Corso. IEEE Transactions on Knowledge and Data Engineering. 2016, No. 4
2. A Novel Parallel Algorithm for Clustering Documents Based on the Hierarchical Agglomerative Approach [J]. Amal Elsayed Aboutabl, Mohamed Nour Elsayed. International Journal of Computer Science & Information Technology (IJCSIT). 2011, No. 2
3. A Clustering-Based Approach for Integrating Document-Category Hierarchies [J]. Tsang-Hsiang Cheng, Chih-Ping Wei. IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans. 2008, No. 2
4. Privileged Information for Hierarchical Document Clustering: A Metric Learning Approach [C]. Marcacini Ricardo Marcondes, Domingues Marcos Aurelio, Hruschka Eduardo R. International Conference on Pattern Recognition. 2014
5. Text document topical recursive clustering and automatic labeling of a hierarchy of document clusters [D]. Li, Xiaoxiao. 2012
6. A deep learning and similarity-based hierarchical clustering approach for pathological stage prediction of papillary renal cell carcinoma [O]. Sugi Lee, Jaeeun Jung, Ilkyu Park. 2020
7. Privileged information for hierarchical document clustering: a metric learning approach [O]. Marcacini Ricardo M., Domingues Marcos Aurelio, Hruschka Eduardo Raul. 2014
