Unsupervised meta-path selection for text similarity measure based on heterogeneous information networks

Chenguang Wang; Yangqiu Song; Haoran Li; Ming Zhang; Jiawei Han

Abstract

A heterogeneous information network (HIN) is a general representation used in many different applications, such as social networks, scholarly networks, and knowledge networks. A key development for HINs is PathSim, a meta-path-based measure of the pairwise similarity of two entities of the same type in an HIN. When using PathSim in practice, we usually need to handcraft some meta-paths, which are paths over entity types rather than over the entities themselves. However, finding useful meta-paths is not trivial for humans. In this paper, we present an unsupervised meta-path selection approach that automatically finds useful meta-paths over an HIN, and we then develop a new similarity measure called KnowSim, which is an ensemble of the selected meta-paths. To address the high computational cost of enumerating all possible meta-paths, we propose using an approximate personalized PageRank algorithm to find useful subgraphs to allocate the meta-paths. We apply KnowSim to text clustering and classification problems to demonstrate that unsupervised meta-path selection can help improve clustering and classification results. We use Freebase, a well-known world knowledge base, to conduct semantic parsing and construct an HIN for documents. Our experiments on the 20Newsgroups and RCV1 datasets show that KnowSim yields high-quality document clustering and classification performance. We also demonstrate that the approximate personalized PageRank algorithm can compute the meta-path-based similarity efficiently and effectively.
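To make the meta-path idea in the abstract concrete, here is a minimal Python sketch of PathSim for one symmetric meta-path, Document-Entity-Document. It uses the standard PathSim definition, s(x, y) = 2 M[x][y] / (M[x][x] + M[y][y]), where M counts the path instances between two objects along the meta-path. The toy matrix, the function names, and the `weighted_pathsim` helper are assumptions made for illustration only. In particular, `weighted_pathsim` shows just one plausible way to combine several meta-paths and should not be read as the paper's exact KnowSim definition.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def transpose(a: Matrix) -> Matrix:
    """Return the transpose of a dense matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two dense matrices stored as lists of rows."""
    b_cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


def commuting_matrix(doc_entity: Matrix) -> Matrix:
    """Count path instances along Document-Entity-Document (A times A^T)."""
    return matmul(doc_entity, transpose(doc_entity))


def pathsim(m: Matrix, i: int, j: int) -> float:
    """PathSim between objects i and j, given the commuting matrix m."""
    denom = m[i][i] + m[j][j]
    return 2.0 * m[i][j] / denom if denom else 0.0


def weighted_pathsim(ms: Sequence[Matrix], weights: Sequence[float],
                     i: int, j: int) -> float:
    """Hypothetical weighted ensemble over several meta-paths (an assumption,
    not the paper's formula): weight each meta-path's counts, then normalize
    in the same way PathSim does."""
    num = 2.0 * sum(w * m[i][j] for w, m in zip(weights, ms))
    denom = sum(w * (m[i][i] + m[j][j]) for w, m in zip(weights, ms))
    return num / denom if denom else 0.0


# Toy data (made up): rows are documents, columns are entities,
# values are how often the entity is mentioned in the document.
doc_entity = [
    [2, 1, 0],
    [1, 1, 0],
    [0, 0, 3],
]
m = commuting_matrix(doc_entity)

print(pathsim(m, 0, 1))  # documents 0 and 1 share entities
print(pathsim(m, 0, 2))  # documents 0 and 2 share none
```

In this toy example, documents 0 and 1 get a high score because they mention the same entities, while documents 0 and 2 score zero. Normalizing by the self-path counts keeps a document with many entity mentions from dominating the similarity.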

Bibliographic details

Source
Data Mining and Knowledge Discovery | 2018, Issue 6 | 33 pages
Authors
Chenguang Wang; Yangqiu Song; Haoran Li; Ming Zhang; Jiawei Han
Affiliations

Amazon AI;

Department of CSE, HKUST;

School of EECS, Peking University;

School of EECS, Peking University;

Department of CS, UIUC

Indexing information
Original format: PDF
Language: English
Chinese Library Classification: Computing technology, computer technology
Keywords
Heterogeneous information network; Similarity; Text categorization
