The bibliographic information network is a typical heterogeneous information network, and similarity search over it is an active topic in graph mining. However, current methods mainly rely on meta-paths or meta-structures to find similar objects and do not consider the semantic features of the nodes themselves, which biases the search results. To fill this gap, a vector-based semantic feature extraction method is proposed for bibliographic information networks, and a vector-based node similarity measure called VSim is designed and implemented. In addition, VPSim (Similarity computation based on Vector and meta-Path), a similarity search algorithm based on semantic features, is designed by incorporating meta-paths. To improve the execution efficiency of the algorithm, a pruning strategy tailored to the characteristics of bibliographic network data is designed. Experiments on real-world data sets demonstrate that VSim is suitable for searching for entities with similar semantic features, and that VPSim is effective, efficient, and scalable.
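The abstract does not give the formulas behind VSim or VPSim, so the following is only a minimal sketch of the general idea of mixing a vector-based score with a meta-path-based score. Everything in it is an assumption made for illustration: cosine similarity stands in for the vector-based measure, the well-known PathSim measure stands in for the meta-path component, the weight `alpha` and the linear combination are hypothetical, and the candidate filter is only a placeholder for the pruning strategy, which is not described here.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (assumed stand-in for a vector-based score)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def pathsim(path_counts, x, y):
    """PathSim for one symmetric meta-path.

    path_counts[a][b] is the number of meta-path instances from a to b.
    """
    xy = path_counts.get(x, {}).get(y, 0)
    xx = path_counts.get(x, {}).get(x, 0)
    yy = path_counts.get(y, {}).get(y, 0)
    denom = xx + yy
    return 2 * xy / denom if denom else 0.0


def combined_similarity(vectors, path_counts, x, y, alpha=0.5):
    """Hypothetical linear mix of the two scores; alpha is an illustrative parameter."""
    semantic = cosine_similarity(vectors[x], vectors[y])
    structural = pathsim(path_counts, x, y)
    return alpha * semantic + (1 - alpha) * structural


def top_k(vectors, path_counts, query, k=3, alpha=0.5):
    """Rank candidates for a query node.

    Placeholder pruning (an assumption, not the paper's strategy): only nodes
    connected to the query by at least one meta-path instance are scored.
    """
    candidates = [
        node
        for node, count in path_counts.get(query, {}).items()
        if node != query and count > 0
    ]
    scored = [
        (node, combined_similarity(vectors, path_counts, query, node, alpha))
        for node in candidates
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

In this sketch, `vectors` maps each node to its feature vector and `path_counts` holds the meta-path instance counts between nodes (for example, author-paper-author counts in a co-authorship setting). Setting `alpha` to 1 gives a purely vector-based ranking and 0 a purely meta-path-based one.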