Source: 情报杂志 (Journal of Intelligence)

Effect of Data Size on Co-authorship Prediction (数据规模对合著关系预测的影响研究)

     

Abstract

[Purpose/Significance] To find the optimal dataset size for co-authorship prediction and to compare co-authorship prediction indicators fairly, this study compares and analyzes how overall accuracy and the best-performing indicator change across co-authorship datasets of different sizes.

[Method/Process] Twelve representative indicators are selected: the Common Neighbors (CN) indicator and its improved variants. Link prediction methods are used to calculate the accuracy of each indicator on co-authorship networks of different sizes and to identify the best-performing indicator at each size, revealing how and why data size influences co-authorship prediction.

[Result/Conclusion] In the field of Library and Information Science, co-authorship network datasets of different sizes are formed according to author occurrence frequency. The results show that the larger the dataset, the higher the overall accuracy of co-authorship prediction. Accuracy is highest on the full dataset, with a large improvement over the other datasets, which indicates that the complete co-authorship network without any filtering is the best dataset for co-authorship prediction. The optimal indicator also changes with dataset size, which confirms that indicators are biased toward particular data sizes, so a fair comparison of indicators needs to be run on several datasets of different sizes. The reason is that as the data size grows, the co-authorship network comes closer to the real situation, and the advantages of the improved indicators are fully realized. The method can be extended to other fields to validate these conclusions.
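The procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy paper list, the function names, the `min_papers` threshold (standing in for "author occurrence frequency"), and the use of AUC as the accuracy measure are all assumptions made for illustration. Only the Common Neighbors (CN) indicator is shown; the abstract does not name the other eleven indicators.

```python
from collections import Counter
from itertools import combinations


def build_coauthor_graph(papers, min_papers=1):
    """Adjacency sets for authors who appear in at least `min_papers` papers."""
    freq = Counter(a for authors in papers for a in set(authors))
    keep = {a for a, n in freq.items() if n >= min_papers}
    graph = {a: set() for a in keep}
    for authors in papers:
        for u, v in combinations(sorted(set(authors) & keep), 2):
            graph[u].add(v)
            graph[v].add(u)
    return graph


def without_edges(graph, edges):
    """Copy of the graph with the held-out (test) edges removed."""
    train = {u: set(nbrs) for u, nbrs in graph.items()}
    for u, v in edges:
        train[u].discard(v)
        train[v].discard(u)
    return train


def common_neighbors(graph, u, v):
    """CN score: number of co-authors shared by u and v."""
    return len(graph[u] & graph[v])


def auc(train, held_out, score=common_neighbors):
    """Chance that a held-out edge outscores a non-existent edge (ties count 0.5)."""
    held = {frozenset(e) for e in held_out}
    non_edges = [(u, v) for u, v in combinations(sorted(train), 2)
                 if v not in train[u] and frozenset((u, v)) not in held]
    wins, total = 0.0, 0
    for a, b in held_out:
        s_pos = score(train, a, b)
        for u, v in non_edges:
            s_neg = score(train, u, v)
            wins += 1.0 if s_pos > s_neg else 0.5 if s_pos == s_neg else 0.0
            total += 1
    if total == 0:
        raise ValueError("no edge pairs to compare")
    return wins / total


# Hypothetical toy data: each inner list is the author list of one paper.
papers = [["A", "B"], ["A", "C"], ["B", "C"], ["B", "D"],
          ["C", "D"], ["D", "E"], ["A", "B", "C"]]
test_edges = [("A", "B")]  # edge hidden from the training network

for threshold in (1, 2):  # 1 = full dataset, 2 = drop authors with one paper
    full = build_coauthor_graph(papers, min_papers=threshold)
    train = without_edges(full, test_edges)
    print(threshold, len(full), auc(train, test_edges))
```

A real comparison would repeat this over many random train/test splits and over each indicator, then report the best indicator per threshold. The toy network is far too small to support any conclusion of its own.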
