Entity similarity is useful in many areas, such as recommending similar products on e-commerce platforms and grouping similar patients for treatment-outcome analysis in healthcare. In the task of computing entity similarity in a knowledge graph, the attribute values of every entity are given and a subset of entity pairs is annotated with similarity scores; the goal is to obtain the similarity between the remaining entities. This paper treats the task as a supervised learning problem and proposes a general method for computing entity similarity: noisy data are cleaned, attributes of different data types (numeric values, lists, and text) are preprocessed, and models are built with classifiers such as SVM and logistic regression, ensemble methods such as Random Forest, and learning-to-rank models. The approach achieves good results.
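The preprocessing step described above can be illustrated with a minimal sketch that turns a pair of entities into a feature vector, one feature per attribute. This is not the paper's implementation: the function names, the `schema` mapping, the specific per-type similarity measures (relative difference, Jaccard overlap, token overlap), and the sample entities are all assumptions chosen for illustration. The resulting vectors, paired with the annotated similarity scores, are the kind of input a classifier or ranking model could be trained on.

```python
from typing import Any, Dict, Iterable, List


def numeric_similarity(a: float, b: float) -> float:
    """Relative closeness of two numbers, clamped to [0, 1]; 1.0 means equal."""
    denom = max(abs(a), abs(b))
    if denom == 0:
        return 1.0
    return max(0.0, 1.0 - abs(a - b) / denom)


def jaccard(a: Iterable[Any], b: Iterable[Any]) -> float:
    """Jaccard overlap of two collections, treated as sets."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def text_similarity(a: str, b: str) -> float:
    """Token-level Jaccard overlap of two lowercased, whitespace-split strings."""
    return jaccard(a.lower().split(), b.lower().split())


def pair_features(e1: Dict[str, Any], e2: Dict[str, Any],
                  schema: Dict[str, str]) -> List[float]:
    """Build one similarity feature per attribute listed in `schema`.

    `schema` maps attribute name -> "numeric" | "list" | "text"
    (a hypothetical convention for this sketch).
    """
    feats: List[float] = []
    for name, kind in schema.items():
        v1, v2 = e1.get(name), e2.get(name)
        if v1 is None or v2 is None:
            # Assumption: a missing attribute contributes no evidence.
            feats.append(0.0)
        elif kind == "numeric":
            feats.append(numeric_similarity(float(v1), float(v2)))
        elif kind == "list":
            feats.append(jaccard(v1, v2))
        elif kind == "text":
            feats.append(text_similarity(str(v1), str(v2)))
        else:
            raise ValueError(f"unknown attribute type: {kind}")
    return feats


# Hypothetical entities, for illustration only.
schema = {"price": "numeric", "tags": "list", "title": "text"}
a = {"price": 100.0, "tags": ["phone", "android"], "title": "Smart phone 64GB"}
b = {"price": 80.0, "tags": ["phone", "ios"], "title": "Smart phone 128GB"}
features = pair_features(a, b, schema)
```

Keeping one feature per attribute leaves the weighting of attributes to the downstream model, which is one plausible reason to frame the task as supervised learning rather than fixing a hand-tuned similarity formula.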