Focusing on multi-document personal name disambiguation, this paper develops an agglo-merative clustering approach to resolving this problem. We start from an analysis of point-wise mutual information between feature and the ambiguous name, which brings about a novel weight computing method for feature in clustering. Then a trade-off measure between within-cluster compactness and among-cluster separation is proposed for stopping clustering. After that, we apply a labeling method to find representative feature for each cluster. Finally, experiments are conducted on word-based clustering in Chinese dataset and the result shows a good effect.
展开▼