Conference: Science and Information Conference

Correlated community estimation models over a set of names

Abstract

Surnames (family names) and forenames evolve over generations and can be used to understand population origins, migration, identity, social norms and cultural customs. These forenames and surnames may have hidden structures associated with them, called communities. Within each community there may be strong correlation among several forenames and surnames. In addition, the correlation may extend across communities of forenames or surnames. Popular statistical generative models such as Latent Dirichlet Allocation (LDA) were developed to find topics in a corpus of documents. However, the LDA model can also be used to identify hidden communities in a names data set. This paper proposes several variants of the LDA model to capture correlation between surnames and forenames, within communities and across communities, over a set of names collected at different locations. First, we propose a surname correlated LDA model and a forename correlated LDA model. These models identify communities in surnames or forenames and extract the corresponding correlated forenames or surnames in each community, respectively. Next, we propose a surname community correlated LDA model and a forename community correlated LDA model. These models estimate the correlation between each surname community and the forename communities, and vice versa. We run experiments on names data sets from India and the United Kingdom and draw conclusions.
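To make the starting point concrete, the following is a minimal sketch of plain LDA fitted by collapsed Gibbs sampling, applied to names rather than documents. It is not one of the correlated variants proposed in the paper. It assumes that each location plays the role of a document and each recorded surname the role of a word token; the function name `lda_gibbs`, the hyperparameter values, and the toy data are all illustrative assumptions.

```python
import random


def lda_gibbs(docs, n_communities, n_iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for plain LDA (illustrative sketch).

    docs: list of lists of name tokens, one list per location (an assumption).
    Returns (theta, phi, vocab): per-location community mixtures,
    per-community name distributions, and the sorted vocabulary.
    """
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    index = {w: i for i, w in enumerate(vocab)}
    K, V = n_communities, len(vocab)

    doc_comm = [[0] * K for _ in docs]       # tokens per (location, community)
    comm_word = [[0] * V for _ in range(K)]  # tokens per (community, name)
    comm_total = [0] * K                     # tokens per community
    assign = []                              # community of every token

    # Give each token a random initial community.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            doc_comm[d][k] += 1
            comm_word[k][index[w]] += 1
            comm_total[k] += 1
        assign.append(z_d)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = index[w], assign[d][i]
                # Remove the token's current assignment from the counts.
                doc_comm[d][k] -= 1
                comm_word[k][v] -= 1
                comm_total[k] -= 1
                # Unnormalised conditional over communities for this token.
                weights = [
                    (doc_comm[d][j] + alpha)
                    * (comm_word[j][v] + beta) / (comm_total[j] + V * beta)
                    for j in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                assign[d][i] = k
                doc_comm[d][k] += 1
                comm_word[k][v] += 1
                comm_total[k] += 1

    # Smoothed point estimates from the final counts.
    theta = [[(c + alpha) / (len(doc) + K * alpha) for c in row]
             for row, doc in zip(doc_comm, docs)]
    phi = [[(c + beta) / (comm_total[k] + V * beta) for c in comm_word[k]]
           for k in range(K)]
    return theta, phi, vocab


# Made-up toy data: surnames observed at four hypothetical locations.
locations = [
    ["Patel", "Shah", "Patel", "Mehta"],
    ["Shah", "Mehta", "Patel", "Shah"],
    ["Smith", "Jones", "Smith", "Taylor"],
    ["Jones", "Taylor", "Smith", "Jones"],
]
theta, phi, vocab = lda_gibbs(locations, n_communities=2)
for k, dist in enumerate(phi):
    top = sorted(zip(dist, vocab), reverse=True)[:3]
    print(k, [name for _, name in top])
```

Each row of `theta` is a location's mixture over communities and each row of `phi` is a community's distribution over surnames. The same sketch could be run on forenames; modelling the correlation between the two, as the paper proposes, would need additional structure that is not shown here.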
