Venue: ACM Conference on Information and Knowledge Management

Exploiting Sequential Relationships for Familial Classification


Abstract

The pervasive nature of the internet has significantly transformed the field of genealogical research. It has changed not only how research is conducted but has also dramatically increased the number of people discovering their family history. Recent market research (Maritz Marketing 2000, Harris Interactive 2009) indicates that general interest in family history in the United States increased from 45% in 1996 to 60% in 2000 and 87% in 2009. This increased popularity has created a pressing need for better algorithms for extracting, accessing, and processing genealogical data for use in building family trees. This paper presents one approach to algorithmic improvement in the family history domain, in which we infer the familial relationships within households found in human-transcribed United States census data. By applying advances made in natural language processing, exploiting the sequential nature of the census, and using state-of-the-art machine learning algorithms, we were able to decrease the error by 35% over a hand-coded baseline system. The resulting system is immediately applicable to hundreds of millions of other genealogical records in which families are represented but the familial relationships are missing.
