Conference: IEEE International Congress on Big Data

Mining local gazetteers of literary Chinese with CRF and pattern based methods for biographical information in Chinese history


Abstract

Person names and location names are essential building blocks for identifying events and social networks in historical documents written in literary Chinese. We take the lead in exploring the algorithmic recognition of named entities in literary Chinese for historical studies, using language-model-based and conditional-random-field-based methods, and we extend our work to mining the document structures of historical documents. Practical evaluations were conducted with texts extracted from more than 220 volumes of local gazetteers (Difangzhi, 地方志). Difangzhi is a huge collection, and the single most important one, containing information about officers who served in local governments in Chinese history. Our methods performed very well on these realistic tests: thousands of names and addresses were identified from the texts. A good portion of the extracted names match the biographical information currently recorded in the China Biographical Database (CBDB) of Harvard University, and many others can be verified by historians and will become new additions to CBDB.
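The abstract does not spell out the patterns themselves. As a rough illustration of what a pattern-based extractor might look like, the sketch below matches one common biographical template in literary Chinese, "name, 字 courtesy name, place 人". The template, the name `BIO_PATTERN`, the helper `extract_bios`, and the sample sentence are all assumptions made for this example, not material taken from the paper, and the CRF side of the work is not shown.

```python
import re

# Hypothetical template: "<name>,字<courtesy name>,<native place>人".
# This pattern is an illustrative assumption, not one reported by the authors.
CJK = r"[\u4e00-\u9fff]"  # CJK Unified Ideographs (basic block only)
BIO_PATTERN = re.compile(
    rf"(?P<name>{CJK}{{2,4}})[,,]"        # personal name, 2 to 4 characters
    rf"字(?P<courtesy>{CJK}{{1,3}})[,,]"  # courtesy name introduced by 字
    rf"(?P<place>{CJK}{{2,6}})人"          # native place followed by 人
)


def extract_bios(text):
    """Return one dict (name, courtesy, place) per match of the template."""
    return [m.groupdict() for m in BIO_PATTERN.finditer(text)]


# Sample text constructed for illustration; it is not quoted from a gazetteer.
sample = "王守仁,字伯安,餘姚人。李夢陽,字獻吉,慶陽人。"

for record in extract_bios(sample):
    print(record["name"], record["courtesy"], record["place"])
```

A fixed template like this only catches entries that follow the formula exactly, which is one reason to pair patterns with a statistical sequence labeller such as a CRF for less regular text.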
