Journal: Library Hi Tech

System design for location name recognition in ancient local chronicles


Abstract

Purpose: The purpose of this paper is to present a system for recognition of location names in ancient books written in languages, such as Chinese, in which proper names are not signaled by an initial capital letter.

Design/methodology/approach: Rule-based and statistical methods were combined to develop a set of rules for identification of product-related location names in the local chronicles of Guangdong. A name recognition system, with functions of document management, information extraction and storage, rule management, location name recognition, and inquiry and statistics, was developed using Microsoft's .NET framework, SQL Server 2005, ADO.NET and XML. The system was evaluated with precision ratio, recall ratio and the comprehensive index, F.

Findings: The system was quite successful at recognizing product-related location names (F was 71.8 percent), demonstrating the potential for application of automatic named entity recognition techniques in digital collation of ancient books such as local chronicles.

Research limitations/implications: Results suffered from limitations in initial digitization of the text. Statistical methods, such as the hidden Markov model, should be combined with an extended set of recognition rules to improve recognition scores and system efficiency.

Practical implications: Electronic access to local chronicles by location name saves time for chorographers and provides researchers with new opportunities.

Social implications: Named entity recognition brings previously isolated ancient documents together in a knowledge base of scholarly and cultural value.

Originality/value: Automatic name recognition can be implemented in information extraction from ancient books in languages other than English. The system described here can also be adapted to modern texts and other named entities.
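
The abstract reports precision ratio, recall ratio and a comprehensive index F, but does not give the formulas. The following is a minimal sketch of how such scores are commonly computed, assuming F is the balanced F-measure (the harmonic mean of precision and recall). The function name `precision_recall_f` and the toy name sets are hypothetical and are not taken from the paper or its data.

```python
def precision_recall_f(predicted, gold):
    """Return (precision, recall, F) for two collections of location names.

    Assumption: F is the balanced F-measure, 2PR / (P + R).
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)  # names recognized correctly

    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0

    if precision + recall == 0:
        return precision, recall, 0.0
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f


# Hypothetical toy data: names annotated by hand vs. names a recognizer returned.
gold = {"广州", "南海", "番禺", "新会"}
predicted = {"广州", "南海", "东莞"}

p, r, f = precision_recall_f(predicted, gold)
print(f"precision={p:.3f} recall={r:.3f} F={f:.3f}")
```

In this toy case two of the three predicted names are correct and two of the four annotated names are found, so precision is 2/3, recall is 1/2, and F is 4/7.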
