Excavating grey literature: A case study on the rich indexing of archaeological documents via natural language-processing techniques and knowledge-based resources

Abstract

Purpose - This paper discusses the use of information extraction (IE), a natural language-processing (NLP) technique, to assist "rich" semantic indexing of diverse archaeological text resources. The focus of the research is to direct a semantic-aware "rich" indexing of diverse natural language resources with properties capable of satisfying information retrieval from online publications and datasets associated with the Semantic Technologies for Archaeological Resources (STAR) project.

Design/methodology/approach - The paper proposes use of the English Heritage extension (CRM-EH) of the standard core ontology in cultural heritage, CIDOC CRM, and exploitation of domain thesauri resources for driving and enhancing an Ontology-Oriented Information Extraction process. The process of semantic indexing is based on a rule-based Information Extraction technique, which is facilitated by the General Architecture for Text Engineering (GATE) toolkit and expressed by Java Annotation Pattern Engine (JAPE) rules.

Findings - Initial results suggest that the combination of information extraction with knowledge resources and standard conceptual models is capable of supporting semantic-aware term indexing. Additional efforts are required for further exploitation of the technique and adoption of formal evaluation methods for assessing the performance of the method in measurable terms.

Originality/value - The value of the paper lies in the semantic indexing of 535 unpublished online documents, often referred to as "Grey Literature", from the Archaeology Data Service OASIS corpus (Online AccesS to the Index of archaeological investigationS), with respect to the CRM ontological concepts E49.Time Appellation and E19.Physical Object.
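
The rule-based, thesaurus-driven indexing described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' GATE/JAPE pipeline: the term lists, the `annotate` and `build_index` helpers, and the mapping of terms to concept labels are all hypothetical and chosen only to show how gazetteer lookups might yield concept-tagged index entries.

```python
import re
from dataclasses import dataclass

# Hypothetical mini-gazetteers standing in for domain thesauri. The terms and
# their concept labels are illustrative assumptions, not STAR project data.
GAZETTEERS = {
    "Time Appellation": ["Roman", "Bronze Age", "Iron Age", "medieval"],
    "Physical Object": ["pottery", "coin", "brooch", "flint"],
}


@dataclass(frozen=True)
class Annotation:
    start: int    # character offset where the match begins
    end: int      # character offset just past the match
    text: str     # matched surface form
    concept: str  # concept label assigned by the gazetteer


def build_pattern(terms):
    # Try longer terms first so multi-word entries win over shorter overlaps.
    ordered = sorted(terms, key=len, reverse=True)
    body = "|".join(re.escape(term) for term in ordered)
    return re.compile(r"\b(?:" + body + r")\b", re.IGNORECASE)


def annotate(text, gazetteers=GAZETTEERS):
    """Return concept annotations for every gazetteer term found in text."""
    annotations = []
    for concept, terms in gazetteers.items():
        for match in build_pattern(terms).finditer(text):
            annotations.append(
                Annotation(match.start(), match.end(), match.group(0), concept)
            )
    return sorted(annotations, key=lambda a: a.start)


def build_index(docs):
    """Map (concept, lower-cased term) pairs to the ids of matching documents."""
    index = {}
    for doc_id, text in docs.items():
        for ann in annotate(text):
            index.setdefault((ann.concept, ann.text.lower()), set()).add(doc_id)
    return index


if __name__ == "__main__":
    sample = "A Roman coin and Bronze Age pottery were found."
    for ann in annotate(sample):
        print(ann.concept, "->", ann.text)
```

A production system would add contextual rules (as JAPE grammars do) to disambiguate terms and handle negation or uncertainty; the sketch covers only the lookup step.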

Bibliographic record

  • Source
    Aslib Proceedings | 2010, No. 5 | pp. 466-475 | 10 pages
  • Author affiliations

    Hypermedia Research Unit, Faculty of Advanced Technology, University of Glamorgan, Pontypridd, UK;

    Hypermedia Research Unit, Faculty of Advanced Technology, University of Glamorgan, Pontypridd, UK;

    Hypermedia Research Unit, Faculty of Advanced Technology, University of Glamorgan, Pontypridd, UK;

    English Heritage, Portsmouth, UK;

  • Indexed in: Science Citation Index (SCI), USA;
  • Original format: PDF
  • Language of text: English
  • Keywords

    information management; semantics; data handling;
  • Date added to database: 2022-08-17 23:15:48
