
Introduction

Abstract

Over the last few decades, Digital Humanities has evolved dramatically, from simple database applications to complex systems that draw on the current state of the art in Computer Science. Language Technology in particular plays a major role, either in processing the metadata of recorded objects or in analyzing and interpreting content. Applying language technology methods to objects from the humanities is a challenge for NLP research: data is heterogeneous (image/text), often incomplete (e.g. OCR errors), multilingual within one document (historical documents with Latin and/or classical Greek paragraphs), and difficult to structure (paragraphs, titles, and pages are somewhat different in historical texts). Corpus-based methods, nowadays standard in NLP research, often cannot be applied because the necessary large training data is missing. Moreover, the requirements for digital humanities tools, especially those dedicated to cultural heritage objects, differ from the requirements for tools applied to modern texts. Research in Digital Humanities therefore also involves adapting existing NLP tools to historical variants of languages, developing tools for new languages, making tools robust to syntactic deviation, and adapting semantic resources.
