Conference: International Congress on Digital Heritage

Knowledge Management and Cultural Heritage Repositories: Cross-Lingual Information Retrieval Strategies


Abstract

In recent years, important initiatives such as the development of the European Library and Europeana have aimed to increase the availability of cultural content from various types of providers and institutions. Access to these resources requires environments that both manage multilingual complexity and preserve semantic interoperability. Natural Language Processing (NLP) applications are developed with the aim of achieving Cross-Lingual Information Retrieval (CLIR). This paper presents ongoing research on language processing based on the Lexicon-Grammar (LG) approach, with the goal of improving knowledge management in Cultural Heritage repositories. The proposed framework aims to guarantee interoperability between multilingual systems in order to overcome crucial issues such as cross-language and cross-collection retrieval. Indeed, the LG methodology tries to overcome the shortcomings of statistical approaches, such as those of Google Translate or Microsoft's Bing, in processing Multi-Word Units (MWUs) in queries, where the lack of linguistic context is a serious obstacle to disambiguation. In particular, translation in specific domains is, as has been widely recognized, unambiguous, since the meanings of terms are mono-referential and the relation that links a given term to its equivalent in a foreign language is biunivocal, i.e. a one-to-one coupling that makes the relation exclusive and reversible. Ontologies are used in CLIR and are considered by several scholars a promising research area for improving the effectiveness of Information Extraction (IE) techniques, particularly for technical-domain queries. Therefore, we present a methodological framework that makes it possible to map both the data and the metadata among the language-specific ontologies. The experiment has been set up for the English/Italian language pair and can easily be extended to other language pairs. The feasibility of cross-language information extraction and semantic search will be tested by implementing an early prototype system.
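The one-to-one term coupling and the handling of Multi-Word Units in queries can be illustrated with a minimal sketch. It is not the authors' Lexicon-Grammar implementation: it assumes a small hand-written English/Italian glossary (the entries, `EN_IT`, `invert` and `translate_query` are all hypothetical names chosen for illustration) and uses plain longest-match dictionary lookup so that a multi-word term is translated as a unit rather than word by word.

```python
from typing import Dict, List, Tuple

Glossary = Dict[Tuple[str, ...], str]

# Hypothetical domain glossary (illustrative entries, not taken from the paper).
# Keys are tokenized source terms, so multi-word units are single entries.
EN_IT: Glossary = {
    ("rose", "window"): "rosone",
    ("bell", "tower"): "torre campanaria",
    ("fresco",): "affresco",
}


def invert(glossary: Glossary) -> Glossary:
    """Build the reverse glossary, failing if the mapping is not one-to-one."""
    inverse: Glossary = {}
    for source_tokens, target in glossary.items():
        target_key = tuple(target.split())
        if target_key in inverse:
            raise ValueError(f"mapping is not one-to-one for {target!r}")
        inverse[target_key] = " ".join(source_tokens)
    return inverse


def translate_query(query: str, glossary: Glossary) -> str:
    """Translate known terms in a query, preferring the longest match."""
    tokens: List[str] = query.lower().split()
    max_len = max(len(key) for key in glossary)
    output: List[str] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first so MWUs win over single words.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in glossary:
                output.append(glossary[key])
                i += n
                break
        else:
            # Unknown token: pass it through unchanged.
            output.append(tokens[i])
            i += 1
    return " ".join(output)


IT_EN = invert(EN_IT)
print(translate_query("rose window and bell tower", EN_IT))
print(translate_query("torre campanaria", IT_EN))
```

Because the glossary is checked for one-to-one coupling when it is inverted, a term translated into Italian and back yields the original English term, which is the reversibility property the abstract describes. A real system would replace the toy glossary with linguistically curated resources.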
