Joint Spoken Language Technologies for Under-resourced Languages and Collaboration and Computing for Under-Resourced Languages Workshop

Adapting Language Specific Components of Cross-Media Analysis Frameworks to Less-Resourced Languages: the Case of Amharic

Abstract

We present an automatic speech recognition (ASR)-based pipeline for Amharic that orchestrates natural language processing (NLP) components within a cross-media analysis framework (CMAF). One of the major challenges inherent to CMAFs is handling multilingual content effectively. As a result, many languages remain under-resourced and cannot take advantage of available media analysis solutions. Although Amharic is spoken natively by over 22 million people and an ever-increasing amount of Amharic multimedia content is available on the Web, querying this content with simple text search is difficult. Searching audio and video content with simple keywords is even harder, because such content exists in raw form. In this study, we introduce a spoken and textual content processing workflow for Amharic into a CMAF. We design an ASR and named entity recognition (NER) pipeline with three main components: ASR, a transliterator, and NER. We explore various acoustic modeling techniques and develop an OpenNLP-based NER extractor, along with a transliterator that interfaces between the ASR and NER components. The resulting ASR-NER pipeline for Amharic advances the multilingual support of CMAFs. The state-of-the-art design principles and techniques employed in this study also offer insights for other less-resourced languages, particularly Semitic ones.
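The abstract describes the pipeline only at the level of its three stages (ASR, transliterator, NER). The Python sketch below is a hypothetical illustration of how such stages can be chained; it is not the authors' implementation. Every name in it is an assumption: `fake_asr` is a stub that returns a fixed Amharic sentence ("Abebe went to Addis Ababa") instead of decoding audio, `TRANSLIT` is a simplified romanization table covering only the Ethiopic syllables in that sentence (not a complete scheme), and `GAZETTEER` is a toy lookup standing in for the OpenNLP-based NER model. The sketch also assumes that the ASR emits Ethiopic script and that the NER component expects romanized text; the abstract does not state the direction of the transliteration.

```python
from typing import Callable, List, Tuple

# Hypothetical, simplified romanization table: covers only the Ethiopic
# syllables used in this example; it is NOT a full scheme such as SERA.
TRANSLIT = {
    "አ": "a", "በ": "be", "ባ": "ba", "ዲ": "di",
    "ስ": "s", "ወ": "we", "ደ": "de", "ሄ": "he",
}

# Toy gazetteer standing in for a trained NER model (assumption).
GAZETTEER = {
    ("abebe",): "PERSON",
    ("adis", "abeba"): "LOCATION",
}
MAX_SPAN = max(len(key) for key in GAZETTEER)


def fake_asr(audio_path: str) -> str:
    """Hypothetical ASR stub: ignores the audio and returns a fixed
    Ethiopic-script transcript ("Abebe went to Addis Ababa")."""
    return "አበበ ወደ አዲስ አበባ ሄደ"


def transliterate(text: str) -> str:
    """Map each known Ethiopic character to Latin; pass others through."""
    return "".join(TRANSLIT.get(ch, ch) for ch in text)


def tag_entities(text: str) -> List[Tuple[str, str]]:
    """Greedy longest-match gazetteer lookup over whitespace tokens."""
    tokens = text.split()
    entities: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(MAX_SPAN, len(tokens) - i), 0, -1):
            span = tuple(tokens[i:i + n])
            if span in GAZETTEER:
                entities.append((" ".join(span), GAZETTEER[span]))
                i += n
                break
        else:
            i += 1  # no entity starts at this token
    return entities


def run_pipeline(
    audio_path: str, asr: Callable[[str], str] = fake_asr
) -> Tuple[str, List[Tuple[str, str]]]:
    """ASR -> transliterator -> NER, returning romanized text and entities."""
    transcript = asr(audio_path)
    latin = transliterate(transcript)
    return latin, tag_entities(latin)


if __name__ == "__main__":
    latin, entities = run_pipeline("clip.wav")  # hypothetical file name
    print(latin)
    print(entities)
```

Because the ASR stage is passed in as a callable, a wrapper around a real decoder could replace `fake_asr` without touching the transliteration or tagging stages. This is one simple way to keep the components decoupled, in the spirit of orchestrating them inside a CMAF.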