Automatic Romanization of Arabic Bibliographic Records


Abstract

International library standards require cataloguers to tediously input Romanization of their catalogue records for the benefit of library users without specific language expertise. In this paper, we present the first reported results on the task of automatic Romanization of undiacritized Arabic bibliographic entries. This complex task requires the modeling of Arabic phonology, morphology, and even semantics. We collected a 2.5M word corpus of parallel Arabic and Romanized bibliographic entries, and benchmarked a number of models that vary in terms of complexity and resource dependence. Our best system reaches 89.3% exact word Romanization on a blind test set. We make our data and code publicly available.
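
The headline figure above is an exact word Romanization rate. As a rough illustration only, the sketch below assumes that metric means the fraction of word positions where the predicted Romanization is string-identical to the reference; the abstract does not spell out the definition, so this may differ from the authors' scoring. The function name `exact_word_accuracy` and the toy word lists are hypothetical and are not drawn from the paper's corpus or code.

```python
def exact_word_accuracy(predicted, reference):
    """Return the fraction of aligned words whose predicted
    Romanization matches the reference string exactly."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must align word for word")
    if not reference:
        return 0.0
    # A word counts only if every character, including diacritics, matches.
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


# Hypothetical toy data: the third prediction differs in vowel length,
# so it is counted as an error under exact matching.
reference = ["kitāb", "al-ʻilm", "wa-al-adab"]
predicted = ["kitāb", "al-ʻilm", "wa-al-ādāb"]
score = exact_word_accuracy(predicted, reference)
```

Exact matching is strict: a single wrong vowel or missing diacritic makes the whole word an error, which is one reason a task like this can need phonological and morphological modeling rather than character-by-character substitution.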


Bibliographic details

Source: Workshop on Arabic Natural Language Processing, 2021, pp. 213-218 (6 pages)
Authors: Fadhl Eryani; Nizar Habash
Original format: PDF
Date indexed: 2022-08-26 13:58:14

