A Generalized Constraint Approach to Bilingual Dictionary Induction for Low-Resource Language Families

Nasution Arbi Haza; Murakami Yohei; Ishida Toru

Abstract

The lack or absence of parallel and comparable corpora makes bilingual lexicon extraction a difficult task for low-resource languages. The pivot language and cognate recognition approaches have proven useful for inducing bilingual lexicons for such languages. We propose constraint-based bilingual lexicon induction for closely related languages by extending constraints from the recent pivot-based induction technique and further enabling multiple symmetry assumption cycles to reach many more cognates in the transgraph. We further identify cognate synonyms to obtain many-to-many translation pairs. This article utilizes four datasets: one Austronesian low-resource language and three Indo-European high-resource languages. As baselines, we use three constraint-based methods from our previous work, the Inverse Consultation method, and translation pairs generated from the Cartesian product of input dictionaries. We evaluate our results using precision, recall, and F-score. Our customizable approach allows the user to conduct cross-validation to predict the optimal hyperparameters (cognate threshold and cognate synonym threshold) with various combinations of heuristics and numbers of symmetry assumption cycles to gain the highest F-score. Our proposed methods yield statistically significant improvements in precision and F-score compared to our previous constraint-based methods. The results show that our method has the potential to complement other bilingual dictionary creation methods, such as word alignment models using parallel corpora for high-resource languages, while also handling low-resource languages well.
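As a rough illustration of the Cartesian-product baseline and the precision, recall, and F-score metrics named in the abstract, the Python sketch below builds candidate translation pairs through a pivot language and scores them against a gold standard. It is a minimal sketch under stated assumptions, not the article's implementation: the function names, the placeholder word labels (a1, p1, c1, ...), the toy dictionaries, and the reading of the baseline as "pair every source word with every target word that shares a pivot word" are all hypothetical.

```python
from itertools import product


def pivot_candidates(a_to_pivot, pivot_to_c):
    """Pair each language-A word with each language-C word sharing a pivot word.

    Assumption: this is one simple reading of a Cartesian-product baseline,
    not necessarily the exact procedure used in the article.
    """
    # Group language-A words by the pivot words they translate to.
    a_words_by_pivot = {}
    for a_word, pivots in a_to_pivot.items():
        for pivot in pivots:
            a_words_by_pivot.setdefault(pivot, set()).add(a_word)

    pairs = set()
    # For every pivot word present in both dictionaries, take the
    # Cartesian product of its A-side and C-side translations.
    for pivot in a_words_by_pivot.keys() & pivot_to_c.keys():
        pairs.update(product(a_words_by_pivot[pivot], pivot_to_c[pivot]))
    return pairs


def precision_recall_f1(predicted, gold):
    """Score a set of predicted pairs against a set of gold-standard pairs."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy dictionaries with placeholder labels (not real data).
a_to_pivot = {"a1": {"p1"}, "a2": {"p1", "p2"}}
pivot_to_c = {"p1": {"c1"}, "p2": {"c2", "c3"}}
gold = {("a1", "c1"), ("a2", "c2")}

candidates = pivot_candidates(a_to_pivot, pivot_to_c)
precision, recall, f1 = precision_recall_f1(candidates, gold)
print(sorted(candidates))
print(precision, recall, f1)
```

In this toy case the baseline recovers every gold pair but also proposes two spurious ones, which is the kind of over-generation that constraint-based filtering is meant to reduce.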

Bibliographic record

Source
ACM Transactions on Asian Language Information Processing | 2018, Issue 2 | pp. 9.1-9.29 | 29 pages
Authors
Nasution Arbi Haza; Murakami Yohei; Ishida Toru
Author affiliations

Kyoto Univ, Dept Social Informat, Sakyo Ku, Kyoto 6068501, Japan;

Kyoto Univ, Unit Design, Shimogyo Ku, 506,KRP Bldg 9,91 Chudoji Awata Cho, Kyoto 6008815, Japan;

Kyoto Univ, Dept Social Informat, Sakyo Ku, Kyoto 6068501, Japan;

Original format: PDF
Text language: English
Keywords
Constraint satisfaction problem; low-resource languages; closely-related languages; pivot-based bilingual lexicon induction; cognate recognition;

Similar literature

1. Plan Optimization to Bilingual Dictionary Induction for Low-resource Language Families [J]. Nasution Arbi Haza, Murakami Yohei, Ishida Toru. ACM Transactions on Asian and Low-Resource Language Information Processing. 2021, Issue 2.
2. A Constraint Approach to Pivot-Based Bilingual Dictionary Induction [J]. Mairidan Wushouer, Donghui Lin, Toru Ishida. ACM Transactions on Asian Language Information Processing. 2016, Issue 1.
3. Examining the Effectiveness of ‘Bilingual Dictionary Plus’ – A Dictionary for Production in a Foreign Language [J]. Batia Laufer and Tamar Levitzky-Aviad. International Journal of Lexicography. 2006, Issue 2.
4. Model Transfer for Tagging Low-resource Languages using a Bilingual Dictionary [C]. Meng Fang, Trevor Cohn. Annual Meeting of the Association for Computational Linguistics. 2017.
5. Parallel Sentence Detection in Comparable Corpora with Bilingual Word Embeddings for Low-Resource Languages [D]. Cadigan, John. 2018.
6. Phonotactic Constraints Are Activated across Languages in Bilinguals [O]. Max R. Freeman, Henrike K. Blumenfeld, Viorica Marian.
7. A Generalized Constraint Approach to Bilingual Dictionary Induction for Low-Resource Language Families [O]. Arbi Haza Nasution, Yohei Murakami, Toru Ishida. 2018.
8. Linguistic-Core Approach to Structured Translation and Analysis of Low-Resource Languages [R]. Carbonell, J., Levin, L., Smith, N. 2017.
