An Approach to Construct a Named Entity Annotated English-Vietnamese Bilingual Corpus

Long H. B. Nguyen; Dien Dinh; Phuoc Tran


Abstract

Manually constructing a Named Entity (NE) annotated bilingual corpus is a time-consuming, labor-intensive, and expensive process, but it is necessary for natural language processing (NLP) tasks such as cross-lingual information retrieval, cross-lingual information extraction, and machine translation. In this article, we present an automatic approach to constructing an NE-annotated English-Vietnamese bilingual corpus from a bilingual parallel corpus by proposing an NE alignment method. Starting from a bilingual corpus in which the initial NEs are extracted separately for each language, the approach tries to correct unrecognized or incorrectly recognized NEs before aligning the NEs, using a variety of bilingual constraints. The generated corpus not only improves the NE recognition results but also creates alignments between English NEs and Vietnamese NEs, which are necessary for training NE translation models. The experimental results show that the approach outperforms the baseline methods. In the English-Vietnamese NE alignment task, the F-measure increases from 68.58% to 79.77%. The proposed method also significantly improves NE recognition quality: the F-measure goes from 84.85% to 88.66% for the English side and from 75.71% to 85.55% for the Vietnamese side. With the additional semantic information provided to the machine translation systems, the BLEU score increases from 33.04% to 45.11%.
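
For readers unfamiliar with the F-measure reported above, the following is a minimal sketch of how precision, recall, and F-measure could be computed for NE alignment when predicted and gold alignments are treated as sets of (English NE, Vietnamese NE) pairs. The function name `f_measure`, the exact-match criterion, and the toy data are assumptions made for illustration; the abstract does not specify the paper's scoring procedure.

```python
def f_measure(predicted, gold):
    """Return (precision, recall, F1) for two collections of NE alignment pairs.

    Assumption: a predicted pair counts as correct only if it matches a
    gold pair exactly. This is an illustrative choice, not the paper's.
    """
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy data: (English NE, Vietnamese NE) pairs.
gold = {
    ("Vietnam", "Việt Nam"),
    ("Ho Chi Minh City", "Thành phố Hồ Chí Minh"),
    ("Hanoi", "Hà Nội"),
}
predicted = {
    ("Vietnam", "Việt Nam"),
    ("Hanoi", "Hà Nội"),
    ("City", "Thành phố"),  # partial match, counted as an error here
}

precision, recall, f1 = f_measure(predicted, gold)
print(f"P={precision:.4f} R={recall:.4f} F={f1:.4f}")
```

Under this exact-match assumption, a partially recognized NE such as "City" is counted as an error on both the precision and the recall side, which is one reason correcting NE boundaries before alignment can raise the alignment F-measure.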


Bibliographic details

Source
ACM Transactions on Asian and Low-Resource Language Information Processing | 2017, Issue 2 | 9.1-9.17 | 17 pages
Authors
Long H. B. Nguyen; Dien Dinh; Phuoc Tran
Affiliations

University of Science, HCM City, Vietnam;

Ton Duc Thang University, HCM City, Vietnam;

Indexing information
Original format: PDF
Language of text: English
Chinese Library Classification: TP-429
Keywords
Vietnamese; Nuclear Energy; Neon; Natural Language Processing; Manual handling; Necessary; Alignment; information retrieval; machine translation system; Recombinant DNA


Similar literature

1. A CONDITIONAL RANDOM FIELDS APPROACH TO BIOMEDICAL NAMED ENTITY RECOGNITION [J]. 电子科学学刊（英文版）. 2007, Issue 006
2. A CONDITIONAL RANDOM FIELDS APPROACH TO BIOMEDICAL NAMED ENTITY RECOGNITION [J]. Wang Haochang, Zhao Tiejun, Li Sheng. 电子科学学刊：英文版. 2007, Issue 006
3. Dppa3 expression is critical for generation of fully reprogrammed iPS cells and maintenance of Dlk1-Dio3 imprinting [J]. Xingbo Xu, Lukasz Smorag, Toshinobu Nakamura. Nature Communications. 2015, Issue 2016
4. Massive parallel sequencing uncovers actionable FGFR2–PPHLN1 fusion and ARAF mutations in intrahepatic cholangiocarcinoma [J]. Daniela Sia, Bojan Losic, Agrin Moeini. Nature Communications. 2015, Issue 1
5. Human iPSC-derived motoneurons harbouring TARDBP or C9ORF72 ALS mutations are dysfunctional despite maintaining viability [J]. Anna-Claire Devlin, Karen Burr, Shyamanga Borooah. Nature Communications. 2015, Issue 1
6. Conditional random fields with dynamic potentials for Chinese named entity recognition [O]. 2008
