Extraction of Indonesian and English parallel sentences from movie subtitles

Abstract

A parallel corpus is a mandatory resource for developing a machine-learning-based statistical translation engine. The size and coverage of the parallel corpus available for training directly affect the translation accuracy of the engine. To make more training data available for developing a translation engine in the conversational domain, we propose a method to extract parallel data from movie subtitles using dynamic time warping, cosine similarity, and a beam search algorithm. The proposed method can extract parallel sentences amounting to 30% of a set of Indonesian-English movie subtitles, with a precision of 98%.
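The abstract names three components (dynamic time warping, cosine similarity, and beam search) but does not say how they are combined. The sketch below is a rough illustration only, under one plausible set of assumptions; it is not the authors' method:

- Indonesian subtitle lines are mapped word by word into English through a toy lexicon (`TOY_LEXICON`, hypothetical).
- Each mapped line is compared with each English line by cosine similarity over term counts.
- The two sequences are aligned monotonically with DTW, using a local cost of 1 - cosine.
- Aligned pairs below a hypothetical `threshold` of 0.5 are discarded.

The beam search step, subtitle timestamps, and whatever features the paper actually uses are not reproduced. Every name here is invented for illustration.

```python
import math
from collections import Counter

# Hypothetical toy Indonesian-to-English lexicon; an assumption for this sketch only.
TOY_LEXICON = {"saya": "i", "cinta": "love", "kamu": "you",
               "kita": "we", "pergi": "go", "sekarang": "now"}
PUNCT = ".,!?;:\"'"


def bag_of_words(text, lexicon=None):
    """Lower-case, strip edge punctuation, optionally map each token through a lexicon."""
    tokens = [tok.strip(PUNCT) for tok in text.lower().split()]
    tokens = [tok for tok in tokens if tok]
    if lexicon is not None:
        tokens = [lexicon.get(tok, tok) for tok in tokens]
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def dtw_align(src_lines, tgt_lines, lexicon, threshold=0.5):
    """Align two subtitle sequences monotonically with DTW (local cost = 1 - cosine),
    then keep only aligned pairs whose cosine similarity reaches `threshold`
    (a hypothetical cut-off, not a value from the paper)."""
    src = [bag_of_words(s, lexicon) for s in src_lines]  # source mapped into English
    tgt = [bag_of_words(t) for t in tgt_lines]
    n, m = len(src), len(tgt)
    sim = [[cosine(src[i], tgt[j]) for j in range(m)] for i in range(n)]
    inf = float("inf")
    acc = [[inf] * (m + 1) for _ in range(n + 1)]  # accumulated DTW cost
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i][j] = (1.0 - sim[i - 1][j - 1]) + min(
                acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1])
    # Backtrack from the bottom-right cell, preferring the diagonal move on ties.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        pairs.append((i - 1, j - 1))
        _, move = min((acc[i - 1][j - 1], 0), (acc[i - 1][j], 1), (acc[i][j - 1], 2))
        if move == 0:
            i, j = i - 1, j - 1
        elif move == 1:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return [(src_lines[a], tgt_lines[b]) for a, b in pairs if sim[a][b] >= threshold]


if __name__ == "__main__":
    indonesian = ["Saya cinta kamu", "Kita pergi sekarang"]
    english = ["I love you", "Where are we?", "We go now"]
    print(dtw_align(indonesian, english, TOY_LEXICON))
    # prints [('Saya cinta kamu', 'I love you'), ('Kita pergi sekarang', 'We go now')]
```

With these inputs the DTW path also passes through the pair ("Kita pergi sekarang", "Where are we?"), whose similarity is 1/3. The threshold filters it out, which shows why a similarity cut-off is needed on top of the monotone alignment.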

Bibliographic record

Source: International Conference on Asian Language Processing, 2017, pp. 298-301 (4 pages)
Conference location: Singapore (SG)
Authors: Boon Hong Yeo; Ai Ti Aw; Xuancong Wang
Affiliation: Human Language Technology Department, Institute of Infocomm Research (I2R), Singapore
Original format: PDF
Keywords: Motion pictures; Engines; Data mining; Web pages; Heuristic algorithms; Dictionaries; Internet

Similar literature

1. An Evaluation on Performance different metrics on extraction of Persian-English Parallel sentences [J]. Amin Keshavarzi, Marziyeh Homayouni. International Journal of Computer Science and Network Security, 2016, no. 7.
2. Domain biased Bilingual Parallel Data Extraction and its Sentence Level Alignment for English-Hindi Pair [J]. Deepa Gupta, Vani Raveendran, Rahul Kumar Yadav. Research Journal of Applied Science, Engineering and Technology, 2014, no. 6.
3. Extraction of Indonesian and English parallel sentences from movie subtitles [C]. Boon Hong Yeo, Ai Ti Aw, Xuancong Wang. International Conference on Asian Language Processing, 2017.
4. A sociolinguistic and communicative analysis of the translation processes in making subtitles for the movies: A case study of a Japanese movie "Ikiru (To Live)" with English subtitles [D]. Koike, Satoshi. 1989.
5. Subtitling strategies of swear words and taboo expressions in the movie Training Day [O]. Noureldin Mohamed Abdelaal, Amani Al Sarhani. 2021.
6. Slang Expressions in the English Clueless Movie Text and Their Subtitling Strategies in the Bahasa Indonesia Subtitling Text [O]. Evrilia Timur Fitri. 2016.