Tenth Workshop on Building and Using Comparable Corpora, 2017

Toward a Comparable Corpus of Latvian, Russian and English Tweets

Abstract

Twitter has become a rich source of linguistic data. This study investigates the possibility of building a trilingual Latvian-Russian-English corpus of tweets from Riga, Latvia. Such a corpus, once constructed, could serve multiple purposes, including training machine translation models, examining cross-lingual phenomena, and studying the population of Riga. By collecting and analysing a pilot corpus, the study shows that building such a resource is feasible. The pilot corpus is publicly available and can be used to construct a large comparable corpus.
