PLoS One

Automated content analysis across six languages

Abstract

Corpus selection bias in international relations research presents an epistemological problem: how do we know what we know? Most social science research in text analytics relies on English-language corpora, biasing our understanding of international phenomena. To address corpus selection bias, we present results suggesting that machine translation can be used to analyze non-English sources. We apply human translation and machine translation (Google Translate) to a collection of aligned sentences from United Nations documents extracted from the Multi-UN corpus, and analyze the results with Linguistic Inquiry and Word Count (LIWC), a “bag of words” analysis tool. Overall, the LIWC indices proved relatively stable across machine- and human-translated sentences. Although there are statistically significant differences between the original and translated documents, the effect sizes are relatively small, especially for the psychological-process categories.
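To make the comparison concrete, the sketch below shows, under stated assumptions, how a LIWC-style bag-of-words score and an effect size for aligned sentence pairs might be computed. It is an illustration, not the authors' pipeline. The real LIWC dictionaries are proprietary, so `TOY_DICTIONARY` and its categories are hypothetical, and so are the example sentences, which do not come from the Multi-UN corpus. The helper names `category_scores` and `paired_cohens_d` are also hypothetical. The effect size used here is the paired variant of Cohen's d (mean of the differences divided by their standard deviation). The abstract does not say which effect-size measure the study used.

```python
import re
import statistics

# Hypothetical toy dictionary for illustration only; the real LIWC
# dictionaries are proprietary and far larger. A trailing "*" marks a
# prefix (stem) entry, e.g. "welcom*" matches "welcome" and "welcoming".
TOY_DICTIONARY = {
    "posemo": {"peace", "support*", "welcom*", "agree*"},
    "negemo": {"conflict*", "violen*", "threat*", "concern*"},
}


def _matches(token, entry):
    """True if a token matches a dictionary entry (exact word or '*' prefix)."""
    if entry.endswith("*"):
        return token.startswith(entry[:-1])
    return token == entry


def category_scores(text, dictionary=TOY_DICTIONARY):
    """Bag-of-words scoring: percentage of tokens that fall in each category."""
    tokens = re.findall(r"[a-z']+", text.lower())  # word order is ignored
    total = len(tokens)
    scores = {}
    for category, entries in dictionary.items():
        hits = sum(1 for t in tokens if any(_matches(t, e) for e in entries))
        scores[category] = 100.0 * hits / total if total else 0.0
    return scores


def paired_cohens_d(a, b):
    """Paired effect size (d_z): mean of differences / SD of differences."""
    diffs = [x - y for x, y in zip(a, b)]
    sd = statistics.stdev(diffs)  # requires at least two pairs
    return statistics.mean(diffs) / sd if sd else 0.0


if __name__ == "__main__":
    # Hypothetical aligned sentences (not taken from the Multi-UN corpus).
    human = [
        "The Council welcomes the agreement and calls for peace.",
        "Member States expressed concern about the violence.",
        "The delegation will support the resolution.",
    ]
    machine = [
        "The Council welcomes the agreement and calls for peace.",
        "Member States expressed concern over violence and threats.",
        "The delegation supports the resolution.",
    ]
    for category in TOY_DICTIONARY:
        h = [category_scores(s)[category] for s in human]
        m = [category_scores(s)[category] for s in machine]
        print(category, round(paired_cohens_d(h, m), 3))
```

Because scores are expressed as percentages of tokens, a translation that changes sentence length shifts a category score even when the same category words appear. A small paired effect size across many aligned sentences would indicate that such shifts largely wash out.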