Published in: Journal of information and computational science

A Pilot Study of the Characterization of English-Chinese Web Bilingual Data

Abstract

With the rapid development of the WWW, bilingual data now appears on blogs, forums, and similar sites. To the best of our knowledge, little work has been done to study intensively the characterization of bilingual data in English-Chinese mixed-language pages. However, its distribution features matter a great deal for many text mining research topics, e.g., estimating the quantity and quality of training data for statistical machine translation. In this paper, we state several key issues in understanding the characterization of bilingual corpora on the web, and then build an experimental platform to study the features of web bilingual data. Finally, we conduct experiments and present preliminary results.