Conference: IEEE International Conference on Software Maintenance and Evolution

Do Contexts Help in Phrase-Based, Statistical Source Code Migration?

Abstract

Prior research showed that migrating Java code to C# by directly applying phrase-based statistical machine translation (SMT) to the lexemes of source code produces a large amount of semantically incorrect code. In this work, we conduct empirical studies on several open-source projects to investigate how the well-defined semantics of programming languages can guide the translation process in SMT. We investigate five types of features that form contexts from the (semantic) relations among code tokens: occurrence association among code tokens, data dependencies and control dependencies among program entities, visibility constraints of entities, and consistency between the declarations and accesses of variables, fields, and methods. We use the Direct Maximum Entropy (DME) approach for feature integration. Our empirical results show that, when features are added individually to the baseline SMT model, token association and data dependencies contribute the most, with relative improvements in semantic correctness of up to 18.3% and 18.5%, respectively. Integrating three feature types (token association, data dependencies, and visibility) into the baseline model yields the highest relative improvement in semantic correctness, up to 26.4%. Overall, 43.5-80.7% of the translated methods are semantically correct. These results suggest that combining SMT with semantic features at different levels of abstraction is a promising direction for improving translation accuracy.
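To illustrate the feature integration step mentioned in the abstract, the following is a minimal sketch of the general direct maximum entropy (log-linear) formulation used in SMT, in which each candidate translation is scored by a weighted sum of feature functions and the highest-scoring candidate is selected. It is not the authors' implementation: the feature names, weights, candidate labels, and feature values below are all hypothetical and chosen only to make the example runnable.

```python
import math

def dme_posteriors(candidates, weights):
    """Normalized log-linear probabilities: p(c) = exp(sum_m w_m * h_m(c)) / Z."""
    scores = {
        name: sum(weights[feat] * value for feat, value in feats.items())
        for name, feats in candidates.items()
    }
    # Subtract the maximum score before exponentiating for numerical stability.
    max_score = max(scores.values())
    exp_scores = {name: math.exp(s - max_score) for name, s in scores.items()}
    z = sum(exp_scores.values())
    return {name: v / z for name, v in exp_scores.items()}

def best_candidate(candidates, weights):
    """Pick the candidate with the highest log-linear probability."""
    posteriors = dme_posteriors(candidates, weights)
    return max(posteriors, key=posteriors.get)

# Hypothetical feature weights (in practice these would be tuned on held-out data).
weights = {
    "translation_model": 1.0,
    "token_association": 0.5,
    "data_dependency": 0.5,
}

# Hypothetical log-scale feature values for two candidate translations.
candidates = {
    "candidate_a": {"translation_model": -1.0, "token_association": -2.0, "data_dependency": -3.0},
    "candidate_b": {"translation_model": -1.5, "token_association": -0.5, "data_dependency": -0.5},
}

print(best_candidate(candidates, weights))
```

In this toy setting the baseline translation-model feature alone would prefer `candidate_a`, while the added context features shift the decision to `candidate_b`, which is the kind of effect the context features are meant to have.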
