Building Statistical Language Models of Code

Abstract

We present the Source Code Statistical Language Model data analysis pattern. Statistical language models are an enabling tool for a wide array of important language technologies: speech recognition, machine translation, and document summarization, to name a few, all rely on them to assign probability estimates to natural language utterances or sentences. In this data analysis pattern, we describe the process of building n-gram language models over software source files. By introducing the empirical software engineering community to best practices established over years of natural language research, we hope that statistical language models can become a tool that SE researchers use to explore new research directions.
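To make the idea of an n-gram language model over source files concrete, here is a minimal Python sketch. It is not the authors' pipeline: the abstract does not specify a tokenizer, an n-gram order, or a smoothing method, so the regex lexer (`TOKEN_RE`), the `NGramModel` class, the trigram default, and the add-k smoothing are all assumptions chosen for brevity. Established practice in natural language modeling typically favors stronger smoothing, such as Kneser-Ney, via a dedicated toolkit.

```python
import math
import re
from collections import Counter

# Hypothetical lexer (an assumption, not from the paper): identifiers,
# integer literals, or any single non-whitespace character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(source):
    """Split a source string into a flat list of lexical tokens."""
    return TOKEN_RE.findall(source)


def pad(tokens, n):
    """Add n-1 start markers and one end marker around a token sequence."""
    return ["<s>"] * (n - 1) + tokens + ["</s>"]


class NGramModel:
    """Illustrative n-gram model with add-k smoothing (assumed, see lead-in)."""

    def __init__(self, n=3, k=1.0):
        self.n = n
        self.k = k
        self.ngrams = Counter()    # counts of (context..., token)
        self.contexts = Counter()  # counts of (context...)
        self.vocab = {"</s>", "<unk>"}

    def train(self, sources):
        """Count n-grams over an iterable of source-file strings."""
        for src in sources:
            toks = pad(tokenize(src), self.n)
            self.vocab.update(toks[self.n - 1:])
            for i in range(self.n - 1, len(toks)):
                ctx = tuple(toks[i - self.n + 1:i])
                self.ngrams[ctx + (toks[i],)] += 1
                self.contexts[ctx] += 1

    def prob(self, context, token):
        """Smoothed estimate of P(token | last n-1 tokens of context)."""
        if token not in self.vocab:
            token = "<unk>"
        ctx = tuple(context[-(self.n - 1):]) if self.n > 1 else ()
        numerator = self.ngrams[ctx + (token,)] + self.k
        denominator = self.contexts[ctx] + self.k * len(self.vocab)
        return numerator / denominator

    def cross_entropy(self, source):
        """Average bits per token that the model assigns to a source string."""
        toks = pad(tokenize(source), self.n)
        total_bits = 0.0
        for i in range(self.n - 1, len(toks)):
            total_bits -= math.log2(self.prob(toks[i - self.n + 1:i], toks[i]))
        return total_bits / (len(toks) - self.n + 1)


# Usage: train on a toy corpus, then score familiar and unfamiliar code.
corpus = [
    "for i in range(n): total += i",
    "for j in range(m): total += j",
]
model = NGramModel(n=3)
model.train(corpus)
print(model.cross_entropy("for i in range(n): total += i"))
print(model.cross_entropy("while True: pass"))
```

Lower cross-entropy means the model finds the token sequence more predictable, so code that resembles the training corpus should score lower than code that does not. On a real corpus one would use a language-aware lexer and evaluate on held-out files rather than on the training data.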