Journal: Science of Computer Programming

Irish: A Hidden Markov Model to detect coded information islands in free text

Abstract

Developers' communication, as contained in emails, issue trackers, and forums, is a precious source of information to support the development process. For example, it can be used to capture knowledge about development practice or about a software project itself. Extracting the content of developers' communication can therefore support several software engineering tasks, such as program comprehension, source code analysis, and software analytics. However, automating the extraction process is challenging because free text is unstructured: it mixes different coding languages (e.g., source code, stack dumps, and log traces) with natural language parts. We conduct an extensive evaluation of Irish (InfoRmation ISlands Hmm), an approach we proposed to extract islands of coded information from free text at token granularity, comparing it with state-of-the-art approaches based on island parsing or on island parsing combined with machine learners. The evaluation considers a wide set of natural language documents (e.g., textbooks, forum discussions, and development emails) taken from different contexts and encompassing different coding languages. Results indicate that Irish achieves an F-measure between 74% and 99%. This is in line with existing approaches, which, unlike Irish, require specific expertise to define regular expressions or grammars.
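The abstract does not describe the model's internals. Purely as an illustration of the general technique, the following minimal Python sketch tags each token of a message as natural language (`TEXT`) or coded information (`CODE`) using a two-state hidden Markov model with Viterbi decoding, then groups contiguous `CODE` tokens into islands. The state set, the coarse token classes returned by the hypothetical `token_class` helper, the tokenizer, and every probability are hand-picked assumptions; they are not the states, features, or trained parameters of Irish.

```python
import math
import re

# Hypothetical hidden states: natural-language prose vs. coded information.
STATES = ("TEXT", "CODE")

# Illustrative, hand-picked probabilities (NOT the parameters used by Irish).
# High self-transition probabilities encode the assumption that coded
# information forms contiguous "islands" rather than isolated tokens.
START = {"TEXT": 0.8, "CODE": 0.2}
TRANS = {
    "TEXT": {"TEXT": 0.9, "CODE": 0.1},
    "CODE": {"TEXT": 0.1, "CODE": 0.9},
}
EMIT = {
    "TEXT": {"word": 0.80, "ident": 0.05, "symbol": 0.10, "number": 0.05},
    "CODE": {"word": 0.20, "ident": 0.30, "symbol": 0.40, "number": 0.10},
}


def tokenize(text):
    """Split text into word-like runs and single punctuation characters."""
    return re.findall(r"\w+|[^\w\s]", text)


def token_class(token):
    """Map a raw token to a coarse observation symbol (hypothetical features)."""
    if re.fullmatch(r"\d+(\.\d+)?", token):
        return "number"
    if re.fullmatch(r"[A-Za-z]+", token) and not re.search(r"[a-z][A-Z]", token):
        return "word"
    if re.fullmatch(r"\w+", token):
        # camelCase, snake_case, or letter/digit mixes such as x1
        return "ident"
    return "symbol"


def viterbi(tokens):
    """Return the most likely TEXT/CODE label per token (log-space Viterbi)."""
    if not tokens:
        return []
    obs = [token_class(t) for t in tokens]
    # best[s]: log-probability of the best path ending in state s
    best = {s: math.log(START[s]) + math.log(EMIT[s][obs[0]]) for s in STATES}
    backptrs = []
    for o in obs[1:]:
        prev_best, best, ptr = best, {}, {}
        for s in STATES:
            # predecessor state that maximises the path score into s
            p = max(STATES, key=lambda r: prev_best[r] + math.log(TRANS[r][s]))
            best[s] = prev_best[p] + math.log(TRANS[p][s]) + math.log(EMIT[s][o])
            ptr[s] = p
        backptrs.append(ptr)
    # backtrack from the best final state
    state = max(STATES, key=lambda s: best[s])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return path[::-1]


def code_islands(tokens, labels):
    """Group maximal runs of CODE-labelled tokens into space-joined islands."""
    islands, current = [], []
    for tok, lab in zip(tokens, labels):
        if lab == "CODE":
            current.append(tok)
        elif current:
            islands.append(" ".join(current))
            current = []
    if current:
        islands.append(" ".join(current))
    return islands
```

With these illustrative parameters, `code_islands(tokens, viterbi(tokens))` for `tokens = tokenize("the call fails at parseInt(s_val); please help")` yields the single island `"parseInt ( s_val ) ;"`. The "sticky" transition matrix is what lets surrounding symbols pull a borderline token into the same island, which is the intuition behind modelling islands with an HMM instead of classifying each token in isolation.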
