2018 IEEE/ACM 40th International Conference on Software Engineering

Journal First Sentiment Polarity Detection for Software Development

Abstract

Sentiment analysis is increasingly used to study software developers' emotions by mining crowd-generated content in software repositories and information sources. With a few notable exceptions, empirical software engineering studies have relied on off-the-shelf sentiment analysis tools. However, such tools were trained on non-technical domains and general-purpose social media, which leads to misclassification of technical jargon and problem reports. In particular, Jongeling et al. show that the choice of sentiment analysis tool may affect the conclusion validity of empirical studies: these tools not only disagree with human annotation of developers' communication channels, but also disagree among themselves. Our goal is to move beyond the limitations of off-the-shelf sentiment analysis tools when applied in the software engineering domain. Accordingly, we present Senti4SD, a sentiment polarity classifier for software developers' communication channels. Senti4SD exploits a suite of lexicon-based, keyword-based, and semantic features to deal appropriately with the domain-dependent use of a lexicon. We built a Distributional Semantic Model (DSM) to derive the semantic features exploited by Senti4SD. Specifically, we ran word2vec on a collection of over 20 million documents from Stack Overflow, obtaining word vectors that are representative of developers' communication style. The classifier is trained and validated using a gold standard of 4,423 Stack Overflow posts, including questions, answers, and comments, which were manually annotated for sentiment polarity. We release the full lab package, which includes both the gold standard and the emotion annotation guidelines, to ease replications as well as new studies on emotion awareness in software engineering. To inform future research on word embedding for text categorization and information retrieval in software engineering, the replication kit also includes the DSM.
Results: Our empirical evaluation assesses the contribution of the lexicon-based, keyword-based, and semantic features by comparing different feature settings. Compared with SentiStrength, a mainstream off-the-shelf tool that we use as a baseline, Senti4SD reduces the misclassification of neutral and positive posts as emotionally negative. Furthermore, we provide empirical evidence of better performance even with a minimal set of training documents.
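
As a rough illustration of how word vectors can yield semantic features for a polarity classifier, the sketch below scores a text by its cosine similarity to one prototype vector per polarity class. Everything in it is an assumption made for illustration: the three-dimensional `TOY_VECTORS`, the `SEEDS` word lists, and the `semantic_features` helper are hypothetical, and the abstract does not say that Senti4SD computes its semantic features this way. In a real setting the vectors would come from a word2vec model such as the released DSM.

```python
import math

# Toy 3-dimensional word vectors, invented for illustration only.
# Real vectors would come from a word2vec model trained on a large corpus.
TOY_VECTORS = {
    "great": [0.9, 0.1, 0.0],
    "thanks": [0.8, 0.2, 0.1],
    "awful": [0.1, 0.9, 0.0],
    "hate": [0.0, 0.8, 0.2],
    "kill": [0.1, 0.2, 0.9],     # technical jargon, as in "kill the process"
    "process": [0.0, 0.1, 0.9],
}

# Hypothetical seed words used to build one prototype vector per class.
SEEDS = {
    "positive": ["great", "thanks"],
    "negative": ["awful", "hate"],
    "neutral": ["process"],
}


def sum_vectors(words, vectors):
    """Add up the vectors of all known words; unknown words are skipped."""
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    for word in words:
        vec = vectors.get(word)
        if vec is not None:
            total = [t + v for t, v in zip(total, vec)]
    return total


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def semantic_features(text, vectors=TOY_VECTORS, seeds=SEEDS):
    """Map a text to its similarity with each class prototype."""
    doc = sum_vectors(text.lower().split(), vectors)
    return {
        label: cosine(doc, sum_vectors(words, vectors))
        for label, words in seeds.items()
    }


if __name__ == "__main__":
    print(semantic_features("kill the process"))
```

With these toy vectors, "kill the process" lies closest to the neutral prototype, which mirrors the kind of technical jargon a general-purpose tool might mislabel as negative. Similarity scores of this kind could be passed to a classifier alongside lexicon-based and keyword-based features.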
