Where Is the Road for Issue Reports Classification Based on Text Mining?

Abstract

Currently, open source projects on GitHub receive various kinds of issues daily because of the extreme openness of GitHub's Issue Tracking System (ITS). For project managers, categorizing these issues is a labor-intensive and time-consuming task. However, GitHub only requires a contributor to provide a short textual abstract when reporting an issue. Thus, most traditional classification approaches, which rely on detailed and structured data (e.g., priority, severity, and software version), are difficult to adopt. In this paper, we investigate issue classification approaches on a large-scale dataset of 80 popular projects and over 252,000 issue reports collected from GitHub. First, we discuss four traditional text-based classification methods and their performance. Then, through a quantitative and qualitative study, we find that semantic perplexity (i.e., an issue's description mixes bug-related sentences with non-bug-related sentences) is a crucial factor affecting classification performance. Finally, we design a two-stage classifier framework based on novel metrics of the semantic perplexity of issue reports. Results show that our two-stage classification can significantly improve issue classification performance.
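
The abstract does not define the semantic-perplexity metrics or the two-stage framework. The following is therefore only a minimal Python sketch of the general idea, under stated assumptions:

- A toy multinomial Naive Bayes sentence classifier stands in for a traditional text-based method.
- A hypothetical perplexity score measures how evenly a description's sentences split between bug and non-bug predictions.
- A hypothetical second stage labels high-perplexity reports by their most confidently classified sentence.

None of the names, thresholds, or training sentences come from the paper.

```python
import math
import re
from collections import Counter


class NaiveBayes:
    """Toy multinomial Naive Bayes (bug vs. nonbug) with Laplace smoothing.

    Illustrative stand-in for a traditional text classifier; not the paper's model.
    """

    def __init__(self):
        self.word_counts = {"bug": Counter(), "nonbug": Counter()}
        self.doc_counts = Counter()
        self.vocab = set()

    @staticmethod
    def tokenize(text):
        return re.findall(r"[a-z]+", text.lower())

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.doc_counts[label] += 1
            self.word_counts[label].update(self.tokenize(text))
        self.vocab = set(self.word_counts["bug"]) | set(self.word_counts["nonbug"])
        return self

    def prob_bug(self, text):
        """Posterior P(bug | text); assumes both classes appear in the training data."""
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label, counts in self.word_counts.items():
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.doc_counts[label] / total_docs)
            for tok in self.tokenize(text):
                score += math.log((counts[tok] + 1) / denom)
            log_scores[label] = score
        top = max(log_scores.values())  # subtract the max for numerical stability
        weights = {k: math.exp(v - top) for k, v in log_scores.items()}
        return weights["bug"] / (weights["bug"] + weights["nonbug"])


def split_sentences(text):
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def semantic_perplexity(clf, description):
    """Hypothetical metric: 0 if all sentences get the same label, 1 if evenly split."""
    sentences = split_sentences(description)
    if not sentences:
        return 0.0
    bug_share = sum(clf.prob_bug(s) >= 0.5 for s in sentences) / len(sentences)
    return 2 * min(bug_share, 1 - bug_share)


def two_stage_classify(clf, description, threshold=0.5):
    # Stage 1: low-perplexity reports are labeled from their full text.
    if semantic_perplexity(clf, description) <= threshold:
        return "bug" if clf.prob_bug(description) >= 0.5 else "nonbug"
    # Stage 2 (hypothetical rule): follow the most confidently classified sentence.
    best = max(split_sentences(description), key=lambda s: abs(clf.prob_bug(s) - 0.5))
    return "bug" if clf.prob_bug(best) >= 0.5 else "nonbug"


# Made-up training sentences, for illustration only.
clf = NaiveBayes().fit(
    [
        "app crashes with null pointer exception",
        "error when saving file causes crash",
        "stack trace shows exception on startup",
        "please add dark mode feature",
        "add support for export to csv",
        "documentation question about configuration",
    ],
    ["bug", "bug", "bug", "nonbug", "nonbug", "nonbug"],
)
mixed = "The app crashes with an exception. Please add a dark mode feature."
print(semantic_perplexity(clf, mixed), two_stage_classify(clf, mixed))  # prints: 1.0 nonbug
```

Here the mixed description gets the maximum score of 1.0, so it skips whole-text classification. The illustrative second stage then lets the more confidently classified feature-request sentence decide the label. A real system would tune the threshold and the stage-two rule on labeled issue data.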

Bibliographic Details

Source: ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, 2017, pp. 121-130 (10 pages)
Authors: Qiang Fan; Yue Yu; Gang Yin; Tao Wang; Huaimin Wang
Original format: PDF
Keywords: Computer bugs; Software; Data mining; Feature extraction; Semantics; Measurement

Similar Documents

1. Towards Auto-labelling Issue Reports for Pull-Based Software Development using Text Mining Approach [J]. Hassan Fazayeli, Sharifah Mashita Syed-Mohamad, Nur Shazwani Md Akhir. Procedia Computer Science, 2019, No. 1

2. Text mining for the Vaccine Adverse Event Reporting System: medical text classification using informative feature selection [J]. Botsis T, Nguyen MD, Woo EJ. Journal of the American Medical Informatics Association, 2011, No. 5

3. Combining text mining and data mining for bug report classification [J]. Yu Zhou, Yanxiang Tong, Ruihang Gu. Journal of Software Maintenance and Evolution, 2016, No. 3

4. Where Is the Road for Issue Reports Classification Based on Text Mining? [C]. Qiang Fan, Yue Yu, Gang Yin. ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, 2017

5. A semantic partition based text mining model for document classification [D]. Inibhunu, Catherine. 2006

6. Text mining for the Vaccine Adverse Event Reporting System: medical text classification using informative feature selection [O]. Taxiarchis Botsis, Michael D Nguyen, Emily Jane Woo. 2011

7. ACC/AHA/ESC guidelines for the management of patients with atrial fibrillation: a report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines and the European Society of Cardiology Committee for Practice Guidelines and Policy Conferences (Committee to Develop Guidelines for the Management of Patients With Atrial Fibrillation), developed in collaboration with the North American Society of Pacing and Electrophysiology [O]. Fuster Valentin, Rydén Lars E., Asinger Richard W. 2001
