Conference: International Conference on Applications of Natural Language to Information Systems

Identification of Multi-Focal Questions in Question and Answer Reports

Abstract

A significant amount of business and scientific data is collected via question and answer reports. However, these reports often suffer from data quality issues. In many cases, questionnaires contain questions that require multiple answers, which we argue is a potential source of poor-quality answers. This paper introduces multi-focal questions and proposes a model for identifying them. The model consists of three phases: question pre-processing, feature engineering, and question classification. We use six types of features: lexical/surface features; Part-of-Speech features; readability features; question structure, wording and placement features; question response type and format features; and question focus. A comparative study of three machine learning algorithms (Bayes Net, Decision Tree and Support Vector Machine) is performed on a dataset of 150 questions obtained from the Carbon Disclosure Project, achieving an accuracy of 91%.
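As a rough illustration of the first two phases named in the abstract (pre-processing and feature engineering), the sketch below computes a few lexical/surface features for one question. The feature names, the word lists, and the example question are assumptions made for illustration only; the abstract does not specify the authors' actual feature definitions.

```python
import re

# Hypothetical word lists; the paper's real lexical resources are not given.
WH_WORDS = {"what", "which", "who", "whom", "whose", "when", "where", "why", "how"}
CONJUNCTIONS = {"and", "or"}


def preprocess(question):
    """Pre-processing: lowercase the question and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", question.lower())


def surface_features(question):
    """Feature engineering: count simple surface cues that may hint at
    more than one focus (assumed cues, not the authors' feature set)."""
    tokens = preprocess(question)
    return {
        "n_tokens": len(tokens),
        "n_wh_words": sum(t in WH_WORDS for t in tokens),
        "n_conjunctions": sum(t in CONJUNCTIONS for t in tokens),
        "n_question_marks": question.count("?"),
    }


# Made-up example question, not taken from the Carbon Disclosure Project data.
example = "What are your emission targets and how do you monitor progress?"
features = surface_features(example)
print(features)
```

In the third phase, feature vectors of this kind would be passed to a classifier such as the three algorithms the abstract compares; that step is omitted here because the abstract gives no training details.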
