Analysis of the Degree of Importance of Information Using Newspapers and Questionnaires

Abstract

Our objective is to estimate and clarify the factors that determine the degree of importance of information by extracting the words that characterize it, and to construct a system that estimates this degree of importance automatically. We studied the degree of importance of information using machine learning. We first performed experiments using newspaper documents (Dn), assuming that a document on the front page, or at the top of the front page, is important. Machine learning identified important documents with a precision of 0.9, which indicates that, for newspapers, the degree of importance can be estimated with high precision. Next, to estimate the degree of importance that people attach to a document, we conducted experiments using questionnaire data (Dq) as test data. In these experiments, subjects were asked which document in a pair was more important; for pairs on which more than 80% of the subjects gave the same answer, an accuracy of 94% was obtained. Furthermore, when newspaper documents (Dn) were used as training data, we obtained (i) the same accuracy using Dn only as using Dn together with Dq, and (ii) a higher accuracy using Dn and Dq than using Dq only. This observation is useful because preparing questionnaire data (Dq) can be expensive, whereas newspaper data (Dn) is free. Finally, we extracted the characteristic words that differentiate important information from less important information by calculating the parameters of the features in the machine learning model (maximum entropy (ME) method).
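The last step of the abstract, reading characteristic words off the learned feature parameters, can be illustrated with a minimal sketch. It assumes binary bag-of-words features and a two-class maximum entropy model, which is equivalent to logistic regression, trained by plain gradient ascent. The toy corpus, the function names (`featurize`, `train`, `top_words`), and the hyperparameters are hypothetical. The abstract does not specify the authors' features or training procedure, so this is not their implementation.

```python
import math

# Hypothetical toy corpus: label 1 stands in for "front page" (assumed important),
# label 0 for other pages. These documents are invented for illustration only.
DOCS = [
    ("prime minister announces election", 1),
    ("earthquake strikes capital city", 1),
    ("government passes budget after election", 1),
    ("local bakery wins pie contest", 0),
    ("weekend gardening tips for spring", 0),
    ("bakery opens new garden cafe", 0),
]


def featurize(text):
    """Binary bag-of-words: each distinct lowercase token is one feature."""
    return set(text.lower().split())


def predict_proba(weights, bias, feats):
    """P(important | document) under the two-class maximum entropy model."""
    z = bias + sum(weights.get(f, 0.0) for f in feats)
    return 1.0 / (1.0 + math.exp(-z))


def train(docs, epochs=200, lr=0.5, l2=0.01):
    """Fit feature weights by batch gradient ascent on the L2-penalized log-likelihood."""
    data = [(featurize(text), label) for text, label in docs]
    vocab = set().union(*(feats for feats, _ in data))
    weights = {f: 0.0 for f in vocab}
    bias = 0.0
    n = len(data)
    for _ in range(epochs):
        grad = {f: 0.0 for f in vocab}
        grad_bias = 0.0
        for feats, label in data:
            err = label - predict_proba(weights, bias, feats)
            for f in feats:
                grad[f] += err
            grad_bias += err
        for f in vocab:
            weights[f] += lr * (grad[f] / n - l2 * weights[f])
        bias += lr * grad_bias / n
    return weights, bias


def top_words(weights, k=5):
    """Words with the largest positive weights, i.e. most indicative of importance."""
    return sorted(weights, key=weights.get, reverse=True)[:k]


if __name__ == "__main__":
    w, b = train(DOCS)
    print("Words most associated with 'important':", top_words(w))
    print("P(important):", predict_proba(w, b, featurize("election budget announced")))
```

In this sketch a large positive weight marks a word as characteristic of important documents, and a large negative weight marks it as characteristic of less important ones. Ranking the parameters in this way corresponds to the kind of analysis the abstract describes.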
