Conference: IIAI International Conference on Advanced Applied Informatics

Development of a Website to Collect and Provide Questions about Book Titles Posted in Blogs and on Twitter

Abstract

Some people post questions about book titles on their blogs or on Twitter. A website that automatically collects such questions and solicits answers would let people who know the answers respond efficiently. We therefore developed a method for semi-automatically collecting such questions from blog articles and tweets, and built a website to display them. The proposed collection method consists of two steps: (1) submitting words that are characteristic of questions to a search engine in order to obtain blog articles and tweets that are likely to contain questions, and (2) applying automatic text classification to extract the articles and tweets that actually contain questions. For step (1), we extracted characteristic words from 400 articles and tweets. For step (2), we applied four classification methods (support vector machine (SVM), Naive Bayes, decision tree, and boosting) and compared their effectiveness on 1,900 articles and tweets. We found that (1) the characteristic words "taitoru-ga-omoidase-nai" (Japanese for "I cannot remember the title") yielded the best precision (16% for Google Blog Search and 13% for Twitter Search), and (2) boosting and the decision tree gave the best classification results for blogs and Twitter, respectively (F values of 0.943 and 0.941). When we displayed 30 articles and 31 tweets containing questions on our website, six and five of them, respectively, received satisfactory answers.
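As a rough illustration of step (2) and of how an F value is computed, the following minimal sketch trains one of the four named methods (Naive Bayes) on a handful of made-up English sentences and scores predictions with precision, recall, and F. Everything in it is an assumption made for illustration: the toy data, the whitespace tokenizer, the bag-of-words features, and the names `TRAIN`, `NaiveBayes`, and `f_measure` do not come from the paper, which worked with Japanese blog articles and tweets and does not describe its features in the abstract.

```python
import math
from collections import Counter

# Hypothetical toy data (not from the paper): label 1 = contains a
# book-title question, label 0 = does not.
TRAIN = [
    ("i cannot remember the title of this book about a red fox", 1),
    ("does anyone know the title of the novel with the lighthouse", 1),
    ("what was the title of that picture book about a whale", 1),
    ("i finished reading a great novel today", 0),
    ("the title of my new blog post is ready", 0),
    ("bought three books at the store this morning", 0),
]


def tokenize(text):
    """Assumed feature extraction: lowercase whitespace tokens."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, samples):
        self.word_counts = {0: Counter(), 1: Counter()}
        self.doc_counts = Counter()
        for text, label in samples:
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokenize(text))
        self.vocab = set(self.word_counts[0]) | set(self.word_counts[1])
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label in (0, 1):
            total_words = sum(self.word_counts[label].values())
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[label] / total_docs)
            for word in tokenize(text):
                if word not in self.vocab:
                    continue  # skip words never seen in training
                count = self.word_counts[label][word] + 1
                score += math.log(count / (total_words + len(self.vocab)))
            scores[label] = score
        return max(scores, key=scores.get)


def f_measure(gold, predicted):
    """Return (precision, recall, F) for the positive class (label 1)."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == 1 and p == 1)
    fp = sum(1 for g, p in pairs if g == 0 and p == 1)
    fn = sum(1 for g, p in pairs if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    model = NaiveBayes().fit(TRAIN)
    held_out = [
        ("cannot remember the title of a book about a whale", 1),
        ("bought a great novel at the store today", 0),
    ]
    gold = [label for _, label in held_out]
    predicted = [model.predict(text) for text, _ in held_out]
    print(f_measure(gold, predicted))
```

The same `f_measure` function could score any of the four classifiers on a labelled set, which is how a comparison like the one reported above might be organised. With only six training sentences the numbers it prints say nothing about the paper's results.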