Pacific Asia Conference on Language, Information and Computation

Trouble information extraction based on a bootstrap approach from Twitter

Abstract

In this paper, we propose a method for extracting trouble information from Twitter. One useful approach is based on machine learning techniques such as SVMs. However, trouble information accounts for only a fraction of a percent of all tweets on Twitter, and such an imbalanced class distribution generally makes it difficult for machine learning techniques to produce an effective classifier. Another approach is to extract trouble information with handwritten rules. However, constructing high-coverage rules by hand is costly. First, we verify these problems in a preliminary experiment. Then, to solve them, we apply a bootstrapping method to our trouble information extraction task, introducing three characteristics and a scoring method into the bootstrapping process. As a result, the iterative bootstrapping process dramatically increased the number of trouble-information tweets and patterns.
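
The abstract does not spell out the three characteristics or the scoring method, so the following is only a minimal sketch of a generic pattern-bootstrapping loop, not the authors' implementation. It assumes word bigrams as patterns and uses the RlogF score from earlier bootstrapping work as a stand-in for the paper's scoring. The names `bigrams`, `bootstrap`, and `SAMPLE_TWEETS`, and the sample tweets themselves, are hypothetical.

```python
import math
from collections import Counter


def bigrams(text):
    """Return the set of adjacent word pairs in a lower-cased tweet."""
    words = text.lower().split()
    return {" ".join(pair) for pair in zip(words, words[1:])}


def bootstrap(tweets, seed_patterns, iterations=3, top_k=2):
    """Alternate between matching tweets and promoting new patterns.

    Assumption: patterns are word bigrams and candidates are ranked by
    RlogF = (F / N) * log2(F), which stands in for the paper's scoring.
    """
    grams = [bigrams(t) for t in tweets]
    # N: number of tweets that contain each bigram.
    total = Counter(g for gs in grams for g in gs)
    patterns = set(seed_patterns)
    for _ in range(iterations):
        # Step 1: collect every tweet that contains a known pattern.
        matched = {i for i, gs in enumerate(grams) if gs & patterns}
        # Step 2: F counts each unseen bigram inside the matched tweets.
        hits = Counter(g for i in matched for g in grams[i] if g not in patterns)
        # Step 3: score candidates; a bigram seen once (F == 1) scores zero.
        scored = {g: (f / total[g]) * math.log2(f) for g, f in hits.items()}
        best = sorted((g for g in scored if scored[g] > 0),
                      key=lambda g: (-scored[g], g))[:top_k]
        if not best:
            break  # no candidate has support beyond a single tweet
        patterns.update(best)
    # Final pass: return the tweets covered by the grown pattern set.
    matched = {i for i, gs in enumerate(grams) if gs & patterns}
    return patterns, [tweets[i] for i in sorted(matched)]


# Hypothetical sample data, invented purely for illustration.
SAMPLE_TWEETS = [
    "my phone stopped working today",
    "wifi stopped working and keeps disconnecting",
    "router keeps disconnecting every hour",
    "bluetooth keeps disconnecting and stopped working",
    "lovely weather in the park today",
]

patterns, found = bootstrap(SAMPLE_TWEETS, {"stopped working"})
```

With the single seed pattern "stopped working", the loop promotes "keeps disconnecting" because it co-occurs with the seed in two tweets, and the router tweet is then picked up even though it never matched the seed. This growth of both the pattern set and the matched tweets across iterations is the effect the abstract reports.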
