Conference: International Conference on Big Data and Information Analytics

Extracting Socio-Economic Indicators from Chinese Text with a BERT-based Model

Abstract

Socio-economic indicators are powerful instruments for measuring economic conditions. Extracting them can help people grasp economic trends and make decisions. Traditional machine learning methods for indicator extraction rely heavily on handcrafted features, which cost a large amount of human effort. Deep learning methods remove the need for handcrafted features but require a huge amount of labeled data, which is the trickiest challenge because labeled data for the indicator extraction task is scarce. In this paper, we use a BERT-based model to address these challenges. The model first represents the input text with BERT, taking advantage of BERT's strong ability to capture generic language features. It then fine-tunes the pre-trained model on the labeled data from our indicator extraction task to learn task-specific features. Finally, the resulting token representations pass through a conditional random field (CRF) layer, which predicts the tag sequence over the output tokens. In this way, our model does not require much labeled data, yet it automatically and sufficiently captures the language features of the input text. Additionally, this paper constructs a medium-scale dataset for the fine-tuning process and evaluates our model on it. The results demonstrate that the BERT-based model is superior to some strong baselines.
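The final CRF step can be illustrated with a minimal sketch of Viterbi decoding over per-token scores. This is not the authors' implementation: the BIO tag set (`O`, `B-IND`, `I-IND`), the example tokens, and all score values are assumptions made up for illustration. In a real system the emission scores would come from the fine-tuned BERT encoder and the transition scores would be learned by the CRF layer.

```python
from typing import Dict, List, Tuple

# Hypothetical BIO tag set for indicator spans (an assumption, not from the paper).
TAGS = ["O", "B-IND", "I-IND"]


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    tags: List[str] = TAGS,
) -> List[str]:
    """Return the highest-scoring tag sequence under a linear-chain CRF.

    emissions[t][tag] is the score of `tag` at token t.
    transitions[(prev, cur)] is the score of moving from prev to cur;
    pairs that are missing default to 0.0.
    """
    if not emissions:
        return []
    # best[t][tag]: best score of any path that ends in `tag` at token t.
    best = [{tag: emissions[0][tag] for tag in tags}]
    back = []  # back[t - 1][tag]: best previous tag for `tag` at token t
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for cur in tags:
            prev = max(
                tags,
                key=lambda p: best[-1][p] + transitions.get((p, cur), 0.0),
            )
            scores[cur] = (
                best[-1][prev]
                + transitions.get((prev, cur), 0.0)
                + emissions[t][cur]
            )
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Made-up scores for the tokens of "居民 消费 价格 指数 上涨".
tokens = ["居民", "消费", "价格", "指数", "上涨"]
emissions = [
    {"O": 1.0, "B-IND": 0.8, "I-IND": 0.2},
    {"O": 0.1, "B-IND": 0.5, "I-IND": 1.5},
    {"O": 0.1, "B-IND": 0.2, "I-IND": 1.5},
    {"O": 0.2, "B-IND": 0.1, "I-IND": 1.4},
    {"O": 2.0, "B-IND": 0.1, "I-IND": 0.3},
]
# An I-IND tag may not follow O; a large penalty stands in for a learned score.
transitions = {("O", "I-IND"): -1e4}

decoded = viterbi_decode(emissions, transitions)
greedy = [max(TAGS, key=lambda tag: e[tag]) for e in emissions]
print(list(zip(tokens, decoded)))
```

With these made-up scores, picking the best tag per token independently (`greedy`) starts the span with `O` followed by `I-IND`, which is not a valid BIO sequence. Decoding over the whole sequence instead opens the span with `B-IND`, which is the kind of label consistency a CRF layer is typically added to enforce.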
