Conference: International Conference on Electrical Engineering and Informatics

Systemic Risk Document Classification on Indonesian News Articles using Deep Learning and Active Learning



Abstract

The volume of Indonesian online news articles has grown rapidly over the past decade. Part of this content is economic news, including information on financial systemic risk. To obtain information on financial systemic risk in real time, systemic risk document classification needs to be done automatically. Here, we employ deep learning and active learning to classify systemic risk documents automatically. We use 15 classes of financial systemic risk, as previously defined by Bank of Indonesia. The task is multi-label classification, where a text document may contain more than one kind of systemic risk information. For the deep learning strategy, we conducted several experiments with CNN, Bi-LSTM and Bi-GRU. We also compared one-step classification with a two-step classification. In the experiments, using 1752 documents as training data and 228 documents as test data, the highest F1 score was achieved by the Bi-LSTM topology with one classification step and a large common corpus as the resource for the word embedding. The highest F1 score was 45.37% for 15 classes, with the probability threshold set to 0.15. In the two-step classification, the first step, a 2-class classification (contains risk information or not), reached an accuracy of 82.46%. To handle the limited data, we applied active learning to select the next candidates to be labeled as training data. In the experiment, with 420 new documents added in iterations of 20 documents each, the results showed that active learning did not improve performance.
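To make the multi-label decision rule concrete, the following is a minimal sketch of how per-class probabilities might be turned into label sets with the 0.15 threshold and then scored with F1. The abstract does not state the averaging scheme or whether the comparison is strict, so micro-averaging and `>=` are assumptions here; the function names and the 4-class toy data are hypothetical and not taken from the paper.

```python
from typing import List, Set

THRESHOLD = 0.15  # probability threshold reported in the abstract


def predict_labels(probs: List[float], threshold: float = THRESHOLD) -> Set[int]:
    """Return indices of all classes whose probability reaches the threshold.

    Using >= (rather than >) is an assumption; the abstract does not say.
    """
    return {i for i, p in enumerate(probs) if p >= threshold}


def micro_f1(gold: List[Set[int]], pred: List[Set[int]]) -> float:
    """Micro-averaged F1 over all (document, label) pairs (assumed averaging)."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))  # labels correctly predicted
    fp = sum(len(p - g) for g, p in zip(gold, pred))  # predicted but not in gold
    fn = sum(len(g - p) for g, p in zip(gold, pred))  # in gold but missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data: 2 documents, 4 classes instead of 15 (made-up numbers).
probs = [
    [0.70, 0.20, 0.05, 0.01],
    [0.10, 0.05, 0.40, 0.16],
]
gold = [{0}, {2, 3}]
pred = [predict_labels(p) for p in probs]
score = micro_f1(gold, pred)
```

A low threshold such as 0.15 lets a document receive several labels at once, which trades precision for recall; in the toy data the first document picks up one extra label as a result.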
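The active learning step can be sketched in a similar way. The abstract does not say which selection criterion was used, so the example below assumes uncertainty sampling based on mean per-label entropy, which is one common choice and not necessarily the authors' method. Only the batch size (20) and the total of 420 new documents come from the abstract; all function names and the toy pool are hypothetical.

```python
import math
from typing import List

BATCH_SIZE = 20   # new documents labeled per iteration (from the abstract)
TOTAL_NEW = 420   # total newly labeled documents (from the abstract)


def binary_entropy(p: float) -> float:
    """Entropy in bits of a single yes/no label with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def uncertainty(probs: List[float]) -> float:
    """Mean per-label entropy; higher means the model is less sure."""
    return sum(binary_entropy(p) for p in probs) / len(probs)


def select_batch(pool_probs: List[List[float]], k: int = BATCH_SIZE) -> List[int]:
    """Return indices of the k most uncertain unlabeled documents."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: uncertainty(pool_probs[i]),
        reverse=True,
    )
    return ranked[:k]


# Number of labeling rounds implied by the abstract's figures.
n_iterations = TOTAL_NEW // BATCH_SIZE

# Toy pool: 3 unlabeled documents, 3 classes (made-up probabilities).
pool = [
    [0.99, 0.01, 0.02],  # model is confident
    [0.50, 0.45, 0.55],  # model is very uncertain
    [0.90, 0.30, 0.05],  # in between
]
chosen = select_batch(pool, k=2)
```

In a full loop, the chosen documents would be labeled, added to the training set, and the model retrained before the next round of selection.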
