Deep Learning Based on Hierarchical Self-Attention for Finance Distress Prediction Incorporating Text

Sumei Ruan; Xusheng Sun; Ruanxingchen Yao; Wei Li

Abstract

To detect comprehensive clues and forecast financial distress more accurately at an early stage, researchers have emphasized digitizing lengthy but indispensable textual disclosures, such as Management Discussion and Analysis (MD&A), in addition to financial indicators. However, most studies split the long text into words and represent it as word-count vectors, which introduces a large amount of irrelevant information while ignoring meaningful context. To represent large texts efficiently, this study introduces a state-of-the-art pretrained model for context-aware text embedding and then proposes an end-to-end neural network model based on hierarchical self-attention. The proposed model has two notable characteristics. First, the hierarchical self-attention assigns high weights only to essential content at the word level and the sentence level and automatically disregards the large amount of information unrelated to risk prediction, which makes it suitable for extracting the effective parts of large-scale text. Second, after fine-tuning, the word embeddings adapt to the specific contexts of the samples and convey the original text more accurately without excessive manual operations. Experiments confirm that adding text improves the accuracy of financial distress forecasting and that the proposed model outperforms benchmark models in AUC and F2-score. For visualization, the elements of the hierarchical self-attention weight matrix act as scalers that estimate the importance of each word and sentence. In this way, "red-flag" statements that imply financial risk are identified and highlighted in the original text, providing an effective reference for decision-makers.
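The two-level weighting described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the scoring function, so the sketch assumes simple dot-product scoring against a query vector, and the names `attention_pool`, `encode_document`, `word_query`, and `sent_query` are hypothetical. In the real model the word vectors would come from a fine-tuned pretrained encoder and the query vectors would be learned; here they are toy values.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors, query):
    """Pool vectors into one vector using softmax(vector . query) weights.

    Assumption: dot-product scoring; the paper may use another form.
    Returns (pooled_vector, weights).
    """
    scores = [sum(v_i * q_i for v_i, q_i in zip(v, query)) for v in vectors]
    weights = softmax(scores)
    dim = len(vectors[0])
    pooled = [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]
    return pooled, weights


def encode_document(doc, word_query, sent_query):
    """doc is a list of sentences; each sentence is a list of word vectors."""
    sent_vectors, word_weights = [], []
    for sentence in doc:
        # Word level: weight the words inside each sentence.
        vec, weights = attention_pool(sentence, word_query)
        sent_vectors.append(vec)
        word_weights.append(weights)
    # Sentence level: weight the sentence vectors inside the document.
    doc_vector, sent_weights = attention_pool(sent_vectors, sent_query)
    return doc_vector, sent_weights, word_weights


# Toy document: two sentences with 2-dimensional word vectors (made-up values).
doc = [
    [[0.1, 0.0], [2.0, 0.1]],
    [[0.0, 0.2], [0.1, 0.1], [0.0, 0.0]],
]
word_query = [1.0, 0.0]  # hypothetical learned parameter
sent_query = [1.0, 0.0]  # hypothetical learned parameter

doc_vector, sent_weights, word_weights = encode_document(doc, word_query, sent_query)
```

The returned `sent_weights` and `word_weights` are the quantities that a visualization step could use to highlight high-weight sentences and words in the original text, while `doc_vector` would feed the downstream distress classifier.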

