Venue: Workshop on Scholarly Document Processing

Structure-Tags Improve Text Classification for Scholarly Document Quality Prediction



Abstract

Training recurrent neural networks on long texts, in particular scholarly documents, causes problems for learning. While hierarchical attention networks (HANs) are effective in solving these problems, they still lose important information about the structure of the text. To tackle these problems, we propose the use of HANs combined with structure-tags, which mark the role of sentences in the document. Adding tags to sentences, marking them as belonging to the title, abstract or main body text, yields improvements over the state of the art for scholarly document quality prediction. The proposed system is applied to the task of accept/reject prediction on the PeerRead dataset and compared against a recent BiLSTM-based model and a joint textual+visual model, as well as against plain HANs. Compared to plain HANs, accuracy increases on all three domains. On the computation and language domain, our new model works best overall and increases accuracy by 4.7% over the best result in the literature. We also obtain improvements when introducing the tags for predicting the number of citations of 88k scientific publications that we compiled from the Allen AI S2ORC dataset. With structure-tags, our HAN system reaches 28.5% explained variance, an improvement of 1.8% over our reimplementation of the BiLSTM-based model and of 1.0% over plain HANs.
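The structure-tag idea can be illustrated with a short preprocessing step. This is a minimal sketch under stated assumptions, not the authors' implementation: the tag strings `<title>`, `<abstract>` and `<body>`, the helper `tag_document`, and the naive whitespace tokenization are all hypothetical. It assumes a HAN consumes a document as a list of sentences, each a list of tokens, and prepends a role tag to every sentence so the word-level encoder can see which part of the paper the sentence came from.

```python
from typing import Dict, List

# Hypothetical tag tokens; the paper's actual tag format is not given here.
STRUCTURE_TAGS: Dict[str, str] = {
    "title": "<title>",
    "abstract": "<abstract>",
    "body": "<body>",
}


def tag_document(title: str, abstract: List[str], body: List[str]) -> List[List[str]]:
    """Build a HAN-style input: a list of sentences, each a list of tokens,
    where the first token of every sentence marks its role in the document."""
    sections = [("title", [title]), ("abstract", abstract), ("body", body)]
    tagged: List[List[str]] = []
    for role, sentences in sections:
        tag = STRUCTURE_TAGS[role]
        for sentence in sentences:
            # Naive lowercase + whitespace tokenization, for illustration only.
            tagged.append([tag] + sentence.lower().split())
    return tagged


if __name__ == "__main__":
    doc = tag_document(
        title="Structure tags for HANs",
        abstract=["We add tags.", "Accuracy improves."],
        body=["Long texts are hard for RNNs."],
    )
    for sent in doc:
        print(sent)
```

Because the tag is an ordinary token, it needs no change to the HAN architecture itself: it simply receives an embedding like any other word and can influence both the word-level and sentence-level attention.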
