Journal of Seismology

A preliminary text classification of the precursory accelerating seismicity corpus: inference on some theoretical trends in earthquake predictability research from 1988 to 2018



Abstract

Text analytics based on supervised machine learning has shown great promise in a multitude of domains but has yet to be applied to seismology. We describe some common classifiers (Naive Bayes, k-Nearest Neighbors, Support Vector Machines, and Random Forests) as well as the standard steps of supervised learning (training, validation of model parameter adjustments, and testing). To illustrate text classification on a seismological corpus, we use a hundred articles related to the topic of precursory accelerating seismicity, spanning from 1988 to 2010. This corpus was labelled by Mignan [Tectonophysics, 2011] according to whether the precursor is explained by critical processes (i.e., cascade triggering) or by other processes (such as a signature of main fault loading). We investigate how the classification process can be automated to help analyze larger corpora in order to better understand trends in earthquake predictability research. We find that the Naive Bayes model performs best, in agreement with the machine learning literature for the case of small datasets, with cross-validation accuracies showing the model's predictive ability for both a binary classification (critical process or not) and a multiclass classification (non-critical process, agnostic, critical process assumed, critical process demonstrated). Prediction on a dozen articles published since 2011 shows, however, weak generalization, which can be explained, in part, by the empirical variance of the small training set. This preliminary study demonstrates the potential of supervised learning to reveal textual patterns in the seismological literature. Manual labelling remains essential but is made transparent by an investigation of Naive Bayes keyword posterior probabilities.
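To make the workflow in the abstract concrete, the following is a minimal sketch of a multinomial Naive Bayes text classifier with leave-one-out cross-validation and a per-keyword posterior probability. It is not the authors' implementation: the tokenizer, the Laplace smoothing constant `alpha`, the function names, and the six toy "documents" with their labels are all assumptions invented for illustration and are not drawn from the labelled corpus.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: simple lowercase whitespace tokenization.
    return text.lower().split()


def train_naive_bayes(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model with Laplace smoothing (alpha)."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    log_prior = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
    log_lik = {}
    for c in class_counts:
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        log_lik[c] = {
            w: math.log((word_counts[c][w] + alpha) / total) for w in vocab
        }
    return log_prior, log_lik


def predict(model, text):
    """Return the class with the highest log posterior; unseen words are skipped."""
    log_prior, log_lik = model
    scores = {}
    for c in log_prior:
        scores[c] = log_prior[c] + sum(
            log_lik[c][w] for w in tokenize(text) if w in log_lik[c]
        )
    return max(scores, key=scores.get)


def keyword_posterior(model, word):
    """P(class | word) via Bayes' rule; the word must be in the training vocabulary."""
    log_prior, log_lik = model
    joint = {c: math.exp(log_prior[c] + log_lik[c][word]) for c in log_prior}
    norm = sum(joint.values())
    return {c: p / norm for c, p in joint.items()}


def leave_one_out_accuracy(docs, labels):
    """Train on all documents but one, test on the held-out one, and average."""
    hits = 0
    for i in range(len(docs)):
        model = train_naive_bayes(docs[:i] + docs[i + 1:], labels[:i] + labels[i + 1:])
        hits += predict(model, docs[i]) == labels[i]
    return hits / len(docs)


# Invented toy data (NOT from the corpus), for illustration only.
TOY_DOCS = [
    "critical point cascade triggering power law",
    "self organized critical cascade triggering",
    "power law critical point acceleration",
    "fault loading stress accumulation",
    "stress loading main fault signature",
    "elastic rebound fault loading",
]
TOY_LABELS = ["critical"] * 3 + ["non-critical"] * 3

if __name__ == "__main__":
    model = train_naive_bayes(TOY_DOCS, TOY_LABELS)
    print(predict(model, "cascade triggering"))
    print(keyword_posterior(model, "cascade"))
    print(leave_one_out_accuracy(TOY_DOCS, TOY_LABELS))
```

Inspecting `keyword_posterior` for individual words is one simple way to see which terms drive a label, which is the sense in which keyword posteriors can make a manual labelling scheme more transparent.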
