Conference: International Conference on Computational Processing of the Portuguese Language

Inferring the Source of Official Texts: Can SVM Beat ULMFiT?


Abstract

Official Gazettes are a rich source of information relevant to the public. Their careful examination may lead to the detection of fraud and irregularities, which in turn may help prevent the mismanagement of public funds. This paper presents a dataset composed of documents from the Official Gazette of the Federal District, containing both samples annotated with the document source and unlabeled samples. We train, evaluate and compare a transfer-learning-based model that uses ULMFiT with traditional bag-of-words models that use SVM and Naive Bayes as classifiers. We find the SVM to be competitive: its performance is marginally worse than that of ULMFiT, while its training and inference times are much shorter and it is less computationally expensive. Finally, we conduct an ablation analysis to assess the performance impact of the individual ULMFiT components.
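As a rough illustration of the bag-of-words baselines mentioned in the abstract, the sketch below implements a minimal multinomial Naive Bayes classifier over word counts in plain Python. It is not the authors' implementation: the tokenizer, the class name `NaiveBayesBoW`, the toy snippets and the source labels are all assumptions invented for this example, and the SVM and ULMFiT models are not shown.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace: a deliberately crude tokenizer."""
    return text.lower().split()


class NaiveBayesBoW:
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # word counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # skip words never seen during training
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data: invented snippets and labels, not the paper's dataset.
train_texts = [
    "hospital staff appointment health unit",
    "vaccination campaign health center",
    "school teacher appointment education unit",
    "student enrollment school calendar",
]
train_labels = ["health", "health", "education", "education"]

model = NaiveBayesBoW().fit(train_texts, train_labels)
prediction = model.predict("new vaccination schedule for the health center")
```

Training here is a single counting pass over the corpus, which gives some intuition for why bag-of-words baselines tend to be far cheaper to train and run than a fine-tuned language model.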
