Inferring the Source of Official Texts: Can SVM Beat ULMFiT?

Abstract

Official Gazettes are a rich source of information relevant to the public. Careful examination of them may lead to the detection of fraud and irregularities, which may help prevent the mismanagement of public funds. This paper presents a dataset composed of documents from the Official Gazette of the Federal District, containing both samples annotated with their document source and unlabeled samples. We train, evaluate, and compare a transfer-learning model based on ULMFiT against traditional bag-of-words models that use SVM and Naive Bayes as classifiers. We find the SVM to be competitive: its performance is marginally worse than that of ULMFiT, while its training and inference times are much shorter and its computational cost is lower. Finally, we conduct an ablation analysis to assess the performance impact of the ULMFiT components.
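To make the bag-of-words baseline concrete, here is a minimal sketch of a multinomial Naive Bayes classifier over word counts, written in plain Python. It is not the authors' implementation: the class name `NaiveBayesBoW`, the whitespace tokenizer, the toy texts, and the labels `health` and `education` are all hypothetical, and the abstract does not say which features or preprocessing the paper's models use.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


class NaiveBayesBoW:
    """Multinomial Naive Bayes over bag-of-words counts (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing strength

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # word counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(labels)
        return self

    def predict(self, text):
        # Words never seen in training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / self.total_docs)
            total = sum(self.word_counts[label].values())
            for t in tokens:
                count = self.word_counts[label][t]
                score += math.log(
                    (count + self.alpha) / (total + self.alpha * vocab_size)
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data; the source labels are made up for this example.
train_texts = [
    "hospital staff appointment and vaccine procurement notice",
    "vaccine campaign schedule for public hospital units",
    "school teacher appointment and classroom renovation notice",
    "public school enrollment schedule for students",
]
train_labels = ["health", "health", "education", "education"]

model = NaiveBayesBoW().fit(train_texts, train_labels)
prediction = model.predict("notice on vaccine procurement for hospital units")
print(prediction)
```

A model like this trains in a single pass over the labeled documents, which illustrates why bag-of-words classifiers tend to be cheap to train and run compared with fine-tuning a neural language model.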

Bibliographic details

Source
International Conference on Computational Processing of the Portuguese Language, 2020, pp. 76-86 (11 pages)
Authors
Pedro Henrique Luz de Araujo; Teofilo Emidio de Campos; Marcelo Magalhaes Silva de Sousa
Original format
PDF
Keywords
Text classification; Language models; Transfer learning
