Workshop on Innovative Use of NLP for Building Educational Applications

Training and Domain Adaptation for Supervised Text Segmentation

Abstract

Unlike traditional unsupervised text segmentation methods, recent supervised segmentation models rely on Wikipedia as the source of large-scale segmentation supervision. These models have, however, predominantly been evaluated on in-domain (Wikipedia-based) test sets, preventing conclusions about their general segmentation efficacy. In this work, we focus on the domain-transfer performance of supervised neural text segmentation in the educational domain. To this end, we first introduce K12SEG, a new dataset for evaluating supervised segmentation, created from educational reading material for grade-1 to college-level students. We then benchmark a hierarchical text segmentation model (HITS), based on RoBERTa, in both in-domain and domain-transfer segmentation experiments. While HITS produces state-of-the-art in-domain performance (on three Wikipedia-based test sets), we show that, under standard full-blown fine-tuning, it is susceptible to domain overfitting. We identify adapter-based fine-tuning as a remedy that substantially improves transfer performance.
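The abstract names adapter-based fine-tuning as the remedy for domain overfitting but includes no code. Below is a minimal PyTorch sketch of the general mechanism this usually refers to, a Houlsby-style bottleneck adapter; the class names, the bottleneck size of 64, and the hand-rolled insertion are illustrative assumptions, not the authors' implementation (the paper's model is built on RoBERTa, whose internals are left untouched here).

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Houlsby-style bottleneck adapter: down-project the hidden state,
    apply a nonlinearity, project back up, and add a residual connection.
    In adapter-based fine-tuning, only these small modules (plus the task
    head) are updated; the pretrained encoder stays frozen."""

    def __init__(self, hidden_size: int = 768, bottleneck_size: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck_size)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck_size, hidden_size)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the adapter close to an identity map
        # at initialization, so the pretrained representations are preserved.
        return hidden_states + self.up(self.act(self.down(hidden_states)))


def adapter_parameters(encoder: nn.Module, adapters: nn.ModuleList) -> list:
    """Freeze the pretrained encoder and return only the adapter
    parameters, e.g. for constructing the optimizer."""
    for p in encoder.parameters():
        p.requires_grad = False
    return [p for a in adapters for p in a.parameters()]


# Hypothetical setup: one adapter per RoBERTa-base layer (12 layers,
# hidden size 768), applied to each layer's output during the forward pass.
adapters = nn.ModuleList(BottleneckAdapter(768, 64) for _ in range(12))
```

Because only the small bottleneck projections are trained, the frozen pretrained weights cannot drift toward the Wikipedia training distribution, which is the intuition behind the improved domain-transfer performance the abstract reports.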
