BMC Bioinformatics

Training text chunkers on a silver standard corpus: can silver replace gold?

Abstract

Background: To train chunkers to recognize noun phrases and verb phrases in biomedical text, an annotated corpus is required. Creating gold standard corpora (GSCs), however, is expensive and time-consuming. GSCs therefore tend to be small and to focus on specific subdomains, which limits their usefulness. We investigated the use of a silver standard corpus (SSC) that is automatically generated by combining the outputs of multiple chunking systems. We explored two use scenarios: one in which chunkers are trained on an SSC in a new domain for which no GSC is available, and one in which chunkers are trained on an available, although small, GSC supplemented with an SSC.
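The abstract does not say how the outputs of the chunking systems are combined into the SSC. As a minimal sketch, assuming each system emits one BIO-style tag per token (for example `B-NP`, `I-NP`, `B-VP`, `O`) and that the tags are merged by simple per-token majority voting, a silver-standard sentence could be built as below. Voting is an assumed combination scheme, not necessarily the one used in the study, and the function names `vote_tags` and `repair_bio` are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def vote_tags(system_outputs: Sequence[Sequence[str]]) -> List[str]:
    """Combine per-token chunk tags from several chunkers by majority vote.

    ASSUMPTION: per-token majority voting over BIO tags is an illustrative
    combination scheme, not necessarily the method used in the paper.
    Each element of system_outputs is one chunker's tag sequence for the
    same sentence. Ties go to the tag proposed by the earliest system.
    """
    if not system_outputs:
        raise ValueError("need at least one system output")
    n = len(system_outputs[0])
    if any(len(seq) != n for seq in system_outputs):
        raise ValueError("all systems must tag the same tokens")

    combined = []
    for i in range(n):
        column = [seq[i] for seq in system_outputs]  # all systems' tags for token i
        counts = Counter(column)
        best = max(counts.values())
        # First tag, in system order, that reaches the highest vote count.
        combined.append(next(t for t in column if counts[t] == best))
    return repair_bio(combined)


def repair_bio(tags: List[str]) -> List[str]:
    """Turn an I-X tag that does not continue an X chunk into B-X.

    Voting token by token can yield sequences such as ["O", "I-VP"],
    which are not valid BIO; this restores well-formed chunk starts.
    """
    fixed = []
    prev = "O"
    for tag in tags:
        if tag.startswith("I-") and prev[2:] != tag[2:]:
            tag = "B-" + tag[2:]
        fixed.append(tag)
        prev = tag
    return fixed


if __name__ == "__main__":
    systems = [
        ["B-NP", "I-NP", "B-VP", "O"],
        ["B-NP", "I-NP", "B-VP", "O"],
        ["B-NP", "B-NP", "I-VP", "O"],
    ]
    print(vote_tags(systems))  # ['B-NP', 'I-NP', 'B-VP', 'O']
```

The BIO repair step is included because independent per-token votes do not guarantee a well-formed chunk sequence; a real SSC pipeline might instead vote over whole chunk spans.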
