Venue: Information Intelligence and Systems, 1999. Proceedings. 1999 International Conference on

A comparison of part of speech taggers in the task of changing to a new domain



Abstract

Part-of-speech tagging in real-world applications is performed on text from domains that differ from those of the large, publicly available training data sets. The two most successful part-of-speech taggers are trained on the Wall Street Journal corpus, a corpus of millions of words. We compare their performance on a test set from a different domain, astronomy, drawn from documents available on the World Wide Web. The Maximum Entropy Part of Speech Tagger (MXPOST) and the Transformation-Based Learning Tagger are well known and widely used in language research and development systems. The two taggers were tested in several modes: (1) after training on the Wall Street Journal corpus only, (2) after training on only a small body of text from our astronomy domain, (3) with and without an auxiliary lexicon derived from many astronomy-related Web documents, and (4) after incremental training, that is, after training on the Wall Street Journal followed by additional training on text from the specific domain. One conclusion from the experiment is that different taggers exhibit different biases when trained on the same data.
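The abstract does not describe the taggers' internals, so the following is only a minimal sketch, under stated assumptions, of how the four training modes could be set up and scored. It deliberately swaps in a trivial most-frequent-tag baseline in place of MXPOST and the Transformation-Based Learning Tagger. The class `MostFrequentTagger`, the function `run_modes`, the toy corpora, the per-token accuracy metric, and the lexicon policy (consult the auxiliary lexicon only for words unseen in training) are all hypothetical illustrations, not the paper's method or data.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

Sentence = List[Tuple[str, str]]  # [(word, gold_tag), ...]


class MostFrequentTagger:
    """Hypothetical most-frequent-tag baseline, for illustration only.
    It is NOT MXPOST and NOT the transformation-based tagger."""

    def __init__(self, default_tag: str = "NN") -> None:
        # lowercased word -> Counter of the tags seen with that word
        self.counts: Dict[str, Counter] = defaultdict(Counter)
        self.default_tag = default_tag

    def train(self, corpus: List[Sentence]) -> "MostFrequentTagger":
        # Calling train() again adds counts on top of the existing ones;
        # this is how the sketch models incremental training (mode 4).
        for sentence in corpus:
            for word, tag in sentence:
                self.counts[word.lower()][tag] += 1
        return self

    def tag_word(self, word: str, lexicon: Optional[Dict[str, str]] = None) -> str:
        key = word.lower()
        if key in self.counts:
            return self.counts[key].most_common(1)[0][0]
        if lexicon and key in lexicon:
            # Assumed policy: the auxiliary lexicon only covers unseen words.
            return lexicon[key]
        return self.default_tag


def accuracy(tagger: MostFrequentTagger, test: List[Sentence],
             lexicon: Optional[Dict[str, str]] = None) -> float:
    """Per-token accuracy against gold tags (an assumed metric)."""
    correct = total = 0
    for sentence in test:
        for word, gold in sentence:
            if tagger.tag_word(word, lexicon) == gold:
                correct += 1
            total += 1
    return correct / total if total else 0.0


# Hypothetical toy data standing in for the WSJ and astronomy corpora.
WSJ_TOY = [[("The", "DT"), ("stock", "NN"), ("rose", "VBD"), (".", ".")],
           [("Shares", "NNS"), ("fell", "VBD"), ("sharply", "RB"), (".", ".")]]
ASTRO_TRAIN_TOY = [[("The", "DT"), ("star", "NN"), ("emits", "VBZ"),
                    ("radiation", "NN"), (".", ".")]]
ASTRO_TEST_TOY = [[("The", "DT"), ("nebula", "NN"), ("emits", "VBZ"),
                   ("infrared", "JJ"), ("radiation", "NN"), (".", ".")]]
ASTRO_LEXICON_TOY = {"infrared": "JJ"}  # hypothetical auxiliary lexicon


def run_modes() -> Dict[str, float]:
    wsj_only = MostFrequentTagger().train(WSJ_TOY)                 # mode 1
    astro_only = MostFrequentTagger().train(ASTRO_TRAIN_TOY)       # mode 2
    incremental = MostFrequentTagger().train(WSJ_TOY).train(ASTRO_TRAIN_TOY)  # mode 4
    return {
        "wsj_only": accuracy(wsj_only, ASTRO_TEST_TOY),
        "astro_only": accuracy(astro_only, ASTRO_TEST_TOY),
        # mode 3: same tagger, with vs. without the auxiliary lexicon
        "wsj_only+lexicon": accuracy(wsj_only, ASTRO_TEST_TOY, ASTRO_LEXICON_TOY),
        "incremental": accuracy(incremental, ASTRO_TEST_TOY),
        "incremental+lexicon": accuracy(incremental, ASTRO_TEST_TOY, ASTRO_LEXICON_TOY),
    }


if __name__ == "__main__":
    for mode, acc in run_modes().items():
        print(f"{mode:20s} {acc:.3f}")
```

The toy scores say nothing about the paper's findings; they only show mechanically how out-of-domain training, a small in-domain set, an auxiliary lexicon, and incremental training change which unseen words fall back to a default tag. A real replication would substitute the actual taggers and corpora.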
