
Inducing Multilingual Text Analysis Tools via Robust Projection across Aligned Corpora



Abstract

This paper describes a system and a set of algorithms for automatically inducing stand-alone monolingual part-of-speech taggers, base noun-phrase bracketers, named-entity taggers and morphological analyzers for an arbitrary foreign language. Case studies include French, Chinese, Czech and Spanish. Existing text analysis tools for English are applied to bilingual text corpora, and their output is projected onto the second language via statistically derived word alignments. Simple direct annotation projection is quite noisy, however, even with optimal alignments. This paper therefore presents noise-robust tagger, bracketer and lemmatizer training procedures capable of accurate system bootstrapping from noisy and incomplete initial projections. The induced stand-alone part-of-speech tagger applied to French achieves 96% core part-of-speech (POS) tag accuracy, and the corresponding induced noun-phrase bracketer exceeds 91% lemmatization accuracy on the complete French verbal system. This achievement is particularly noteworthy in that it required absolutely no hand-annotated training data in the given language, and virtually no language-specific knowledge or resources beyond raw text. Performance also significantly exceeds that obtained by direct annotation projection.
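The direct annotation projection step mentioned in the abstract can be pictured with a small sketch. This is only an illustration of the naive baseline (copying English tags across word-alignment links), not the paper's noise-robust training procedure. The function name `project_tags`, the `(source_index, target_index)` alignment format, and the `UNK` placeholder for unaligned words are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def project_tags(source_tags, alignment, target_len, unaligned="UNK"):
    """Project source-side tags onto target tokens through alignment links.

    source_tags: list of tags, one per source (English) token.
    alignment:   iterable of (source_index, target_index) pairs (assumed format).
    target_len:  number of tokens in the target sentence.
    unaligned:   placeholder tag for target tokens with no alignment link.
    """
    # Collect the tags that each target token receives from its aligned sources.
    votes = defaultdict(Counter)
    for src_idx, tgt_idx in alignment:
        votes[tgt_idx][source_tags[src_idx]] += 1

    projected = []
    for j in range(target_len):
        if votes[j]:
            # Keep the most frequently projected tag for this target token.
            projected.append(votes[j].most_common(1)[0][0])
        else:
            # No link: the projection is incomplete at this position.
            projected.append(unaligned)
    return projected


# Toy example: "the red house" -> "la maison rouge" (hypothetical alignment).
english_tags = ["DT", "JJ", "NN"]
links = [(0, 0), (1, 2), (2, 1)]
print(project_tags(english_tags, links, target_len=3))  # ['DT', 'NN', 'JJ']
```

Unaligned or misaligned words leave gaps and wrong tags in the projected output, which is the kind of noise and incompleteness the abstract says the robust training procedures are designed to tolerate.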
