首页> 美国政府科技报告 >Deducing Linguistic Structure from the Statistics of Large Corpora.
【24h】

Deducing Linguistic Structure from the Statistics of Large Corpora.

机译:从大型语料库统计推断语言结构。

获取原文

摘要

Within the last two years, approaches using both stochastic and symbolic techniques have proved adequate to deduce lexical ambiguity resolution rules with less than 3-4% error rate, when trained on moderate sized (500K word) corpora of English text (e.g. Church, 1988; Hindle, 1989). The success of these techniques suggests that much of the grammatical structure of language may be derived automatically through distributional analysis, an approach attempted and abandoned in the 1950s. We describe here two experiments to see how far purely distributional techniques can be pushed to automatically provide both a set of part of speech tags for English, and a grammatical analysis of free English text. We also discuss the state of a tagged NL corpus to aid such research (now amounting to 4 million words of hand-corrected part-of-speech tagging).

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号