22nd International Conference on Computational Linguistics

Unsupervised Induction of Labeled Parse Trees by Clustering with Syntactic Features

Abstract

We present an algorithm for unsupervised induction of labeled parse trees. The algorithm has three stages: bracketing, initial labeling, and label clustering. Bracketing is performed on raw text by an unsupervised incremental parser. Initial labeling uses a merging model that aims to minimize the grammar description length. Finally, the labels are clustered into a desired number of labels using syntactic features extracted from the initially labeled trees. The algorithm obtains a 59% labeled F-score on the WSJ 10 corpus, compared to 35% in previous work, and a substantial error reduction over a random baseline. We report results for English, German, and Chinese corpora, using two label mapping methods and two label set sizes.
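The labeled F-score reported in the abstract can be illustrated with a minimal sketch. It assumes the standard labeled-bracket definition, in which a predicted constituent counts as correct only when both its span and its label match a gold constituent, and it assumes the induced labels have already been mapped to gold labels. The abstract does not detail the two mapping methods, so they are not modeled here. The function name `labeled_f_score` and the toy brackets are hypothetical and are not taken from the paper.

```python
from collections import Counter


def labeled_f_score(gold, predicted):
    """Labeled bracket F-score over multisets of (label, start, end) spans.

    Assumes the predicted labels are already mapped to the gold label set.
    """
    gold_counts = Counter(gold)
    pred_counts = Counter(predicted)
    # Multiset intersection: a bracket matches only if span AND label agree.
    matched = sum((gold_counts & pred_counts).values())
    if matched == 0:
        return 0.0
    precision = matched / sum(pred_counts.values())
    recall = matched / sum(gold_counts.values())
    return 2 * precision * recall / (precision + recall)


# Toy example (hypothetical data): one bracket has the right span
# but the wrong label, so it does not count as a match.
gold = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5)]
pred = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("VP", 3, 5)]
score = labeled_f_score(gold, pred)
print(f"labeled F-score: {score:.2f}")
```

In the toy example three of the four brackets match in both span and label, so precision and recall are both 3/4. An unlabeled score on the same data would be perfect, which is why labeled evaluation is the stricter measure.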