Conference on Empirical Methods in Natural Language Processing (EMNLP 2011)

Non-parametric Bayesian Segmentation of Japanese Noun Phrases


Abstract

A key factor in high-quality word segmentation for Japanese is a high-coverage dictionary, but building such a lexical resource manually is costly. Although external lexical resources for human readers are potentially good knowledge sources, they have not been utilized due to differences in segmentation criteria. To supplement a morphological dictionary with these resources, we propose a new task of Japanese noun phrase segmentation. We apply non-parametric Bayesian language models to segment each noun phrase in these resources according to the statistical behavior of its supposed constituents in text. For inference, we propose a novel block sampling procedure named hybrid type-based sampling, which can directly escape a local optimum that is not too distant from the global optimum. Experiments show that the proposed method efficiently corrects the initial segmentation given by a morphological analyzer.
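To make the idea of segmenting a noun phrase "according to the statistical behavior of its supposed constituents" more concrete, the following is a minimal sketch that scores candidate segmentations under a Dirichlet process unigram model (Chinese restaurant process predictive probabilities). This is an assumption chosen for illustration: the abstract does not specify the exact model, and the sketch does not implement hybrid type-based sampling. The function names, the base measure, the parameters `alpha`, `n_chars`, and `p_stop`, and the toy counts are all hypothetical.

```python
from collections import Counter


def base_prob(word, n_chars=100, p_stop=0.5):
    """Hypothetical base measure: geometric word length, uniform characters."""
    length = len(word)
    return (1 - p_stop) ** (length - 1) * p_stop * (1.0 / n_chars) ** length


def dp_unigram_prob(word, counts, total, alpha):
    """Predictive probability of `word` given the words observed so far."""
    return (counts[word] + alpha * base_prob(word)) / (total + alpha)


def segmentation_prob(words, counts, alpha=1.0):
    """Joint probability of a candidate segmentation (a list of words).

    `counts` holds word frequencies observed in text (toy values here).
    The counts are copied, then updated as each word is added, so
    repeated words within one phrase are handled consistently.
    """
    counts = Counter(counts)  # work on a copy; the caller's counts stay intact
    total = sum(counts.values())
    prob = 1.0
    for word in words:
        prob *= dp_unigram_prob(word, counts, total, alpha)
        counts[word] += 1
        total += 1
    return prob


# Toy word frequencies standing in for statistics gathered from text.
text_counts = Counter({"常用": 5, "漢字": 8, "表": 3})

split = segmentation_prob(["常用", "漢字", "表"], text_counts)
unsplit = segmentation_prob(["常用漢字表"], text_counts)
print("split is preferred:", split > unsplit)
```

With these toy counts, the constituents are frequent in text while the unsegmented phrase is unseen, so the split candidate receives the higher probability. A sampler would repeatedly propose such alternatives and choose among them in proportion to their probabilities rather than comparing two fixed candidates.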
