Conference on Empirical Methods in Natural Language Processing

Non-parametric Bayesian Segmentation of Japanese Noun Phrases

Abstract

A key factor in high-quality word segmentation for Japanese is a high-coverage dictionary, but building such a lexical resource manually is costly. Although external lexical resources for human readers are potentially good knowledge sources, they have not been utilized due to differences in segmentation criteria. To supplement a morphological dictionary with these resources, we propose a new task of Japanese noun phrase segmentation. We apply non-parametric Bayesian language models to segment each noun phrase in these resources according to the statistical behavior of its supposed constituents in text. For inference, we propose a novel block sampling procedure named hybrid type-based sampling, which has the ability to directly escape a local optimum that is not too distant from the global optimum. Experiments show that the proposed method efficiently corrects the initial segmentation given by a morphological analyzer.
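To make the idea of segmenting a noun phrase by the statistical behavior of its constituents more concrete, here is a minimal sketch in Python. It is not the method of the paper: it assumes a simple Dirichlet process unigram model with fixed word counts and scores every possible segmentation exhaustively, whereas the abstract describes inference by hybrid type-based sampling. The toy counts, the concentration parameter `ALPHA`, the stop probability `P_STOP`, and the character vocabulary size `CHAR_VOCAB` are all hypothetical values chosen for illustration.

```python
from itertools import product
from math import prod

ALPHA = 1.0        # DP concentration parameter (assumed value)
P_STOP = 0.5       # chance of ending a word after each character (assumed)
CHAR_VOCAB = 1000  # assumed number of distinct characters

def base_prob(word):
    """Base measure: geometric length prior times uniform character choice."""
    n = len(word)
    return (1 - P_STOP) ** (n - 1) * P_STOP * (1 / CHAR_VOCAB) ** n

def word_prob(word, counts, total):
    """Predictive probability of a word under a DP unigram model."""
    return (counts.get(word, 0) + ALPHA * base_prob(word)) / (total + ALPHA)

def segmentations(phrase):
    """Yield every split of a non-empty phrase into contiguous words."""
    for bounds in product([False, True], repeat=len(phrase) - 1):
        words, start = [], 0
        for i, is_boundary in enumerate(bounds, start=1):
            if is_boundary:
                words.append(phrase[start:i])
                start = i
        words.append(phrase[start:])
        yield words

def best_segmentation(phrase, counts):
    """Pick the split with the highest product of word probabilities.

    Simplification: counts are held fixed while scoring, so this is a
    point estimate rather than a sample from the posterior.
    """
    total = sum(counts.values())
    return max(
        segmentations(phrase),
        key=lambda ws: prod(word_prob(w, counts, total) for w in ws),
    )

# Hypothetical word counts standing in for statistics gathered from text.
counts = {"東京": 5, "大学": 4, "東京大学": 1}
best = best_segmentation("東京大学", counts)
print(best)
```

With these toy counts, the two frequent constituents together outscore the rare whole phrase, so the sketch prefers the two-word split. Exhaustive enumeration grows as 2^(n-1) in phrase length, which is one reason a sampling procedure is attractive for realistic data.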
