Conference: International Joint Conference on Natural Language Processing (IJCNLP 2004), March 22-24, 2004, Hainan Island, China

Korean Stochastic Word-Spacing with Dynamic Expansion of Candidate Words List



Abstract

The main aim of this work is to implement a stochastic Korean word-spacing system that is equally robust on both inner data and external data. Word spacing in Korean is influential in determining semantic and syntactic scope. To cope with the various problems caused by word-spacing errors when processing Korean text, this study (a) presents a simple stochastic word-spacing system with only two parameters, using relative word-unigram frequencies and the odds favoring the inner-spacing probability of disyllables located at the boundaries of stochastic-based words; (b) endeavors to reduce dependency on the training data by dynamically creating a candidate word list with the longest-radix-selecting algorithm; and (c) removes noise from the training data by refining the training procedure. The system thus becomes robust against unseen words and offers similar performance on inner and external data: it obtained 98.35% and 97.47% precision in word-unit correction on the inner test data and the external test data, respectively.
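The abstract does not give the scoring formula, so the following is only a minimal sketch of the general idea in point (a), under stated assumptions: each candidate word contributes the log of its relative unigram frequency, and each proposed boundary contributes the log odds that a space falls inside the disyllable straddling it. The weights `alpha` and `beta`, the `max_len` and `unseen` settings, and the toy tables are hypothetical and are not taken from the paper; the longest-radix-selecting algorithm and the training refinement in points (b) and (c) are not reproduced here.

```python
import math

# Toy tables, invented for illustration only (not from the paper).
TOY_UNIGRAMS = {"나는": 0.3, "학교에": 0.2, "간다": 0.2, "나": 0.1, "는": 0.05}
TOY_ODDS = {"는학": 4.0, "에간": 3.0}  # odds that a space falls inside the pair


def segment(text, unigram_freq, spacing_odds,
            alpha=1.0, beta=1.0, max_len=8, unseen=1e-8):
    """Return the best-scoring split of an unspaced string into words.

    Hypothetical scoring: alpha * log(relative word frequency) per word plus
    beta * log(spacing odds) of the two syllables straddling each boundary.
    """
    n = len(text)
    best = [-math.inf] * (n + 1)  # best[i]: best score for text[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last word
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            score = best[start] + alpha * math.log(unigram_freq.get(word, unseen))
            if start > 0:
                # Disyllable made of the last syllable of the previous word
                # and the first syllable of this one.
                pair = text[start - 1:start + 1]
                score += beta * math.log(spacing_odds.get(pair, 1.0))
            if score > best[end]:
                best[end] = score
                back[end] = start
    # Follow the back-pointers from the end of the string to recover the words.
    words = []
    pos = n
    while pos > 0:
        words.append(text[back[pos]:pos])
        pos = back[pos]
    return words[::-1]
```

With the toy tables, `segment("나는학교에간다", TOY_UNIGRAMS, TOY_ODDS)` splits the string into `나는`, `학교에`, and `간다`. Strings not found in the unigram table fall back to the small `unseen` probability, which is a simple stand-in for, not an implementation of, the paper's dynamic expansion of the candidate word list.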
