Conference: International Joint Conference on Natural Language Processing

Stochastic Word-Spacing System with Dynamic Increase of Word List

Abstract

The main aim of this work is to implement a stochastic Korean word-spacing system that is equally robust on inner data and external data. Word spacing in Korean helps determine semantic and syntactic scope. To cope with the various problems of word spacing, this study (a) presents a simple stochastic word-spacing system with only two parameters, based on the odds favoring the inner-spacing of a given syllable bigram and on relative word frequencies, and (b) endeavors to (i) remove noise from the training data and (ii) reduce dependency on the training data by dynamically creating candidate words with a longest-radix-selecting algorithm. The system thus becomes robust against unseen words and performs similarly on inner and external data: it obtained 98.35% and 96.59% precision in word-unit correction on the inner and external test data, respectively.
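
The bigram-odds idea mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`train_spacing_odds`, `respace`), the add-one smoothing, the decision threshold of 1.0, and the toy corpus are all assumptions made for illustration. The relative word frequencies and the longest-radix candidate generation described in the abstract are left out, since the abstract does not specify them.

```python
import unicodedata
from collections import Counter


def _syllables(text):
    # NFC keeps each precomposed Hangul syllable as a single code point.
    return unicodedata.normalize("NFC", text)


def train_spacing_odds(sentences, smoothing=1.0):
    """Estimate, per syllable bigram, the odds that a space falls inside it.

    Hypothetical helper: odds = (spaced count + s) / (unspaced count + s).
    """
    spaced = Counter()   # bigrams that straddle a word boundary
    joined = Counter()   # bigrams that occur inside a word
    for sentence in sentences:
        words = _syllables(sentence).split()
        for i, word in enumerate(words):
            for left, right in zip(word, word[1:]):
                joined[left + right] += 1
            if i + 1 < len(words):
                spaced[word[-1] + words[i + 1][0]] += 1
    return {
        bigram: (spaced[bigram] + smoothing) / (joined[bigram] + smoothing)
        for bigram in set(spaced) | set(joined)
    }


def respace(text, odds, threshold=1.0, unseen_odds=0.0):
    """Drop existing spaces, then insert one wherever the odds exceed threshold.

    Unseen bigrams fall back to `unseen_odds` (an assumed default: no space).
    """
    syllables = "".join(_syllables(text).split())
    if not syllables:
        return ""
    out = [syllables[0]]
    for left, right in zip(syllables, syllables[1:]):
        if odds.get(left + right, unseen_odds) > threshold:
            out.append(" ")
        out.append(right)
    return "".join(out)


# Toy training corpus (made up for this sketch, not from the paper).
corpus = ["나는 학교에 간다", "나는 집에 간다"]
odds = train_spacing_odds(corpus)
print(respace("나는학교에간다", odds))
```

A purely local rule like this depends heavily on which bigrams appear in the training data, which is the kind of training-data dependency the abstract says the dynamically created candidate words are meant to reduce.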
