Journal: Procedia Computer Science

Syllabification Model of Indonesian Language Named-Entity Using Syntactic n-Gram


Abstract

Syllabification (or syllabication) is the task of detecting syllable boundaries in a word. There are two main approaches to automatic syllabification: rule-based and data-driven. The rule-based approach relies on general principles of syllabification, while the data-driven approach uses a set of syllabified words to syllabify unseen words. Syllabification of words has been studied extensively. However, most of these studies deal only with formal words, and only a few address named entities, which tend to be more complicated than regular words. In this research, a syntactic n-gram is proposed and investigated for syllabifying named entities, since it builds on the n-gram, which has excellent accuracy and tends to be consistent across languages. Evaluation on 20 k named entities using 4-fold cross-validation shows that the proposed model gives a competitive syllable error rate (SER) compared to another similar n-gram-based model.
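
The evaluation setup mentioned in the abstract (SER under 4-fold cross-validation) can be illustrated with a minimal sketch. The abstract does not define SER, so the code below assumes a simple span-based definition: a gold syllable counts as an error when the prediction contains no syllable with the same character span. The function names, the "." separator, and the example words are all hypothetical and are not taken from the paper.

```python
from typing import List, Set, Tuple


def syllable_spans(syllabified: str, sep: str = ".") -> Set[Tuple[int, int]]:
    """Return the (start, end) character span of each syllable."""
    spans = set()
    start = 0
    for syllable in syllabified.split(sep):
        end = start + len(syllable)
        spans.add((start, end))
        start = end
    return spans


def syllable_error_rate(gold: List[str], predicted: List[str]) -> float:
    """Assumed SER: share of gold syllables whose span is absent from the prediction."""
    total = 0
    errors = 0
    for g, p in zip(gold, predicted):
        gold_spans = syllable_spans(g)
        total += len(gold_spans)
        errors += len(gold_spans - syllable_spans(p))
    return errors / total if total else 0.0


def k_fold_indices(n_items: int, k: int = 4) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_items) into k (train, test) index pairs, round-robin."""
    folds = [list(range(i, n_items, k)) for i in range(k)]
    return [
        ([j for f, fold in enumerate(folds) if f != i for j in fold], folds[i])
        for i in range(k)
    ]


# Illustrative data only: two place names with one wrong prediction.
gold = ["ja.kar.ta", "ban.dung"]
predicted = ["ja.ka.rta", "ban.dung"]
ser = syllable_error_rate(gold, predicted)
splits = k_fold_indices(8, k=4)
```

In a full experiment, a model would be trained on each train split and the SER averaged over the four test splits. The paper's exact SER definition and fold assignment may differ from this sketch.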
