Procedia Computer Science

Dictionary-based Word Segmentation for Javanese


Abstract

Word segmentation is the first step in processing languages written in non-Latin scripts such as Javanese script. In this study, we report our work on word segmentation using a dictionary-based approach. In the first phase, we generate all possible segmented word series using a word dictionary. The correct word is selected based on the last character in a word, the last two characters in a word, the difference between two consecutive words, and the frequency of the word in an additional corpus. The experimental results show that identifying words by their frequency in the additional corpus yields the best accuracy, 91.08%.
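The two phases described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `all_segmentations` and `pick_by_frequency`, the toy Latin-letter input, and the scoring rule (sum of corpus frequencies over the words of a candidate) are all assumptions made for illustration. The abstract only says that word frequency in an additional corpus is used for selection, not how it is combined.

```python
from functools import lru_cache


def all_segmentations(text, dictionary):
    """Return every way to split `text` into words found in `dictionary`."""

    @lru_cache(maxsize=None)
    def split_from(start):
        # An empty remainder has exactly one segmentation: no words at all.
        if start == len(text):
            return [[]]
        results = []
        for end in range(start + 1, len(text) + 1):
            word = text[start:end]
            if word in dictionary:
                # Prepend the matched word to each segmentation of the rest.
                for rest in split_from(end):
                    results.append([word] + rest)
        return results

    return split_from(0)


def pick_by_frequency(candidates, freq):
    """Pick the candidate with the highest total corpus frequency.

    Summing frequencies is an assumed scoring rule, not one taken
    from the paper.
    """
    return max(candidates, key=lambda words: sum(freq.get(w, 0) for w in words))


# Hypothetical toy data; a real system would use a Javanese word
# dictionary and frequencies counted from an additional corpus.
toy_dictionary = frozenset({"a", "b", "ab", "cd", "abc", "d"})
toy_freq = {"a": 5, "b": 1, "cd": 2, "ab": 10, "abc": 1, "d": 3}

candidates = all_segmentations("abcd", toy_dictionary)
best = pick_by_frequency(candidates, toy_freq)
```

With this toy data, three candidate segmentations are generated and the one containing the most frequent words is selected. Enumerating every segmentation can grow exponentially with input length, so a practical system would likely prune candidates or score them during the search.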
