
Isarn Dharma word segmentation


Abstract

This paper presents an Isarn Dharma word segmentation method based on the Isarn Dharma writing system and a dictionary. Input text is first segmented into a sequence of Isarn Dharma Character Clusters (IDCCs), where each IDCC is a group of characters that the Isarn Dharma writing system treats as inseparable. The IDCC sequence is then matched against the dictionary with the IDCC longest matching algorithm to find the most suitable word segmentation. Grouping rules are then applied to group adjacent remaining IDCCs that do not match any Isarn word in the dictionary. To evaluate the proposed technique, Isarn literature, Jataka, legend, and Buddha foretell texts were used as test data, and the proposed system was compared with longest matching and with IDCC longest matching. The experimental results show F-measures of 80.15%, 85.06%, and 86.07% for longest matching, the IDCC longest matching algorithm, and the proposed method, respectively.
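The pipeline described in the abstract (dictionary lookup by longest matching over clusters, followed by grouping of the unmatched leftovers) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption for illustration: the function name `segment`, the `max_len` window, the Latin placeholder clusters, and the toy dictionary are hypothetical, the input is assumed to be already split into clusters, and merging adjacent unmatched clusters into one token is only a simple stand-in for the paper's grouping rules, which the abstract does not specify.

```python
from typing import List, Set, Tuple


def segment(clusters: List[str],
            dictionary: Set[Tuple[str, ...]],
            max_len: int = 6) -> List[str]:
    """Greedy longest matching over clusters, then merge unmatched runs.

    `dictionary` holds words as tuples of clusters (hypothetical format).
    """
    words: List[str] = []
    pending: List[str] = []  # adjacent clusters with no dictionary match
    i = 0
    while i < len(clusters):
        match_len = 0
        # Try the longest candidate first and shrink until a dictionary hit.
        for n in range(min(max_len, len(clusters) - i), 0, -1):
            if tuple(clusters[i:i + n]) in dictionary:
                match_len = n
                break
        if match_len:
            # Flush the unmatched run as one grouped token (assumed rule).
            if pending:
                words.append("".join(pending))
                pending = []
            words.append("".join(clusters[i:i + match_len]))
            i += match_len
        else:
            pending.append(clusters[i])
            i += 1
    if pending:
        words.append("".join(pending))
    return words


# Placeholder clusters in Latin letters; real input would be IDCCs.
clusters = ["ka", "ra", "ma", "xx", "yy", "na", "ra"]
dictionary = {("ka", "ra"), ("ka", "ra", "ma"), ("na", "ra")}
print(segment(clusters, dictionary))
```

Matching on whole clusters rather than on individual characters means a candidate word boundary can never fall inside a cluster, which is one plausible reading of why the abstract reports a higher F-measure for IDCC longest matching than for plain longest matching.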
