Journal: Engineering and Applied Science Research

A hybrid approach to Pali Sandhi segmentation using BiLSTM and rule-based analysis


Abstract

Pali Sandhi is a phonetic transformation that merges two words into a new word: the phonemes of the neighbouring words are changed and merged. Pali Sandhi word segmentation is more challenging than Thai word segmentation because Pali is a highly inflected language. This study proposes a novel approach that predicts splitting locations by classifying sample Sandhi words into five classes with a bidirectional long short-term memory (BiLSTM) model. Classified rules are then applied at the splitting locations to rectify the resulting words. We identified 6,345 Pali Sandhi words from the Dhammapada Atthakatha. We evaluated the proposed model on the accuracy of its splitting locations, comparing the predictions with the answers in the dataset. Results showed that 92.20% of the splitting locations were correct, 1.10% of the Pali Sandhi words were predicted as words with no splitting location, and 5.83% did not match the answers (incomplete segmentation).
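The rule-based step described in the abstract can be illustrated with a minimal sketch. It assumes the classifier has already produced a splitting location and a rule class for a word. The abstract does not specify the paper's five classes or its rule set, so the class names (`no_split`, `lengthen_a`, `elide_a`), the `Rule` type, and the `split_sandhi` function below are hypothetical, and the two rules cover only the textbook cases tatra + ayaṃ → tatrāyaṃ and na + atthi → natthi.

```python
import unicodedata
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Rule:
    """Hypothetical repair rule applied at a predicted splitting location."""
    merged: str        # merged segment found at the splitting location
    left_suffix: str   # restored ending of the first word
    right_prefix: str  # restored beginning of the second word


# Illustrative rule classes only; these are NOT the paper's five classes.
RULES = {
    "no_split": None,
    "lengthen_a": Rule(merged="\u0101", left_suffix="a", right_prefix="a"),  # a + a -> ā
    "elide_a": Rule(merged="a", left_suffix="a", right_prefix="a"),          # a + a -> a
}


def split_sandhi(word: str, position: int, rule_class: str) -> Optional[Tuple[str, str]]:
    """Undo one Sandhi merge at `position` using the rule named by `rule_class`.

    Returns the two restored words, or None when no split is predicted or
    the rule does not fit the word at that position.
    """
    # Normalise so that letters such as ā and ṃ are single code points.
    word = unicodedata.normalize("NFC", word)
    rule = RULES[rule_class]
    if rule is None:
        return None  # classifier predicted a word with no splitting location
    if not word.startswith(rule.merged, position):
        return None  # rule and location disagree: segmentation is incomplete
    left = word[:position] + rule.left_suffix
    right = rule.right_prefix + word[position + len(rule.merged):]
    return left, right


print(split_sandhi("tatrāyaṃ", 4, "lengthen_a"))  # ('tatra', 'ayaṃ')
print(split_sandhi("natthi", 1, "elide_a"))        # ('na', 'atthi')
```

Keeping the rules in a table keyed by class means the classifier's output selects the repair directly. A `None` result loosely mirrors the two failure outcomes reported in the abstract: words predicted as having no splitting location, and incomplete segmentation.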
