Longest Matching and Rule-based Techniques for Khmer Word Segmentation

Abstract

Identifying word boundaries is an essential task in natural language processing. For many Asian languages, including Khmer, word segmentation has been studied extensively. For Khmer, several dictionary-based segmentation approaches have been studied, but only a few studies address the unknown-word problem, which remains a challenging part of word segmentation. This research proposes combining the Maximum Matching algorithm (MMA) with a rule-based technique. First, MMA and a manually created Khmer corpus were used to mark word boundaries in each sentence. The unknown words were then identified and resolved using 21 grammar rules created for this purpose. We tested the segmentation on 2018 sentences drawn from agriculture, magazine, newspaper, technology, health, and history texts. Maximum Matching alone achieved an accuracy of 88.55%; combined with the rule-based technique, accuracy increased to 92.81%.
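The dictionary-matching step described in the abstract can be illustrated with a minimal sketch of greedy forward maximum matching. This is not the authors' implementation: the function name `max_match`, the toy Latin-script lexicon, and the choice to collect unmatched characters into "unknown" spans are all assumptions made for illustration. The 21 grammar rules are not listed in the abstract, so the sketch only flags unknown spans and does not attempt to resolve them.

```python
def max_match(text, lexicon, max_len=None):
    """Greedy forward maximum matching (illustrative sketch).

    Returns a list of (token, known) pairs. Characters that start no
    lexicon word are accumulated into a single unknown span, which a
    later rule-based stage could try to resolve.
    """
    if max_len is None:
        # Longest lexicon entry bounds the window we need to try.
        max_len = max((len(w) for w in lexicon), default=1)

    tokens = []
    unknown = ""  # buffer for consecutive unmatched characters
    i = 0
    while i < len(text):
        match = None
        # Try the longest candidate first, then shrink the window.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in lexicon:
                match = text[i:j]
                break
        if match is None:
            unknown += text[i]
            i += 1
            continue
        if unknown:
            tokens.append((unknown, False))
            unknown = ""
        tokens.append((match, True))
        i += len(match)
    if unknown:
        tokens.append((unknown, False))
    return tokens


# Hypothetical toy lexicon; a real system would load a Khmer dictionary.
LEXICON = {"the", "there", "them", "cat", "sat"}

print(max_match("thecatxqsat", LEXICON))
# → [('the', True), ('cat', True), ('xq', False), ('sat', True)]
```

The sketch indexes the string by code point. For Khmer text, where a written syllable can span several code points (base consonant, subscript consonants, vowel signs), a practical segmenter would probably match over character clusters instead, so that an unknown span never splits a cluster.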

Bibliographic details

Source
International Conference on Knowledge and Smart Technology | 2018 | pp. 80-83 | 4 pages
Authors
Pakrigna Long; Veera Boonjing
Original format
PDF
Keywords
Dictionaries; Voltage control; Agriculture; History; Grammar; Natural language processing

Similar literature

1. Image matching technique based on SURF descriptors for offline handwritten Arabic word segmentation [J]. Maamar Kef, Leila Chergui. International Journal of Intelligent Systems Technologies and Applications. 2020, No. 3
2. Segmentation versus non-segmentation based neural techniques for cursive word recognition: an experimental analysis [J]. Xiaolong Fan, Brijesh Verma. International Journal of Computational Intelligence and Applications. 2002, No. 4
3. Handwritten word recognition using segmentation-free hidden Markov modeling and segmentation-based dynamic programming techniques [J]. Mohamed M., Gader P. IEEE Transactions on Pattern Analysis and Machine Intelligence. 1996, No. 5
4. Longest Matching and Rule-based Techniques for Khmer Word Segmentation [C]. Pakrigna Long, Veera Boonjing. International Conference on Knowledge and Smart Technology. 2018
5. Information retrieval for Khmer documents: Challenges and approaches to word segmentation [D]. Tum, Phylypo. 2007
6. Spinal Cord Segmentation by One Dimensional Normalized Template Matching: A Novel Quantitative Technique to Analyze Advanced Magnetic Resonance Imaging Data [O]. Adam Cadotte, David W. Cadotte, Micha Livne
7. Khmer Word Segmentation and Out-of-Vocabulary Words Detection Using Collocation Measurement of Repeated Characters Subsequences [O]. Van Channa, Kameyama Wataru. 2013
