Longest Matching and Rule-based Techniques for Khmer Word Segmentation


Abstract

Identifying word boundaries is an essential task in natural language processing research. Word segmentation has been studied extensively for most Asian languages, including Khmer. For Khmer word segmentation, several dictionary-based approaches have been investigated, but little research addresses the unknown-word problem, which remains a challenging part of word separation. This research proposes combining the Maximum Matching algorithm (MMA) with a rule-based technique. First, MMA and a manually prepared Khmer corpus were used to mark word boundaries in each sentence. The unknown words were then identified and resolved using 21 grammar rules that we created. We tested the segmentation on 2018 sentences drawn from agriculture, magazine, newspaper, technology, health, and history texts. Maximum Matching alone achieved an accuracy of 88.55%; adding the rule-based technique raised the accuracy to 92.81%.
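
To make the first stage concrete, here is a minimal sketch of greedy forward longest matching, written under several assumptions that are not from the paper: the function name `max_match`, the toy Latin-script `LEXICON` standing in for a Khmer dictionary, and the choice to merge consecutive unmatched characters into one unknown token. The 21 grammar rules are not given in the abstract, so the sketch only flags unknown spans that a later rule-based pass could handle.

```python
from typing import Iterable, List, Tuple


def max_match(text: str, lexicon: Iterable[str]) -> List[Tuple[str, bool]]:
    """Greedy forward longest matching (illustrative, not the authors' code).

    Returns (token, known) pairs. Consecutive characters that match no
    lexicon entry are merged into a single token with known=False, so a
    later rule-based pass could inspect them.
    """
    words = set(lexicon)
    max_len = max((len(w) for w in words), default=1)
    tokens: List[Tuple[str, bool]] = []
    unknown = ""  # buffer for characters not covered by any lexicon entry
    i = 0
    while i < len(text):
        match = ""
        # Try the longest candidate starting at i first, then shrink.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in words:
                match = text[i:j]
                break
        if match:
            if unknown:  # flush the pending unknown span before the match
                tokens.append((unknown, False))
                unknown = ""
            tokens.append((match, True))
            i += len(match)
        else:
            unknown += text[i]
            i += 1
    if unknown:
        tokens.append((unknown, False))
    return tokens


# Hypothetical toy lexicon; a real system would load a Khmer dictionary.
LEXICON = {"the", "them", "theme", "me", "park", "in"}
result = max_match("themeparkxyzin", LEXICON)
print(result)
# [('theme', True), ('park', True), ('xyz', False), ('in', True)]
```

The example prefers "theme" over the shorter entries "the" and "them", which is the defining behavior of longest matching, and it isolates "xyz" as an unknown span. Note that the sketch operates on Unicode code points; Khmer text would likely need matching on character clusters rather than single code points.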


Bibliographic details

Source: International Conference on Knowledge and Smart Technology | 2018 | 296p | 4 pages in total
Authors: Pakrigna Long; Veera Boonjing
Original format: PDF
CLC classification: Artificial intelligence theory
Keywords: Dictionaries; Voltage control; Agriculture; History; Grammar; Natural language processing


