Published in: IEEE Conference on Open Systems
Automatic detection of compound word in Malay standard document using rule based technique

Abstract

In this work, we present a rule-based technique for automatically detecting bi-gram compound words in standard Malay documents. The scope is limited to bi-gram compounds formed by Noun-Noun, Noun-Adjective, and Noun-Verb combinations. We identified several limitations of existing methods for detecting Malay compound words that relate to the structure of Malay sentences. Before detection, a preprocessing step was applied to produce a list of compound word candidates. During detection, we used dictionary-based and thesaurus information to apply Part of Speech (POS) tags to all the words in the selected Malay document. After tagging, we modified several existing rule-based identification rules according to Malay grammar rules and sentence patterns in order to increase recall, precision, and F1-score. All evaluation values were compared with previous work. Testing was done on 3124 sentences taken from Utusan Melayu news. On average, the results showed an improvement over previous research, with a precision of 93.8%, a recall of 31.1%, and an F1-score of 43.8%.
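
The pipeline described above (dictionary-based POS tagging, then matching Noun-Noun, Noun-Adjective, and Noun-Verb bi-grams, then scoring with precision, recall, and F1) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy lexicon, the tag names (`N`, `ADJ`, `V`, `DET`, `UNK`), and the function names are all assumptions made for illustration, and the paper's modified grammar rules are not reproduced here.

```python
from typing import Dict, Iterable, List, Set, Tuple

Bigram = Tuple[str, str]

# Hypothetical toy lexicon (word -> coarse POS tag), for illustration only.
# A real system would draw on a full dictionary and thesaurus.
TOY_LEXICON: Dict[str, str] = {
    "kereta": "N",    # car / carriage
    "api": "N",       # fire
    "kapal": "N",     # ship
    "terbang": "V",   # to fly
    "rumah": "N",     # house
    "besar": "ADJ",   # big
    "itu": "DET",     # that
}

# The three tag patterns named in the abstract.
TARGET_PATTERNS: Set[Tuple[str, str]] = {("N", "N"), ("N", "ADJ"), ("N", "V")}


def tag_tokens(tokens: List[str]) -> List[Tuple[str, str]]:
    """Tag each token by dictionary lookup; unknown words get 'UNK'."""
    return [(tok, TOY_LEXICON.get(tok.lower(), "UNK")) for tok in tokens]


def candidate_bigrams(tokens: List[str]) -> List[Bigram]:
    """Return lowercased adjacent word pairs whose tags match a target pattern."""
    tagged = tag_tokens(tokens)
    return [
        (w1.lower(), w2.lower())
        for (w1, t1), (w2, t2) in zip(tagged, tagged[1:])
        if (t1, t2) in TARGET_PATTERNS
    ]


def precision_recall_f1(
    predicted: Iterable[Bigram], gold: Iterable[Bigram]
) -> Tuple[float, float, float]:
    """Score predicted compounds against a gold list of compounds."""
    pred_set, gold_set = set(predicted), set(gold)
    true_positives = len(pred_set & gold_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

Calling `candidate_bigrams("Kereta api itu besar".split())` yields only the pair `("kereta", "api")`, because the other adjacent tag pairs are not among the target patterns. Pattern matching alone produces candidates, not confirmed compounds: a pair such as "rumah besar" also matches Noun-Adjective, so further rules of the kind the abstract mentions would be needed to filter the list.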
