International Research and Innovation Summit (conference paper)

Rule-based Approach on Extraction of Malay Compound Nouns in Standard Malay Document

Abstract

A Malay compound noun is a word form created when two or more words are combined into a single syntactic unit that carries a specific meaning. A compound noun behaves as one unit and is written as separate words, unless it is an established compound that is written as a single word. A basic characteristic of compound nouns that can be observed in Malay sentences is the frequency of the compound in the text itself. Compound noun extraction is therefore significant for downstream research such as text summarization, grammar checking, sentiment analysis, machine translation and word categorization. Many research efforts have proposed extracting Malay compound nouns using linguistic approaches, and most existing methods extract bi-gram noun+noun compounds. However, their results still have problems that leave room for improvement. This paper explores a linguistic method for extracting compound nouns from a standard Malay corpus. A standard dataset is used to provide a common platform for evaluating research on the recognition of compound nouns in Malay sentences. Because the results of existing methods can be compromised, the effectiveness of compound noun extraction needs to be improved. This study therefore proposes a modified linguistic approach to enhance compound noun extraction. Several pre-processing steps are involved, including normalization, tokenization and tagging. The first step of the linguistic approach in this study is Part-of-Speech (POS) tagging. Finally, we describe several rules and modify them to capture the most relevant relation between the first word and the second word, which helps us solve the problems identified above. The effectiveness of the relations used in our study is measured using recall, precision and F1-score.
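The pre-processing and rule steps summarized above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy lexicon `LEXICON` stands in for a real Malay POS tagger, the tag names are invented, and `CLASSIFIERS` is a hypothetical example of a rule that restricts which first word may start a noun+noun compound.

```python
import re
import unicodedata
from collections import Counter

# Hypothetical toy lexicon standing in for a real Malay POS tagger.
# Tags (NN = noun, VB = verb, PREP = preposition, NUM = numeral) are
# illustrative only, not the paper's tag set.
LEXICON = {
    "adik": "NN", "membaca": "VB", "surat": "NN", "khabar": "NN",
    "di": "PREP", "dalam": "PREP", "kereta": "NN", "api": "NN",
    "dua": "NUM", "ekor": "NN", "kucing": "NN", "tidur": "VB",
}

# Hypothetical rule refinement: Malay classifiers (penjodoh bilangan) are
# tagged as nouns but should not start a compound, e.g. "dua ekor kucing".
CLASSIFIERS = {"ekor", "buah", "orang", "biji", "helai"}


def normalize(text: str) -> str:
    """Unicode-normalize, lowercase and collapse runs of whitespace."""
    text = unicodedata.normalize("NFC", text).lower()
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text: str) -> list[str]:
    """Split into words (keeping reduplication like 'kanak-kanak') and punctuation."""
    return re.findall(r"\w+(?:-\w+)*|[^\w\s]", text)


def tag(tokens: list[str]) -> list[tuple[str, str]]:
    """Lexicon lookup: punctuation gets PUNC, unknown words get UNK."""
    tagged = []
    for tok in tokens:
        if not re.match(r"\w", tok):
            tagged.append((tok, "PUNC"))
        else:
            tagged.append((tok, LEXICON.get(tok, "UNK")))
    return tagged


def extract_nn_bigrams(tagged: list[tuple[str, str]]) -> list[str]:
    """Rule: two adjacent nouns form a candidate unless the first is a classifier."""
    candidates = []
    for (w1, t1), (w2, t2) in zip(tagged, tagged[1:]):
        if t1 == "NN" and t2 == "NN" and w1 not in CLASSIFIERS:
            candidates.append(f"{w1} {w2}")
    return candidates


text = "Adik membaca surat khabar di dalam kereta api. Dua ekor kucing tidur."
tagged = tag(tokenize(normalize(text)))
print(Counter(extract_nn_bigrams(tagged)))
# → Counter({'surat khabar': 1, 'kereta api': 1})
```

Counting candidates with `Counter` reflects the point that frequency in the text is a basic characteristic of compound nouns. A real system would use a trained Malay tagger and the paper's own rule set in place of the toy lexicon and the single classifier filter.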
Comparison against baseline values is essential because it shows whether the results have improved.
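Recall, precision and F1-score, together with the comparison against a baseline, can be illustrated with a short sketch. The gold set and both system outputs below are made-up toy data, not results from the paper, and the function name `prf1` is hypothetical.

```python
def prf1(predicted: set[str], gold: set[str]) -> tuple[float, float, float]:
    """Precision, recall and F1 of predicted compounds against a gold set."""
    tp = len(predicted & gold)  # correctly extracted compounds
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data for illustration only, not figures from the paper.
gold = {"surat khabar", "kereta api", "air mata", "bilik darjah"}
baseline = {"surat khabar", "kereta api", "ekor kucing", "buah rumah"}
modified = {"surat khabar", "kereta api", "air mata"}

for name, pred in [("baseline", baseline), ("modified", modified)]:
    p, r, f = prf1(pred, gold)
    print(f"{name}: P={p:.2f} R={r:.2f} F1={f:.2f}")
# → baseline: P=0.50 R=0.50 F1=0.50
# → modified: P=1.00 R=0.75 F1=0.86
```

This sketch scores unique compound types by exact string match; token-level scoring, which counts every occurrence in the corpus, is an equally common choice and can give different numbers.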
