Journal: ACM Transactions on Asian Language Information Processing

The Effectiveness of a Jawi Stemmer for Retrieving Relevant Malay Documents in Jawi Characters


Abstract

The Malay language has two writing scripts, Rumi and Jawi. Most previously reported stemmer results concern Malay text in Rumi characters, and only a few studies have tested Jawi characters. In this article, a new Jawi stemmer is proposed and tested for document retrieval. A total of 36 queries and datasets from the transliterated Jawi Quran were used. The experiment shows that the mean average precision for "stemmed Jawi" documents is 8.43%, whereas the mean average precision for "nonstemmed Jawi" documents is 5.14%. A paired-sample t-test showed that using "stemmed Jawi" documents increased the precision of document retrieval. Further experiments examined the precision of the relevant documents retrieved at various cutoff points for all 36 queries. The results for the "stemmed Jawi" documents began to differ significantly from those for the "nonstemmed Jawi" documents at a cutoff of 40. These results show the usefulness of a Jawi stemmer for retrieving relevant documents in the Jawi script.
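The evaluation measures named in the abstract (mean average precision, precision at a cutoff, and a paired-sample t-test) can be illustrated with a minimal Python sketch. It uses the standard textbook definitions of these measures, which may differ in detail from the authors' evaluation setup. All function names, document identifiers, and score values below are hypothetical toy data, not results from the article.

```python
from math import sqrt
from statistics import mean, stdev


def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked documents that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def average_precision(ranked, relevant):
    """Average of the precision values at each rank holding a relevant document."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(runs):
    """Mean of average precision over (ranking, relevant_set) pairs, one per query."""
    return mean(average_precision(ranked, relevant) for ranked, relevant in runs)


def paired_t_statistic(scores_a, scores_b):
    """t statistic of a paired-sample t-test on per-query score differences."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


# Hypothetical toy query: two relevant documents, two different rankings.
relevant = {"d1", "d3"}
stemmed_ranking = ["d1", "d3", "d2", "d4"]     # assumed output with stemming
nonstemmed_ranking = ["d2", "d1", "d4", "d3"]  # assumed output without stemming

print(average_precision(stemmed_ranking, relevant))     # 1.0
print(average_precision(nonstemmed_ranking, relevant))  # 0.5
print(precision_at_k(nonstemmed_ranking, relevant, 2))  # 0.5
```

In a full evaluation, the per-query average precision values of the two systems would be passed to `paired_t_statistic`, and the resulting statistic compared against the t distribution with n - 1 degrees of freedom (35 for 36 queries).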
