Pacific Journal of Science and Technology

A Stochastic Collocation Algorithm Method for Processing the Yoruba Language Using the Data Context Approach Based on Text, Lexicon, and Grammar

Abstract

In this paper, we present an initial attempt to build a self-extractive text processor for the Yoruba language, which is spoken by about 60 million people across America, Europe, and, predominantly, West Africa. The processor is implemented using a repository codenamed "YOTEX". YoTEx is a Yoruba-language text repository that learns from the English-language corpus, with particular emphasis on the agglutinative tendencies of Yoruba. In building the data repository, we considered parameters for the relations available in other textual corpora such as the WordNet English corpus, which serves as the case study in this work. We used a stochastic collocation algorithm to show relationships among entities; the algorithm was chosen because of the tonal orientation of the language. A hidden Markov model was extended to support deep text analysis. The developed system performs well against known benchmarks in formulating an appropriate system for tagging, part-of-speech labeling, stemming, chunking, and related tasks for Yoruba textual terms. The resulting YoTEx will improve the "codinazation" of the Yoruba language in particular and of other agglutinative languages in general, which will in turn enhance the efficacy of computer processing of Yoruba. This work presents a novel approach to testing some known language models on a Yoruba lexical corpus.
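
The abstract does not spell out how its collocation algorithm scores relationships between words. As a rough illustration only, the sketch below uses pointwise mutual information (PMI) over adjacent word pairs, a common collocation measure that is swapped in here as a stand-in and is not claimed to be the authors' method. The function name `pmi_scores` and the toy `corpus` are hypothetical; the token lists are made-up sequences rather than grammatical Yoruba sentences.

```python
import math
from collections import Counter


def pmi_scores(sentences):
    """Return {(w1, w2): PMI} for adjacent word pairs in tokenized sentences.

    PMI(w1, w2) = log2( p(w1, w2) / (p(w1) * p(w2)) ), with probabilities
    estimated from raw unigram and bigram counts (no smoothing).
    """
    unigrams = Counter()
    bigrams = Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        # Adjacent pairs within a sentence; pairs never cross sentence bounds.
        bigrams.update(zip(tokens, tokens[1:]))

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scores = {}
    for (w1, w2), count in bigrams.items():
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


# Toy, hypothetical token sequences (assumption: not real corpus data).
corpus = [
    ["ilé", "ìwé", "ọmọ"],
    ["ọmọ", "ilé", "ìwé"],
    ["ilé", "ọmọ"],
    ["ìwé", "ọmọ", "ilé", "ìwé"],
]

scores = pmi_scores(corpus)
# Print pairs from strongest to weakest association.
for pair, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(pair, round(score, 3))
```

Pairs that co-occur more often than their individual frequencies would predict receive higher scores. On a real corpus, a minimum-count threshold is usually needed, since PMI tends to overrate rare pairs.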