Malayalam POS Tagger: A Comparison Using SVM and HMM

Abstract

Many part-of-speech (POS) taggers for Malayalam have been implemented using Support Vector Machines (SVM), Memory-Based Language Processing (MBLP), Hidden Markov Models (HMM), and similar techniques. The objective was to find an improved POS tagger for Malayalam. This work compares Malayalam POS tagging using SVM and HMM. The tagset used is the widely adopted Bureau of Indian Standards (BIS) tagset. A manually created dataset of around 52,000 words was collected from various Malayalam news sites, and the preprocessing steps applied to the news text are also described. POS tagging was then performed with both SVM and HMM. Because POS tagging must choose among many class labels, a multi-class SVM is used; the SVM pipeline also performs feature extraction, feature selection, and classification. Word sense disambiguation and word misclassification were identified as the two major issues with the SVM tagger. The HMM predicts the hidden tag sequence with the maximum observation likelihood, which reduces ambiguity and the misclassification rate.
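The abstract does not give implementation details for the HMM tagger. As a minimal sketch, assuming a standard first-order (bigram) HMM whose probabilities are estimated by counting over tagged sentences with add-one smoothing, Viterbi decoding recovers the tag sequence with the highest joint probability with the observed words. The helper names `train_hmm` and `viterbi`, the smoothing scheme, and the romanized toy sentences with BIS-style tag names are all hypothetical illustrations, not the paper's code or data.

```python
import math
from collections import Counter, defaultdict


def train_hmm(tagged_sentences):
    """Hypothetical helper: estimate bigram-HMM log-probabilities by counting.

    tagged_sentences: list of sentences, each a list of (word, tag) pairs.
    Add-one smoothing is an assumption made for this sketch.
    """
    tags = sorted({t for sent in tagged_sentences for _, t in sent})
    vocab = {w for sent in tagged_sentences for w, _ in sent}
    start, trans, emit = Counter(), defaultdict(Counter), defaultdict(Counter)
    for sent in tagged_sentences:
        prev = None
        for word, tag in sent:
            if prev is None:
                start[tag] += 1          # tag that opens a sentence
            else:
                trans[prev][tag] += 1    # tag-to-tag transition
            emit[tag][word] += 1         # tag-to-word emission
            prev = tag

    n_tags = len(tags)
    v = len(vocab) + 1  # +1 reserves probability mass for unseen words

    def log_start(t):
        return math.log((start[t] + 1) / (len(tagged_sentences) + n_tags))

    def log_trans(p, t):
        return math.log((trans[p][t] + 1) / (sum(trans[p].values()) + n_tags))

    def log_emit(t, w):
        return math.log((emit[t][w] + 1) / (sum(emit[t].values()) + v))

    return tags, log_start, log_trans, log_emit


def viterbi(words, model):
    """Return the most probable tag sequence for `words` under the HMM."""
    if not words:
        return []
    tags, log_start, log_trans, log_emit = model
    # best[i][t]: log-prob of the best tag path for words[:i+1] ending in t
    best = [{t: log_start(t) + log_emit(t, words[0]) for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev, score = max(
                ((p, best[i - 1][p] + log_trans(p, t)) for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = score + log_emit(t, words[i])
            back[i][t] = prev
    # Trace back from the highest-scoring final tag.
    path = [max(best[-1], key=best[-1].get)]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return list(reversed(path))


# Invented romanized toy sentences with BIS-style tags (illustration only).
corpus = [
    [("avan", "PR_PRP"), ("veettil", "N_NN"), ("poyi", "V_VM"), (".", "RD_PUNC")],
    [("aval", "PR_PRP"), ("schoolil", "N_NN"), ("poyi", "V_VM"), (".", "RD_PUNC")],
    [("avan", "PR_PRP"), ("pusthakam", "N_NN"), ("vayichu", "V_VM"), (".", "RD_PUNC")],
]
model = train_hmm(corpus)
print(viterbi(["aval", "veettil", "vayichu", "."], model))
# ['PR_PRP', 'N_NN', 'V_VM', 'RD_PUNC']
```

Working in log space avoids numerical underflow on long sentences. Because the decoder scores whole tag sequences rather than each word in isolation, a word's tag is informed by its neighbours' tags, which is the property the abstract credits with reducing ambiguity.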