Conference: Pacific Asia Conference on Language, Information and Computation

SMTPOST: Using Statistical Machine Translation Approach in Filipino Part-of-Speech Tagging


Abstract

The field of Natural Language Processing (NLP) in the Philippines has been developing continually. However, the transition from Tagalog to the evolving Filipino language has left existing tools and resources behind. This paper introduces SMTPOST, a Statistical Machine Translation Part-of-Speech (POS) Tagger for Filipino, built to revive, update, and widen the scope of POS tagging technologies and to accommodate the changes in the Filipino language. The resources built consist mainly of a tagset (218 tags), a parallel corpus (2,668 sentences), affix rules (59 rules), and a word-tag dictionary (309 entries). SMTPOST was tested on different tagsets and domains; its highest accuracy score was 84.75%, an increase of at least 3.75% over the available Tagalog POS taggers. Despite SMTPOST's use of Filipino resources and its good performance, there is room for improvement. Recommendations include a better feature extractor (preferably a morphological analyzer), a broader scope for all of the resources, the implementation of pre- and/or postprocessing, and the application of SMTPOST research to other NLP applications.
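To make the framing concrete, the following is a minimal sketch of how POS tagging can be cast as a translation task: each sentence becomes a source line of words paired with a target line of tags, and a tagger's output is scored against the reference tags. It is an illustration under stated assumptions, not the paper's implementation. The example sentences, the tags `VB`, `DT`, `NN`, and `JJ`, the helper names `to_parallel` and `token_accuracy`, and the choice of token-level accuracy as the metric are all hypothetical and are not taken from the SMTPOST tagset or evaluation.

```python
from typing import List, Tuple

# A tagged sentence is a list of (word, tag) pairs.
TaggedSentence = List[Tuple[str, str]]


def to_parallel(tagged: List[TaggedSentence]) -> Tuple[List[str], List[str]]:
    """Split tagged sentences into aligned source (words) and target (tags) lines."""
    source = [" ".join(word for word, _ in sent) for sent in tagged]
    target = [" ".join(tag for _, tag in sent) for sent in tagged]
    return source, target


def token_accuracy(predicted: List[str], reference: List[str]) -> float:
    """Fraction of reference tags that the prediction matches position by position."""
    correct = 0
    total = 0
    for pred_line, ref_line in zip(predicted, reference):
        pred_tags = pred_line.split()
        ref_tags = ref_line.split()
        total += len(ref_tags)
        correct += sum(p == r for p, r in zip(pred_tags, ref_tags))
    return correct / total if total else 0.0


# Hypothetical toy data; the tags are placeholders, not the paper's 218-tag tagset.
tagged = [
    [("Kumain", "VB"), ("ang", "DT"), ("bata", "NN")],
    [("Maganda", "JJ"), ("ang", "DT"), ("bahay", "NN")],
]
source, target = to_parallel(tagged)

# Pretend output of some tagger, with one wrong tag in the second sentence.
predicted = ["VB DT NN", "NN DT NN"]
accuracy = token_accuracy(predicted, target)
```

In this toy setup the `source` and `target` lists play the role of the two sides of a parallel corpus, which is the kind of input a statistical machine translation system is trained on.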
