A Lazy Man's Way to Part-of-Speech Tagging

Abstract

This paper presents a statistical approach to word alignment that automatically projects part-of-speech (POS) tags from one language to another. The approach is called the "lazy man's way" because it improves POS assignment for a resource-poor language by exploiting its similarity to a resource-rich one. The unsupervised method combines N-gram and Dice coefficient similarity functions to align English texts with Malay texts, and then projects the POS tags from English onto Malay. It is a quick method that avoids the laborious effort of annotating the Malay dataset by hand. A case study on 25 terrorism news articles written in Malay shows that leveraging existing resources from a resource-rich language (English) to supplement a resource-poor language (Malay) is feasible and avoids building new text-processing tools from scratch. The system was tested on the Malay corpus, which consists of 5,413 word tokens, and reached 86.87% precision, 72.56% recall and an F1-score of 79.07%. The authors conclude that the "lazy man's way", in which a resource-poor language simply exploits the rich linguistic information available for English, increases bitext projection accuracy significantly.
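The abstract does not spell out how the N-gram and Dice coefficient functions are combined, so the following is only a minimal sketch of one common reading: score each English-Malay word pair by the Dice coefficient over character bigrams, then copy the tag of the best-scoring English word when the score clears a threshold. The function names, the threshold of 0.5, the Penn Treebank style tags and the sample sentence are all illustrative assumptions, not details taken from the paper.

```python
from typing import List, Optional, Set, Tuple


def char_ngrams(word: str, n: int = 2) -> Set[str]:
    """Return the set of character n-grams of a lowercased word."""
    w = word.lower()
    if len(w) < n:
        return {w} if w else set()
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def dice(a: str, b: str, n: int = 2) -> float:
    """Dice coefficient 2|X & Y| / (|X| + |Y|) over character n-gram sets."""
    x, y = char_ngrams(a, n), char_ngrams(b, n)
    if not x and not y:
        return 0.0
    return 2 * len(x & y) / (len(x) + len(y))


def project_tags(
    tagged_english: List[Tuple[str, str]],
    malay_tokens: List[str],
    threshold: float = 0.5,  # assumed cut-off, not a value from the paper
) -> List[Tuple[str, Optional[str]]]:
    """Give each Malay token the tag of its most similar English word.

    Tokens whose best score falls below the threshold stay untagged (None).
    """
    projected = []
    for token in malay_tokens:
        best_tag, best_score = None, 0.0
        for word, tag in tagged_english:
            score = dice(word, token)
            if score > best_score:
                best_tag, best_score = tag, score
        projected.append((token, best_tag if best_score >= threshold else None))
    return projected


# Hypothetical aligned sentence pair; the tags are assumed, not from the paper.
english = [("police", "NN"), ("arrested", "VBD"), ("terrorist", "NN")]
malay = ["polis", "menahan", "teroris"]
result = project_tags(english, malay)
```

In this toy pair, "polis" and "teroris" share most of their bigrams with "police" and "terrorist" and inherit their tags, while "menahan" has no orthographically similar English counterpart and is left untagged. A purely string-based similarity like this one can only tag words that resemble their English equivalents.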

Bibliographic details

Source
International Workshop on Knowledge Management and Acquisition for Intelligent Systems, 2012, 12 pages
Authors
Norshuhani Zamin; Alan Oxley; Zainab Abu Bakar; Syed Ahmad Farhan
Original format: PDF
Chinese Library Classification: TP18-53

Similar literature

1. Tagging Accuracy Analysis on Part-of-Speech Taggers [J]. Semih Yumusak, Erdogan Dogdu, Halife Kodaz. Journal of Computer and Communications, 2014, No. 4.
2. Morphological Segmentation and Part-of-Speech Tagging for the Arabic Heritage [J]. Mohamed Emad. ACM Transactions on Asian Language Information Processing, 2018, No. 3.
3. FarsiTag: A part-of-speech tagging system for Persian [J]. Rezai Mohammad Javad, Miangah Tayebeh Mosavi. Literary & Linguistic Computing, 2017, No. 3.
4. A Lazy Man's Way to Part-of-Speech Tagging [C]. Norshuhani Zamin, Alan Oxley, Zainab Abu Bakar. Pacific Rim Knowledge Acquisition Workshop, 2012.
5. IITagger: Tagging Wall Street Journal text with part-of-speech information [D]. Kim, Yeongkwun. 1996.
6. A fine-grained Chinese word segmentation and part-of-speech tagging corpus for clinical text [O]. Ying Xiong, Zhongmin Wang, Dehuan Jiang. 2019.
7. The Grammar of Sense: Is word-sense tagging much more than part-of-speech tagging? [O]. Wilks, Yorick; Stevenson, Mark. 1996.
