The Theoretical Argument for Disproving Asymptotic Upper-Bounds on the Accuracy of Part-of-Speech Tagging Algorithms: Adopting a Linguistics, Rule-Based Approach
This paper examines Part-of-Speech Tagging algorithms, a particular area within the interdisciplinary domain of Computational Linguistics.

The author relies primarily on scholarly Computer Science and Linguistics papers to describe previous approaches to this task and the often-hypothesized asymptotic accuracy rate of around 98% by which the task is allegedly bound. After further research into why the accuracy of previous algorithms has behaved in this asymptotic manner, the author identifies valid, empirically backed reasons why the accuracy of previous approaches does not necessarily reflect any general asymptotic bound on automated Part-of-Speech Tagging. In response, a theoretical argument is proposed to circumvent the shortcomings of previous approaches. It abandons the flawed status quo of training machine learning algorithms and predictive models on outdated corpora, and instead walks the reader from conception through implementation of a rule-based algorithm with roots in both practical and theoretical Linguistics.

The resulting algorithm is only a prototype, and it cannot currently be verified to achieve a tagging accuracy above 98%. Its multi-tiered methodology, designed to mirror aspects of human cognition in Natural Language Understanding, is nonetheless intended to serve as a theoretical blueprint for a new and, the author argues, inevitably more reliable way to handle the challenges of Part-of-Speech Tagging, and to provide much-needed advances in the popular area of Natural Language Processing.
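The abstract does not say what the tiers of the proposed algorithm are, so the following is only a generic sketch of what a multi-tiered, rule-based tagger can look like, not the author's method. The three tiers (closed-class lexicon, suffix rules, contextual repair), the tiny tag set, and the names `LEXICON`, `SUFFIX_RULES`, and `tag` are all assumptions made for illustration.

```python
import re

# Tier 1 (assumed): a tiny closed-class lexicon. A real system would be far larger.
LEXICON = {
    "the": "DET", "a": "DET", "an": "DET",
    "is": "VERB", "was": "VERB",
    "of": "ADP", "in": "ADP", "on": "ADP",
    "and": "CONJ",
}

# Tier 2 (assumed): morphological suffix rules, tried in order; first match wins.
SUFFIX_RULES = [
    (re.compile(r".+ly$"), "ADV"),
    (re.compile(r".+(ing|ed)$"), "VERB"),
    (re.compile(r".+(ous|ful|able|ive)$"), "ADJ"),
    (re.compile(r".+(tion|ness|ment)$"), "NOUN"),
]


def tag(tokens):
    """Tag a list of tokens with a three-tier, purely rule-based pipeline."""
    tags = []
    for tok in tokens:
        word = tok.lower()
        guess = LEXICON.get(word)  # Tier 1: exact lexicon lookup
        if guess is None:
            # Tier 2: fall back to suffix rules, defaulting to NOUN
            guess = next(
                (t for pattern, t in SUFFIX_RULES if pattern.match(word)),
                "NOUN",
            )
        tags.append(guess)

    # Tier 3 (assumed): contextual repair. A word that directly follows a
    # determiner and was guessed as VERB by a suffix rule is re-tagged NOUN.
    for i in range(1, len(tags)):
        from_suffix_rule = tokens[i].lower() not in LEXICON
        if tags[i - 1] == "DET" and tags[i] == "VERB" and from_suffix_rule:
            tags[i] = "NOUN"

    return list(zip(tokens, tags))


if __name__ == "__main__":
    print(tag("The tagging is quickly finished".split()))
```

In this sketch each tier only handles what the previous tier left unresolved or got wrong, and no training corpus is involved, which is the general property of rule-based tagging that the abstract contrasts with corpus-trained models.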