Conference: Language and Technology Conference

Improving Chunker Performance Using a Web-Based Semi-automatic Training Data Analysis Tool

Abstract

Fine-tuning features for NP chunking is a difficult task. The effects of a modification are sometimes unpredictable, and tuning with a supervised or unsupervised learning algorithm does not necessarily produce better results. An online toolkit was developed for this scenario. It highlights critical areas in the training data that may pose a challenge for the learning algorithm: irregular data, exceptions to trends, and unusual property values. This overview of problematic data may prompt the linguist to improve the data, for example by dividing a class into more detailed classes. The toolkit was tested on English and Hungarian corpora. Results show that it effectively accelerates the preparation of datasets for NP chunking, which results in better F-scores. The toolkit runs in an ordinary web browser, and its use poses no difficulties for non-technical users. The tool combines the abstraction ability of a linguist with the power of a statistical engine.
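
The abstract does not say how the toolkit detects problematic data, so the following is only an illustrative sketch of one way "exceptions to trends" could be surfaced in chunker training data. It assumes tokens are given as (word, POS tag, chunk label) triples with IOB-style labels; the function name `flag_exceptions` and the thresholds `min_count` and `min_majority` are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def flag_exceptions(tokens, min_count=5, min_majority=0.9):
    """Flag tokens whose chunk label deviates from a strong per-POS trend.

    tokens: list of (word, pos, chunk_label) triples (assumed input format).
    Returns a list of (index, word, pos, label, majority_label) tuples.
    """
    # Count how often each chunk label occurs for each POS tag.
    label_counts_by_pos = defaultdict(Counter)
    for _, pos, label in tokens:
        label_counts_by_pos[pos][label] += 1

    flagged = []
    for index, (word, pos, label) in enumerate(tokens):
        counts = label_counts_by_pos[pos]
        total = sum(counts.values())
        majority_label, majority_count = counts.most_common(1)[0]
        # A "trend" here means: the POS tag is frequent enough and one
        # label clearly dominates. Tokens with any other label are flagged.
        is_trend = total >= min_count and majority_count / total >= min_majority
        if is_trend and label != majority_label:
            flagged.append((index, word, pos, label, majority_label))
    return flagged
```

A linguist reviewing the flagged tokens could then decide whether the exception is an annotation error or a sign that the POS class should be split into more detailed classes, which is the kind of data enhancement the abstract mentions.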
