In this paper we first describe a technique for automatic annotation transformation, which is based on the annotation adaptation algorithm (Jiang et al., 2009). It automatically transforms a human-annotated corpus from one annotation guideline to another. We then propose two optimization strategies, iterative training and predict-self reestimation, to further improve the accuracy of annotation guideline transformation. Experiments on Chinese word segmentation show that iterative training combined with predict-self reestimation brings significant improvement over the simple annotation transformation baseline, and yields classifiers that are significantly more accurate and several times faster than those produced by annotation adaptation. On the Penn Chinese Treebank 5.0, it achieves an F-measure of 98.43%, significantly outperforming previous work despite using a single classifier with only local features.
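For readers unfamiliar with the metric, the following is a minimal sketch of how a word-level F-measure for segmentation is commonly computed: each word is mapped to its character span, and a predicted word counts as correct only if the same span appears in the gold segmentation. The abstract does not describe the scoring procedure actually used, so the function names `to_spans` and `segmentation_prf` and the example sentence are illustrative assumptions, not the authors' evaluation code.

```python
def to_spans(words):
    """Convert a list of words into a set of (start, end) character spans."""
    spans, start = set(), 0
    for w in words:
        spans.add((start, start + len(w)))
        start += len(w)
    return spans


def segmentation_prf(gold_words, pred_words):
    """Return (precision, recall, F-measure) for one segmented sentence.

    Hypothetical helper: assumes both segmentations cover the same
    underlying character sequence.
    """
    gold, pred = to_spans(gold_words), to_spans(pred_words)
    correct = len(gold & pred)  # words whose boundaries match exactly
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    f = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f


# Illustrative example (not from the paper): gold has three words,
# the prediction merges the last two.
gold = ["中国", "人民", "银行"]
pred = ["中国", "人民银行"]
p, r, f = segmentation_prf(gold, pred)
print(f"P={p:.3f} R={r:.3f} F={f:.3f}")
```

In a corpus-level evaluation the correct, predicted, and gold counts would normally be summed over all sentences before computing precision and recall, rather than averaging per-sentence scores.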