Workshop on NLP for Similar Languages, Varieties and Dialects

Vanilla Classifiers for Distinguishing between Similar Languages

Abstract

In this paper we describe the submission of the UniBuc-NLP team for the Discriminating between Similar Languages Shared Task, DSL 2016. We present and analyze the results we obtained in the closed track of sub-task 1 (Similar languages and language varieties) and sub-task 2 (Arabic dialects). For sub-task 1 we used a logistic regression classifier with tf-idf feature weighting and for sub-task 2 a character-based string kernel with an SVM classifier. Our results show that good accuracy scores can be obtained with limited feature and model engineering. While certain limitations are to be acknowledged, our approach worked surprisingly well for out-of-domain, social media data, with 0.898 accuracy (3rd place) for dataset B1 and 0.838 accuracy (4th place) for dataset B2.
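
To make the "character-based string kernel" idea concrete, here is a minimal sketch in plain Python. The abstract does not say which kernel variant, n-gram length, or normalization the team used, so everything below is an assumption for illustration: the p-spectrum kernel, the default `p = 3`, the cosine-style normalization, and the function names `char_ngrams`, `spectrum_kernel`, `normalized_kernel`, and `gram_matrix` are all hypothetical choices, not the authors' implementation.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text: str, p: int) -> Counter:
    """Count every contiguous character p-gram in text."""
    return Counter(text[i:i + p] for i in range(len(text) - p + 1))


def spectrum_kernel(a: str, b: str, p: int = 3) -> int:
    """p-spectrum kernel: for each p-gram shared by a and b,
    multiply its two occurrence counts, then sum the products."""
    ca, cb = char_ngrams(a, p), char_ngrams(b, p)
    if len(ca) > len(cb):
        ca, cb = cb, ca  # iterate over the smaller profile
    return sum(count * cb[gram] for gram, count in ca.items())


def normalized_kernel(a: str, b: str, p: int = 3) -> float:
    """Divide by the geometric mean of the self-similarities so that
    text length does not dominate the score (assumed normalization)."""
    kaa = spectrum_kernel(a, a, p)
    kbb = spectrum_kernel(b, b, p)
    if kaa == 0 or kbb == 0:
        return 0.0  # a text shorter than p has no p-grams
    return spectrum_kernel(a, b, p) / sqrt(kaa * kbb)


def gram_matrix(texts, p: int = 3):
    """Pairwise kernel values for a list of texts, as nested lists."""
    return [[normalized_kernel(s, t, p) for t in texts] for s in texts]
```

A matrix built this way could then be passed to any SVM implementation that accepts a precomputed kernel, for example scikit-learn's `SVC(kernel="precomputed")`. Working on raw characters rather than tokens avoids a tokenization step, which may be one reason such kernels suit dialect data; the abstract itself does not state the motivation.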