Venue: Workshop on NLP for Similar Languages, Varieties and Dialects

Advances in Ngram-based Discrimination of Similar Languages

Abstract

We describe the systems entered by the National Research Council in the 2016 shared task on discriminating similar languages. As in previous years, we relied on character ngram features and a combination of discriminative and generative statistical classifiers. We mainly investigated the influence of the amount of data on performance in the open task, and compared the two-stage approach (predicting language/group, then variant) to a flat approach. Results suggest that ngrams are still state of the art for language and variant identification, that additional data has a small but decisive impact, and that the two-stage approach performs slightly better than the flat approach, everything else being equal.
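To make the contrast between the flat and two-stage approaches concrete, here is a minimal sketch in Python. It is not the systems described in the abstract: it assumes a single add-one-smoothed naive Bayes classifier (a simple generative model) in place of the combination of discriminative and generative classifiers. The names `char_ngrams`, `NgramNB`, and `TwoStage`, the group labels, and the four toy sentences are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n_max=3):
    """Return all character n-grams of length 1..n_max (space-padded, lowercased)."""
    padded = f" {text.lower()} "
    return [padded[i:i + n]
            for n in range(1, n_max + 1)
            for i in range(len(padded) - n + 1)]


class NgramNB:
    """Multinomial naive Bayes over character n-gram counts, add-one smoothing."""

    def fit(self, texts, labels):
        self.counts = defaultdict(Counter)   # label -> n-gram counts
        self.docs = Counter(labels)          # label -> number of training texts
        for text, label in zip(texts, labels):
            self.counts[label].update(char_ngrams(text))
        self.vocab = {g for c in self.counts.values() for g in c}
        return self

    def predict(self, text):
        grams = char_ngrams(text)

        def score(label):
            c = self.counts[label]
            # +1 reserves probability mass for n-grams never seen in training.
            total = sum(c.values()) + len(self.vocab) + 1
            prior = math.log(self.docs[label] / sum(self.docs.values()))
            return prior + sum(math.log((c[g] + 1) / total) for g in grams)

        return max(self.docs, key=score)


class TwoStage:
    """Stage 1 predicts the language group; stage 2 predicts the variant within it."""

    def fit(self, texts, variants, group_of):
        groups = [group_of[v] for v in variants]
        self.group_clf = NgramNB().fit(texts, groups)
        self.variant_clf = {}
        for g in set(groups):
            idx = [i for i, gg in enumerate(groups) if gg == g]
            self.variant_clf[g] = NgramNB().fit(
                [texts[i] for i in idx], [variants[i] for i in idx])
        return self

    def predict(self, text):
        group = self.group_clf.predict(text)
        return self.variant_clf[group].predict(text)


# Toy data, invented for illustration only (not from the shared task).
texts = [
    "o ônibus chegou ao ponto",
    "o autocarro chegou à paragem",
    "el autobús llegó a la parada",
    "el colectivo llegó a la parada",
]
variants = ["pt-BR", "pt-PT", "es-ES", "es-AR"]
group_of = {"pt-BR": "pt", "pt-PT": "pt", "es-ES": "es", "es-AR": "es"}

flat = NgramNB().fit(texts, variants)                  # one classifier over all variants
two_stage = TwoStage().fit(texts, variants, group_of)  # group first, then variant

query = "o autocarro parou"
print("flat:", flat.predict(query))
print("two-stage:", two_stage.predict(query))
```

In this sketch the flat classifier chooses among all variants at once, whereas the two-stage classifier only compares variants inside the predicted group, so each second-stage model is trained on, and discriminates between, closely related varieties only.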
