Venue: Conference on Empirical Methods in Natural Language Processing; Workshop on Computational Approaches to Code Switching

Language Identification in Code-Switched Text Using Conditional Random Fields and Babelnet


Abstract

The paper outlines a supervised approach to language identification in code-switched data, framing it as a sequence labeling task in which the label of each token is predicted by a classifier based on Conditional Random Fields. The classifier is trained on a range of features, extracted both from the training data and from BabelNet and Babelfy. The method was tested on the development dataset provided by the organizers of the shared task on language identification in code-switched data, obtaining tweet-level monolingual, code-switched and weighted F1-scores of 94%, 85% and 91%, respectively, and a token-level accuracy of 95.8%. When evaluated on the unseen test data, the system achieved tweet-level monolingual, code-switched and weighted F1-scores of 90%, 85% and 87.4%, respectively, and a token-level accuracy of 95.7%.
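The abstract frames language identification as linear-chain sequence labeling with a CRF. As a minimal sketch of what that means at inference time, the Python below scores a tweet's label sequence from per-token features plus label-to-label transition weights, normalises over all sequences with the forward algorithm, and decodes the best sequence with Viterbi. The label names, the feature templates in `token_features`, and the weights in `EMIT_W` and `TRANS_W` are hypothetical and hand-set for illustration only; the paper's actual features (including those derived from BabelNet and Babelfy) and its trained weights are not reproduced here, and in practice the weights would be learned from the training data.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical, reduced label set; the shared task's real label inventory is larger.
LABELS: List[str] = ["lang1", "lang2", "other"]

EmitWeights = Dict[Tuple[str, str], float]   # (feature, label) -> weight
TransWeights = Dict[Tuple[str, str], float]  # (prev_label, label) -> weight


def token_features(tokens: Sequence[str], i: int) -> List[str]:
    """Hypothetical binary features for token i (not the paper's feature set)."""
    tok = tokens[i]
    low = tok.lower()
    feats = ["bias", f"word={low}", f"suffix3={low[-3:]}"]
    if not any(c.isalpha() for c in tok):
        feats.append("no_alpha")
    if tok.startswith(("@", "#", "http")):
        feats.append("twitter_entity")
    if i > 0:
        feats.append(f"prev_word={tokens[i - 1].lower()}")
    return feats


def emit_score(tokens: Sequence[str], i: int, label: str, emit_w: EmitWeights) -> float:
    # Sum of the weights of the active features paired with this label.
    return sum(emit_w.get((f, label), 0.0) for f in token_features(tokens, i))


def sequence_score(tokens, labels_seq, emit_w, trans_w) -> float:
    """Unnormalised score of one complete label sequence."""
    s = sum(emit_score(tokens, i, y, emit_w) for i, y in enumerate(labels_seq))
    s += sum(trans_w.get((a, b), 0.0) for a, b in zip(labels_seq, labels_seq[1:]))
    return s


def _logsumexp(xs: Sequence[float]) -> float:
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def log_partition(tokens, emit_w, trans_w, labels=LABELS) -> float:
    """Forward algorithm: log of the sum of exp(score) over every label sequence."""
    alpha = {y: emit_score(tokens, 0, y, emit_w) for y in labels}
    for i in range(1, len(tokens)):
        alpha = {
            y: emit_score(tokens, i, y, emit_w)
            + _logsumexp([alpha[p] + trans_w.get((p, y), 0.0) for p in labels])
            for y in labels
        }
    return _logsumexp(list(alpha.values()))


def log_prob(tokens, labels_seq, emit_w, trans_w) -> float:
    """CRF conditional log-probability log p(labels | tokens)."""
    score = sequence_score(tokens, labels_seq, emit_w, trans_w)
    return score - log_partition(tokens, emit_w, trans_w)


def viterbi(tokens, emit_w, trans_w, labels=LABELS) -> List[str]:
    """Highest-scoring label sequence, i.e. the argmax of log_prob."""
    if not tokens:
        return []
    best = [{y: emit_score(tokens, 0, y, emit_w) for y in labels}]
    back: List[Dict[str, str]] = [{}]
    for i in range(1, len(tokens)):
        row, ptr = {}, {}
        for y in labels:
            # Best previous label p for moving into label y at position i.
            p = max(labels, key=lambda q: best[-1][q] + trans_w.get((q, y), 0.0))
            row[y] = (best[-1][p] + trans_w.get((p, y), 0.0)
                      + emit_score(tokens, i, y, emit_w))
            ptr[y] = p
        best.append(row)
        back.append(ptr)
    # Backtrack from the best final label.
    path = [max(labels, key=lambda y: best[-1][y])]
    for i in range(len(tokens) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


# Hand-set, hypothetical weights for the demo below.
EMIT_W: EmitWeights = {
    ("word=the", "lang1"): 2.0,
    ("word=party", "lang1"): 1.0,
    ("word=es", "lang2"): 2.0,
    ("word=hoy", "lang2"): 1.0,
    ("no_alpha", "other"): 3.0,
    ("twitter_entity", "other"): 3.0,
}
# Staying in the same language is mildly rewarded; switching is not penalised.
TRANS_W: TransWeights = {("lang1", "lang1"): 0.5, ("lang2", "lang2"): 0.5}

if __name__ == "__main__":
    tweet = ["@maria", "the", "party", "es", "hoy", "!"]
    print(viterbi(tweet, EMIT_W, TRANS_W))
    # ['other', 'lang1', 'lang1', 'lang2', 'lang2', 'other']
```

The transition weights are what distinguish this from classifying each token in isolation: they let the model prefer contiguous same-language spans, and training would estimate them jointly with the emission weights by maximising `log_prob` on labelled tweets.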
