Language Identification in Code-Switched Text Using Conditional Random Fields and Babelnet

Abstract

The paper outlines a supervised approach to language identification in code-switched data, framing this as a sequence labeling task where the label of each token is identified using a classifier based on Conditional Random Fields and trained on a range of different features, extracted both from the training data and by using information from Babelnet and Babelfy. The method was tested on the development dataset provided by organizers of the shared task on language identification in code-switched data, obtaining tweet level monolingual, code-switched and weighted F1-scores of 94%, 85% and 91%, respectively, with a token level accuracy of 95.8%. When evaluated on the unseen test data, the system achieved 90%, 85% and 87.4% monolingual, code-switched and weighted tweet level F1-scores, and a token level accuracy of 95.7%.

Bibliographic record

Source
Conference on Empirical Methods in Natural Language Processing; Workshop on Computational Approaches to Code Switching | 2016 | pp. 127-131 | 5 pages
Authors
Utpal Kumar Sikdar; Bjoern Gambaeck
Original format
PDF

Similar literature

1. Conditional random fields for clinical named entity recognition: A comparative study using Korean clinical texts [J]. Lee Wangjin, Kim Kyungmo, Lee Eun Young. Computers in Biology and Medicine. 2018.
2. Scene text recognition using a Hough forest implicit shape model and semi-Markov conditional random fields [J]. Seok Jae-Hyun, Kim Jin Hyung. Pattern Recognition: The Journal of the Pattern Recognition Society. 2015, no. 11.
3. Contextual text/non-text stroke classification in online handwritten notes with conditional random fields [J]. Adrien Delaye, Cheng-Lin Liu. Pattern Recognition: The Journal of the Pattern Recognition Society. 2014, no. 3.
4. Language Identification in Code-Switched Text Using Conditional Random Fields and Babelnet [C]. Utpal Kumar Sikdar, Bjoern Gambaeck. Conference on Empirical Methods in Natural Language Processing. 2016.
5. Selected topics in spatial statistical analysis: Nonstationary vector kriging, large scale conditional simulation of three-dimensional Gaussian random fields, and hypothesis testing in a correlated random field [D]. Quimby, William F. 1986.
6. De-identifying Swedish clinical text - refinement of a gold standard and experiments with Conditional random fields [O]. Hercules Dalianis, Sumithra Velupillai. 2010.
7. Named Entity Recognition on Code-Switched Data Using Conditional Random Fields [O]. Utpal Kumar Sikdar, Biswanath Barik, Björn Gambäck. 2018.
