Conference: 2016 IEEE International Conference on Big Data Analysis

What is your Mother Tongue?: Improving Chinese native language identification by cleaning noisy data and adopting BM25

Abstract

Native language identification (NLI) is the task of identifying an author's native language from essays the author has written in a second language. In this work, we build a supervised NLI model on a Chinese learner corpus. In the NLI field, this is the first work to (1) eliminate noisy data automatically before the training phase and (2) employ BM25 term weighting to score each feature. We also adopt a hierarchical structure of linear support vector machine classifiers and achieve a state-of-the-art accuracy of 77.1%, exceeding other Chinese NLI methods by over 10%.
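
The abstract does not spell out how BM25 weights are computed, so the following is only a minimal sketch of BM25 term weighting over tokenized essays. The function name `bm25_weights`, the parameter defaults `k1=1.2` and `b=0.75`, the smoothed IDF variant, and the toy essays are all assumptions chosen for illustration, not the authors' actual settings or features.

```python
import math
from collections import Counter


def bm25_weights(docs, k1=1.2, b=0.75):
    """Return one {term: BM25 weight} dict per tokenized document.

    k1 and b are common default values, assumed here for illustration.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    weighted = []
    for d in docs:
        tf = Counter(d)
        # Length normalization: longer-than-average documents are penalized.
        norm = k1 * (1 - b + b * len(d) / avg_len)
        weighted.append({
            # Smoothed IDF (always positive) times saturated term frequency.
            t: math.log((n_docs - df[t] + 0.5) / (df[t] + 0.5) + 1.0)
            * f * (k1 + 1) / (f + norm)
            for t, f in tf.items()
        })
    return weighted


# Hypothetical toy essays, already split into word features.
docs = [
    ["我", "是", "学生"],
    ["我", "喜欢", "学习", "中文"],
    ["他", "是", "老师"],
]
weights = bm25_weights(docs)
```

Each resulting dict could be turned into a sparse feature vector for a linear classifier. Within one essay, a feature that occurs in fewer essays overall receives a larger weight than an equally frequent feature that is common across the corpus.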