Conference: 9th International Conference on Language Resources and Evaluation
Why Chinese Web-as-Corpus is Wacky? Or: How Big Data is Killing Chinese Corpus Linguistics

Abstract

This paper examines and evaluates current developments in the use of the Web-as-Corpus (WaC) paradigm in Chinese corpus linguistics. I will argue that the unstable notion of wordhood in Chinese, and the resulting diversity of approaches to implementing word segmentation systems, pose great challenges for those who are keen on building web-scale corpus data. Two lexical measures are proposed to illustrate the issues, and methodological discussions are provided.
