Why Chinese Web-as-Corpus is Wacky? Or: How Big Data is Killing Chinese Corpus Linguistics


Abstract

This paper examines and evaluates the current state of the Web-as-Corpus (WaC) paradigm in Chinese corpus linguistics. I will argue that the unstable notion of wordhood in Chinese, and the resulting diversity of approaches to implementing word segmentation systems, pose great challenges for those who are keen on building web-scale corpus data. Two lexical measures are proposed to illustrate the issues, and methodological discussions are provided.


Bibliographic details

Source: 9th International Conference on Language Resources and Evaluation, 2014, pp. 4035-4038 (4 pages)
Author: Shu-Kai Hsieh
Original format: PDF
Keywords: corpus evaluation; word segmentation; Web as Corpus


Similar literature


1. Four-Word Bundles in English Abstracts of Chinese and English Linguistics Journal Articles: A Corpus-based Comparative Study [J]. ZHOU Yi-tao. 文学与艺术研究：英文版, 2021, No. 2.
2. Corpus Linguistics in Chinese Contexts [J]. Jiajin Xu. International Journal of Computer-Assisted Language Learning and Teaching, 2016, No. 2.
3. CSR Image Construction of Chinese Construction Enterprises in Africa Based on Data Mining and Corpus Analysis [J]. Yaoping Zhong, Wenzhong Zhu, Yingying Zhou. Mathematical Problems in Engineering: Theory, Methods and Applications, 2020, No. 1.
4. Why Chinese Web-as-Corpus is Wacky? Or: How Big Data is Killing Chinese Corpus Linguistics [C]. Shu-Kai Hsieh. 9th International Conference on Language Resources and Evaluation, 2014.
5. The diachronic development and synchronic diversity of the disyllabic-directional complements in Chinese: A corpus-based cross-linguistic study on the grammaticalization pathways and semantic change of the disyllabic-directional complements [D]. Chen, Zhen. 2006.
6. Exploring the Psychological Effects of COVID-19 Home Confinement in China: A Psycho-Linguistic Analysis on Weibo Data Pool [O]. Peijing Wu, Nan Zhao, Sijia Li. 2021.
7. Using Chinese Gigaword Corpus and Chinese Word Sketch in Linguistic Research [O]. Hong Jia-Fei, Huang Chu-Ren. 2006.
