International Conference on Web Engineering

Exploring Semantic Change of Chinese Word Using Crawled Web Data

Abstract

Changes in word meaning over time reflect shifts in socio-cultural attitudes and conceptual structures. Understanding how words change semantically over time is important for studying models of language and cultural evolution. Word embedding methods such as PPMI, SVD and word2vec have been evaluated in recent years. These representation methods, sometimes referred to as semantic maps of words, can facilitate the whole process of language processing. The Chinese language is no exception: the development of technology gradually influences how people communicate and the language they use. In this paper, a large dataset (300 GB) provided by Sogou, a Chinese web search engine provider, is pre-processed to obtain a Chinese language corpus. Three word representation methods are extended to include temporal information, then trained and tested on this corpus. A thorough qualitative and quantitative analysis is conducted with different thresholds to capture the semantic accuracy and alignment quality of the shifted words. The three methods are compared, and possible reasons behind the experimental results are discussed.
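
The abstract does not say how the three methods are extended with temporal information. As a rough illustration only, the sketch below assumes one common setup: a word-by-context co-occurrence matrix is built for each time period, PPMI and truncated SVD give per-period embeddings, and orthogonal Procrustes rotates one period onto another so that per-word cosine distance can serve as a shift score. All function names are hypothetical, and none of this is confirmed to be the authors' procedure.

```python
import numpy as np


def ppmi_matrix(counts: np.ndarray) -> np.ndarray:
    """Positive PMI from a word-by-context co-occurrence count matrix."""
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)
    expected = row @ col / total  # counts expected under independence
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(counts / expected)
    pmi[~np.isfinite(pmi)] = 0.0  # zero counts give -inf or nan; treat as 0
    return np.maximum(pmi, 0.0)


def svd_embed(ppmi: np.ndarray, dim: int) -> np.ndarray:
    """Low-rank word vectors from a PPMI matrix via truncated SVD."""
    u, s, _ = np.linalg.svd(ppmi, full_matrices=False)
    return u[:, :dim] * s[:dim]


def procrustes_align(base: np.ndarray, other: np.ndarray) -> np.ndarray:
    """Rotate `other` onto `base` with the orthogonal matrix R = U V^T
    that minimises ||other @ R - base|| (rows must be the same words)."""
    u, _, vt = np.linalg.svd(other.T @ base)
    return other @ (u @ vt)


def cosine_shift(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Per-word cosine distance between two aligned embedding matrices."""
    num = (a * b).sum(axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return 1.0 - num / np.where(den == 0, 1.0, den)
```

Under these assumptions, embeddings for two periods would be computed over a shared vocabulary, the later period aligned to the earlier one with `procrustes_align`, and words whose `cosine_shift` exceeds a chosen threshold flagged as candidates for semantic change.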