9th International Conference on Language Resources and Evaluation
Data Mining with Shallow vs. Linguistic Features to Study Diversification of Scientific Registers

Abstract

We present a methodology to analyze the linguistic evolution of scientific registers with data mining techniques, comparing the insights gained from shallow vs. linguistic features. The focus is on selected scientific disciplines at the boundaries to computer science (computational linguistics, bioinformatics, digital construction, microelectronics). The data basis is the English Scientific Text Corpus (SciTex), which covers a time range of roughly thirty years (1970/80s to early 2000s) (Degaetano-Ortlieb et al., 2013; Teich and Fankhauser, 2010). In particular, we investigate the diversification of scientific registers over time. Our theoretical basis is Systemic Functional Linguistics (SFL) and its specific incarnation of register theory (Halliday and Hasan, 1985). In terms of methods, we combine corpus-based methods of feature extraction and data mining techniques.
