Data Mining with Shallow vs. Linguistic Features to Study Diversification of Scientific Registers

Abstract

We present a methodology to analyze the linguistic evolution of scientific registers with data mining techniques, comparing the insights gained from shallow vs. linguistic features. The focus is on selected scientific disciplines at the boundary of computer science (computational linguistics, bioinformatics, digital construction, microelectronics). The data basis is the English Scientific Text Corpus (scitex), which covers a time range of roughly thirty years (1970/80s to early 2000s) (Degaetano-Ortlieb et al., 2013; Teich and Fankhauser, 2010). In particular, we investigate the diversification of scientific registers over time. Our theoretical basis is Systemic Functional Linguistics (SFL) and its specific incarnation of register theory (Halliday and Hasan, 1985). In terms of methods, we combine corpus-based methods of feature extraction and data mining techniques.

Bibliographic record

Source
9th International Conference on Language Resources and Evaluation | 2014 | pp. 1602-1609 | 8 pages
Authors
Stefania Degaetano-Ortlieb; Peter Fankhauser; Hannah Kermes; Ekaterina Lapshinova-Koltunski; Noam Ordan; Elke Teich;
Original format: PDF
Keywords
data mining; text classification; register;

Similar documents

1. The Linguistic Construal of Disciplinarity: A Data-Mining Approach Using Register Features [J] . Elke Teich, Stefania Degaetano-Ortlieb, Peter Fankhauser, Journal of the American Society for Information Science and Technology . 2016, No. 7

2. Creation of a High-quality, Register-diversified Parallel (English-Spanish) Corpus for Linguistic and Computational Investigations [J] . Julia Lavid, Jorge Arús, Bernard DeClerck, Procedia - Social and Behavioral Sciences . 2015, No. 1

3. Creation of a High-quality, Register-diversified Parallel (English-Spanish) Corpus for Linguistic and Computational Investigations [J] . Julia Lavid, Jorge Arús, Bernard DeClerck, Procedia - Social and Behavioral Sciences . 2015, No. 1

4. Data Mining with Shallow vs. Linguistic Features to Study Diversification of Scientific Registers [C] . Stefania Degaetano-Ortlieb, Peter Fankhauser, Hannah Kermes, 9th International conference on language resources and evaluation . 2014

5. Realizing a feature-based framework for scientific data mining. [D] . Mehta, Sameep. 2006

6. Detection of drug–drug interactions through data mining studies using clinical sources, scientific literature and social media [O] . Santiago Vilar, Carol Friedman, George Hripcsak

7. Data Mining with Shallow vs. Linguistic Features to Study Diversification of Scientific Registers [O] . Degaetano-Ortlieb Stefania, Fankhauser Peter, Kermes Hannah, 2014

