GenitivDB - a Corpus-Generated Database for German Genitive Classification



Abstract

We present a novel NLP resource for the explanation of linguistic phenomena, built and evaluated by exploring very large annotated language corpora. For the compilation, we use the German Reference Corpus (DeReKo), which contains more than 5 billion word forms and is the largest linguistic resource worldwide for the study of contemporary written German. The result is a comprehensive database of German genitive formations, enriched with a broad range of intra- and extralinguistic metadata. It can be used for the notoriously controversial classification and prediction of genitive endings (short endings, long endings, zero marker). We also evaluate the main factors influencing the use of specific endings. To get a general idea of a factor's influences and its side effects, we calculate chi-square tests and visualize the residuals with an association plot. The results are evaluated against a gold standard by implementing tree-based machine learning algorithms. For the statistical analysis, we applied the supervised Logistic Model Trees (LMT) algorithm, using the WEKA software. We intend to use this gold standard to evaluate GenitivDB, as well as to explore methodologies for a predictive genitive model.
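
The chi-square step mentioned in the abstract can be illustrated with a minimal sketch. It computes the chi-square statistic and the Pearson residuals, which are the quantities an association plot typically displays, for a small contingency table. The function name `pearson_residuals`, the factor levels, and all counts are hypothetical placeholders for illustration; they are not taken from GenitivDB or from the paper.

```python
from math import sqrt


def pearson_residuals(table):
    """Return (chi2, residuals) for a 2-D table of observed counts.

    Each residual is (observed - expected) / sqrt(expected), where the
    expected count assumes independence of rows and columns.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    chi2 = 0.0
    residuals = []
    for i, row in enumerate(table):
        res_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            r = (observed - expected) / sqrt(expected)
            res_row.append(r)
            chi2 += r * r  # chi-square is the sum of squared residuals
        residuals.append(res_row)
    return chi2, residuals


# Hypothetical counts, NOT real corpus data.
# Rows: two levels of an invented factor.
# Columns: short ending, long ending, zero marker.
observed = [
    [30, 60, 10],
    [70, 20, 10],
]

chi2, residuals = pearson_residuals(observed)
print(f"chi-square = {chi2:.2f}")
for row in residuals:
    print([round(r, 2) for r in row])
```

A large positive residual marks a cell that occurs more often than independence would predict, a large negative one a cell that occurs less often; cells near zero contribute little to the overall statistic.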


Bibliographic details

Source: 9th International Conference on Language Resources and Evaluation, 2014, pp. 1849-1855 (7 pages)
Author: Roman Schneider
Original format: PDF
Keywords: NLP; Metadata; Grammar

Similar literature

1. Intra- and Inter-database Study for Arabic, English, and German Databases: Do Conventional Speech Features Detect Voice Pathology? [J]. Ali Zulfiqar, Alsulaiman Mansour, Muhammad Ghulam. Journal of Voice: Official Journal of the Voice Foundation, 2017, no. 3.
2. Children and adolescents with type 1 diabetes in Germany are more overweight than healthy controls: results comparing DPV database and CrescNet database [J]. Thomas M. Kapellen, Ruth Gausche, Axel Dost. Journal of Pediatric Endocrinology & Metabolism: JPEM, 2014, no. 3a4.
3. The adsorption database - Database and software package for the adsorption equilibria of gases and vapours [German] [J]. Sakuth M., Sander S., Meyer J. Chemie-Ingenieur-Technik: Verfahrenstechnik Technische Chemie Apparatewesen Biotechnologie, 1998, no. 10.
4. GenitivDB - a Corpus-Generated Database for German Genitive Classification [C]. Roman Schneider. 9th International Conference on Language Resources and Evaluation, 2014.
5. Images of Germany: A Theory-Based Approach to the Classification, Analysis, and Critique of British Attitudes Towards Germany: 1890-1940, Volume One [D]. MacIntyre, Duncan. 1990.
6. Mortality in the German Pharmacoepidemiological Research Database (GePaRD) compared to national data in Germany: results from a validation study [O]. Christoph Ohlmeier, Ingo Langner, Kathrin Hillebrand. 2015.
7. GenitivDB - a corpus-generated database for German genitive classification [O]. Schneider Roman. 2014.
