9th International Conference on Language Resources and Evaluation

A Database for Measuring Linguistic Information Content

Abstract

Which languages convey the most information in a given amount of space? This is a question often asked of linguists, especially by engineers, who often have some information-theoretic measure of "information" in mind but rarely define exactly how they would measure that information. The question is, in fact, remarkably hard to answer, and many linguists consider it unanswerable. But it is a question that seems as if it ought to have an answer. If one had a database of close translations between a set of typologically diverse languages, with detailed marking of morphosyntactic and morphosemantic features, one could hope to quantify the differences between how these different languages convey information. Since no appropriate database exists, we decided to construct one. The purpose of this paper is to present our work on the database, along with some preliminary results. We plan to release the dataset once complete.