
Issues in Encoding the Writing of Nepal's Languages

Abstract

The major language of Nepal, known today as Nepali, is spoken as a mother tongue by nearly half the population, and as a second language by nearly all of the rest. A considerable volume of computational linguistics work has been done on Nepali, both in research establishments and in commercial organizations. However, there are another 94 languages indigenous to the country, and the situation for these is not good. To apply computational linguistics methods to a language, it must first be represented in the computer, but most of the languages of Nepal have no written tradition, let alone any support by computers. It is the written form that is needed for full computational processes, and it is here that we encounter barriers or, at best, inappropriate compromises. We will look at the situation in Nepal, ignoring the 17 cross-border languages where the major speaker population lies outside Nepal. We are left with only three languages with written traditions: Nepali, which is well served; Newari, with over 1000 years of written tradition but so far frustrated in attempts to encode its writing; and Limbu, which does have its writing encoded, though with defects. Many of the remaining languages may be written in Devanagari, but their speakers aspire to something different that relates to their languages and is more visually distinctive, to mark their identity. We look at what can be done for these remaining languages and speculate whether a common writing system and encoding could cover all the languages of Nepal.
Inevitably we must focus on Unicode, the current standard for the computer encoding of writing. We find that language activists in Nepal do not adequately understand what is possible with the technology and pursue objectives within Unicode that are neither necessary nor helpful, while external experts have only a limited understanding of the issues involved and of the requirements of living languages and their users, and instead pursue scholarly interests that offer limited support for living users.
