A framework for multilingual information processing.

Abstract

Recent (and continuing) rapid increases in computing power now allow more of humankind's written communication to be represented as digital data. The most recent and visible change in multilingual information processing has been the introduction of larger character sets that encompass more writing systems. Yet the very richness of these larger collections of characters has made the interpretation and processing of text more difficult. The many competing motivations for standardizing character sets (satisfying the needs of linguists, computer scientists, and typographers) threaten the purpose of information processing: the accurate and facile manipulation of data. Existing character sets have been constructed without a consistent strategy or architecture, and complex algorithms and technical reports are now needed just to interpret raw streams of characters representing multilingual text.

We assert that information processing is an architectural problem and not merely a character set problem. We analyze several multilingual information processing algorithms (e.g., bidirectional reordering and character normalization) and conclude that they are more dangerous than beneficial. Their many unexpected interactions suggest the lack of a coherent architecture. We therefore propose organizing them into a new, multilayered architecture for multilingual information processing, which we call Metacode, in which character sets occupy the lower layers and protocols and algorithms occupy the higher layers. We then recast bidirectional reordering and character normalization within the Metacode framework.
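The abstract names character normalization and bidirectional reordering without showing what they operate on. The following sketch makes them concrete using Python's standard `unicodedata` module. This is our own illustration, not the dissertation's Metacode framework. The helper names `same_text` and `bidi_classes` are hypothetical.

The example shows four things:

- One visible letter can be encoded as two different code point sequences.
- Normalization reconciles those sequences.
- Compatibility normalization (NFKC) can erase a distinction such as a superscript.
- Each character has a bidirectional class, which a reordering algorithm must consult.

```python
import unicodedata

def same_text(a: str, b: str, form: str = "NFC") -> bool:
    """Hypothetical helper: compare two strings after applying the same normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

def bidi_classes(s: str) -> list[str]:
    """Hypothetical helper: return the Unicode bidirectional class of each character."""
    return [unicodedata.bidirectional(ch) for ch in s]

composed = "\u00e9"     # é as one precomposed code point
decomposed = "e\u0301"  # e followed by COMBINING ACUTE ACCENT

if __name__ == "__main__":
    print(composed == decomposed)           # False: different code point sequences
    print(same_text(composed, decomposed))  # True once both are in NFC
    # NFKC folds SUPERSCRIPT TWO into an ordinary digit, losing a distinction.
    print(unicodedata.normalize("NFKC", "x\u00b2"))  # x2
    # Classes a reordering algorithm consults: Latin letter, Hebrew letter, digit.
    print(bidi_classes("a\u05d01"))         # ['L', 'R', 'EN']
```

The NFKC line hints at the kind of side effect the abstract describes. A normalization step applied for one purpose, such as matching, can silently change the text seen by later processing.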

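Bibliographic details

  • Author

    Atkin, Steven Edward.

  • Author affiliation

    Florida Institute of Technology.

  • Degree-granting institution: Florida Institute of Technology
  • Subjects: Computer Science; Language, Linguistics
  • Degree: Ph.D.
  • Year: 2001
  • Pages: 209
  • Original format: PDF
  • Text language: English
  • Chinese Library Classification: Automation technology, computer technology; Linguistics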