Conference on Computational Linguistics and Speech Processing

Design of an Input Method for Taiwanese Hokkien using Unsupervized Word Segmentation for Language Modeling

Pierre Magistry


Abstract

This paper presents the challenges faced and the methodology followed in designing a new Input Method (IME) for the Taiwanese (Hokkien) language. We first describe the context, the motivations, and some of the main issues related to text input in Taiwanese on modern computer systems and mobile devices. We then present the available resources on which our system is based and describe its overall architecture. Since the cornerstone of a modern IME is the Language Model (LM), the main Natural Language Processing issue we focus on in this paper is the estimation of an LM for this under-resourced language. The solution we propose relies on unsupervised word segmentation that preserves some degree of ambiguity.
