Journal: Multimedia Tools and Applications

Hybrid method for modeless Japanese input using N-gram based binary classification and dictionary


Abstract

The rapid growth of globalization requires handling large numbers of multilingual documents in which Japanese text coexists with English and other languages written in the Roman alphabet. Conventional Japanese input methods require users to switch the input mode manually between Japanese and the Latin alphabet. Existing modeless Japanese input methods switch the input mode automatically, but they must be trained on large amounts of text data to perform well. This paper proposes a hybrid modeless Japanese input method that combines a non-Japanese word dictionary with character n-gram sequence features to decide whether to convert the input and switch to Kana. The dictionary, built from text data available on the Web, is intended to reduce false positives on non-Japanese words. The n-gram based discriminative model is trained with a Support Vector Machine on a balanced corpus containing texts from various domains. In our evaluation, the F-measure for predicting non-Kana characters improved by 7.7 % compared with an n-gram-only method. In addition, in a test with real users, the average rating for input time fell on the agree side for our method and on the disagree side for the conventional Japanese input method that requires switching input modes.
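The abstract gives no implementation details, so the following Python sketch only illustrates the general shape of such a hybrid decision under stated assumptions. All names (`char_ngrams`, `svm_score`, `should_convert_to_kana`, `f_measure`, `TOY_WEIGHTS`, `TOY_BIAS`, `NON_JAPANESE_WORDS`) are hypothetical. The weights are hand-set toy values standing in for a linear SVM's learned parameters, not parameters trained on a balanced corpus. The 1- to 3-gram range and the `^`/`$` boundary markers are assumptions. Checking the dictionary before the classifier is one plausible way to combine the two components, not necessarily the authors' scheme.

```python
from collections import Counter

# Hypothetical toy weights standing in for a trained linear SVM (not learned).
# Positive weights favour "romanized Japanese (convert to Kana)",
# negative weights favour "non-Japanese word (leave as Latin letters)".
TOY_WEIGHTS = {
    "ka": 1.0, "su": 1.0, "ma": 1.0, "de": 1.0,
    "shi": 1.5, "tsu": 1.5,
    "th": -2.0, "ck": -2.0, "x": -1.0,
}
TOY_BIAS = -0.5

# Hypothetical stand-in for the Web-derived non-Japanese word dictionary.
NON_JAPANESE_WORDS = {"made", "email", "server", "python"}


def char_ngrams(token: str, n_values=(1, 2, 3)) -> Counter:
    """Count character n-grams of a lowercased token.

    '^' and '$' mark the word boundaries so that prefixes and
    suffixes get their own features (an assumption of this sketch).
    """
    padded = f"^{token.lower()}$"
    feats: Counter = Counter()
    for n in n_values:
        for i in range(len(padded) - n + 1):
            feats[padded[i:i + n]] += 1
    return feats


def svm_score(feats: Counter, weights: dict, bias: float) -> float:
    """Linear decision function w.x + b, the form a linear SVM learns."""
    return bias + sum(weights.get(g, 0.0) * c for g, c in feats.items())


def should_convert_to_kana(token: str, dictionary: set,
                           weights: dict, bias: float) -> bool:
    """Hybrid decision: dictionary check first, then the n-gram classifier."""
    # A listed non-Japanese word is never converted, which suppresses
    # false positives such as the English "made" (also valid romaji).
    if token.lower() in dictionary:
        return False
    # Otherwise, a positive classifier score means "romanized Japanese".
    return svm_score(char_ngrams(token), weights, bias) > 0


def f_measure(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall (F1)."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


for word in ["sushi", "made", "the", "Email"]:
    print(word, should_convert_to_kana(word, NON_JAPANESE_WORDS,
                                       TOY_WEIGHTS, TOY_BIAS))
```

Running it prints `True` for `sushi` and `False` for `made`, `the`, and `Email`. The `made` case shows why a dictionary can help: its n-grams score positive under the toy weights, so the classifier alone would convert it, but the dictionary lookup keeps it as Latin text. In a real system the weights would come from an SVM trainer (for example scikit-learn's `LinearSVC`) rather than being written by hand.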