Hybrid method for modeless Japanese input using N-gram based binary classification and dictionary

Ikegami Yukino; Tsuruta Setsuo

Abstract

The rapid growth of globalization requires handling a large number of multilingual documents in which Japanese text coexists with English and other languages that use the Roman alphabet. Conventional Japanese input methods require users to switch the input mode manually between Japanese and the Latin alphabet. One existing solution is a modeless Japanese input method that switches the input mode automatically. However, such methods must be trained on a large amount of text data to perform well. This paper proposes a hybrid modeless Japanese input method that combines a non-Japanese word dictionary with n-gram character sequence features to decide whether or not to convert the input and switch to Kana. The non-Japanese word dictionary is intended to reduce false positives on words from languages other than Japanese, and it is compiled from text data available on the Web. The n-gram-based discriminative model is learned by a Support Vector Machine (SVM) from a balanced corpus containing texts from various domains. In our evaluation, the F-measure for predicting non-Kana characters improved by 7.7 % compared with a method based on n-grams alone. In addition, a test with real users showed that the average response regarding input time fell on the "agree" side for our method, but on the "disagree" side for the conventional Japanese input method that requires switching input modes.
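
The abstract does not describe the implementation, so the following is only a minimal Python sketch, under our own assumptions, of how the hybrid decision it outlines could be wired together. A lookup in a non-Japanese word dictionary is checked first and vetoes conversion; otherwise a linear SVM over character n-gram features decides whether a token typed in Latin letters should be converted to Kana. All names (`char_ngrams`, `train_linear_svm`, `HybridModelessInput`), the label encoding (+1 = convert to Kana), the feature template (1- to 3-grams with `^`/`$` boundary markers), and the toy word lists are hypothetical. The SVM is trained with a plain Pegasos-style subgradient method, swapped in for whatever SVM package the authors actually used.

```python
from collections import Counter


def char_ngrams(word: str, n_max: int = 3) -> Counter:
    """Count character 1..n_max-grams of `word`; ^ and $ mark word boundaries."""
    padded = f"^{word.lower()}$"
    feats: Counter = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(padded) - n + 1):
            feats[padded[i:i + n]] += 1
    feats["<bias>"] = 1  # constant feature so the SVM can learn an offset
    return feats


def dot(w: dict, x: Counter) -> float:
    """Sparse dot product between a weight dict and a feature Counter."""
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def train_linear_svm(samples, labels, lam: float = 0.01, epochs: int = 50) -> dict:
    """Linear SVM (hinge loss + L2) trained with Pegasos-style subgradient steps.

    labels: +1 = convert to Kana, -1 = keep Latin letters (hypothetical encoding).
    Samples are visited in a fixed order, so training is deterministic.
    """
    w: dict = {}
    t = 0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            t += 1
            eta = 1.0 / (lam * t)
            margin = y * dot(w, x)  # computed with the current weights
            for f in w:  # L2 shrinkage applied to every existing weight
                w[f] *= 1.0 - eta * lam
            if margin < 1.0:  # hinge-loss subgradient step on a margin violation
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + eta * y * v
    return w


class HybridModelessInput:
    """Dictionary lookup first, then the n-gram SVM (hypothetical structure)."""

    def __init__(self, non_japanese_words, weights: dict, n_max: int = 3):
        self.non_japanese = {word.lower() for word in non_japanese_words}
        self.weights = weights
        self.n_max = n_max

    def should_convert_to_kana(self, token: str) -> bool:
        if token.lower() in self.non_japanese:
            return False  # listed non-Japanese word: never convert
        return dot(self.weights, char_ngrams(token, self.n_max)) > 0.0


if __name__ == "__main__":
    # Toy data for illustration only; not the paper's corpus or dictionary.
    romaji = ["sushi", "arigatou", "konnichiwa", "watashi", "nihongo", "sakura"]
    english = ["the", "string", "world", "print", "network", "script"]
    xs = [char_ngrams(w) for w in romaji + english]
    ys = [1] * len(romaji) + [-1] * len(english)
    model = HybridModelessInput({"python", "github"}, train_linear_svm(xs, ys))
    for token in ["tabemasu", "python", "kyou", "function"]:
        print(token, model.should_convert_to_kana(token))
```

Checking the dictionary before the classifier reflects the stated purpose of the dictionary: a listed non-Japanese word such as `python` in the toy list stays in Latin letters even if its character n-grams happen to resemble romaji. For simplicity, the sketch regularizes the bias feature together with the other weights.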

Bibliographic information

Source: Multimedia Tools and Applications, 2015, Issue 11, pp. 3933-3946 (14 pages)
Authors: Ikegami Yukino; Tsuruta Setsuo
Affiliations: Tokyo Denki Univ, Inzai, Chiba, Japan (both authors)

Original format: PDF
Language: English
Keywords: Multilingual documents; Modeless Japanese input

Similar documents

1. Improved Error Reduction and Hybrid Input Output Algorithms for Phase Retrieval by including a Sparse Dictionary Learning-Based Inpainting Method [J]. Jian-Jia Su, Chung-Hao Tien. International Journal of Optics, 2020, Issue 3.

2. Facing the classification of binary problems with a hybrid system based on quantum-inspired binary gravitational search algorithm and K-NN method [J]. XiaoHong Han, Long Quan, XiaoYan Xiong. Engineering Applications of Artificial Intelligence, 2013, Issue 10.

3. Using Hybrid and Diversity-Based Adaptive Ensemble Method for Binary Classification [J]. Xing Fan, Chung-Horng Lung, Samuel A. Ajila. International Journal of Intelligence Science, 2018, Issue 3.

4. Flick: Japanese Input Method Editor Using N-Gram and Recurrent Neural Network Language Model Based Predictive Text Input [C]. Yukino Ikegami, Yoshitaka Sakurai, Ernesto Damiani. International Conference on Signal-Image Technology and Internet-Based Systems, 2017.

5. Classification and variable selection for high dimensional multivariate binary data: Adaboost based new methods and a theory for the plug-in rule [D]. Park, Junyong. 2006.

6. miRFam: an effective automatic miRNA classification method based on n-grams and a multiclass SVM [O]. Jiandong Ding, Shuigeng Zhou, Jihong Guan. 2011.

7. Evaluation method for the Japanese Kanji dictionary using frequency information －For knowledge base kanji input system－ [O]. Yukio Hori, Masaya Ikemura. 2001.
