We present a statistical model of Japanese unknown words that consists of a set of length and spelling models, classified by the character types that constitute a word. The idea is quite simple: different character sets should be treated differently, and the changes between character types are very important, because Japanese script has both ideograms like Chinese (kanji) and phonograms like English (katakana). The proposed model improves both word segmentation accuracy and part-of-speech tagging accuracy. The model can achieve 96.6
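The abstract does not say how characters are assigned to types. The following is a minimal sketch that assumes types come from Unicode block ranges, which may differ from the authors' own scheme. It splits a string into maximal runs of a single character type, and the boundaries between those runs are the type changes the model treats as important. The function names `char_type` and `type_runs` and the type labels are hypothetical, chosen for illustration.

```python
def char_type(ch: str) -> str:
    """Classify one character into a coarse script type.

    Assumption: types are decided by Unicode block ranges only.
    """
    cp = ord(ch)
    if 0x3040 <= cp <= 0x309F:      # Hiragana block
        return "hiragana"
    if 0x30A0 <= cp <= 0x30FF:      # Katakana block (includes the long-vowel mark)
        return "katakana"
    if 0x4E00 <= cp <= 0x9FFF:      # CJK Unified Ideographs (kanji)
        return "kanji"
    if ch.isascii() and ch.isalpha():
        return "alphabet"
    if ch.isdigit():
        return "digit"
    return "other"


def type_runs(text: str) -> list[tuple[str, str]]:
    """Split text into maximal runs of characters that share one type."""
    runs: list[tuple[str, str]] = []
    for ch in text:
        t = char_type(ch)
        if runs and runs[-1][0] == t:
            # Same type as the previous character: extend the current run.
            runs[-1] = (t, runs[-1][1] + ch)
        else:
            # Character type changes here: start a new run.
            runs.append((t, ch))
    return runs


if __name__ == "__main__":
    for t, s in type_runs("東京タワーへ行く"):
        print(t, s)
```

On "東京タワーへ行く" this yields the runs 東京 (kanji), タワー (katakana), へ (hiragana), 行 (kanji), and く (hiragana). A per-type length or spelling model could then be estimated separately over such runs. That step is not shown here.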