A tagged corpus plays an important role in natural language processing based on a stochastic language model, and increasing the corpus size improves accuracy. A meaningful improvement, however, requires an exponential increase in corpus size, and the annotation cost this entails is not negligible. In this paper, we discuss the use of an untagged corpus. In our experiments, using an untagged corpus improved the predictive power of a stochastic language model and the accuracy of a kana-kanji converter based on it; for a tagger, however, the improvement was only slight.
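One common way to exploit an untagged corpus alongside a small tagged one is to linearly interpolate a model estimated from the tagged corpus with a word-based model estimated from the raw text. The following is a minimal sketch of that idea; the bigram form and the interpolation weight `lam` are illustrative assumptions, not necessarily the paper's actual formulation.

```python
from collections import defaultdict

def bigram_probs(corpus):
    """Maximum-likelihood bigram probabilities P(w2 | w1) from a
    corpus given as a list of tokenized sentences."""
    pair_counts = defaultdict(int)
    context_counts = defaultdict(int)
    for sentence in corpus:
        words = ["<s>"] + sentence
        for w1, w2 in zip(words, words[1:]):
            pair_counts[(w1, w2)] += 1
            context_counts[w1] += 1
    return {p: c / context_counts[p[0]] for p, c in pair_counts.items()}

def interpolated_prob(w1, w2, tagged_model, untagged_model, lam=0.7):
    """P(w2 | w1) = lam * P_tagged + (1 - lam) * P_untagged.
    `lam` is an assumed weight; in practice it would be tuned on
    held-out data (e.g. by deleted interpolation)."""
    return (lam * tagged_model.get((w1, w2), 0.0)
            + (1 - lam) * untagged_model.get((w1, w2), 0.0))

# Toy corpora standing in for a small tagged and a large untagged corpus.
tagged = [["a", "b"], ["a", "c"]]
untagged = [["a", "b"], ["a", "b"], ["a", "c"], ["b", "c"]]
p_tagged = bigram_probs(tagged)
p_untagged = bigram_probs(untagged)
print(interpolated_prob("a", "b", p_tagged, p_untagged))  # 0.7*0.5 + 0.3*(2/3)
```

The untagged corpus contributes probability mass for events that are rare or unseen in the tagged corpus, which is one route to the improved predictive power the abstract reports.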