I have worked on automatic speech recognition (ASR) research for almost 40 years. Over this period, ASR technology has made significant progress based on statistical techniques, yet its performance remains far below that of humans. The technological progress reported at conferences and in journals has recently begun to saturate, and no clear path toward human-level accuracy is in sight. Something seems to be missing from the current research approach. Exactly what is missing is unclear, but it seems we need to move beyond the current simple framework and construct a framework for statistical knowledge processing that achieves reliable recognition by optimally combining diverse knowledge sources modeled at many levels.