Bilingual speakers are known for their ability to code-switch, that is, to mix their languages during communication. Code-switching occurs when a bilingual speaker replaces a word or phrase in one language with a word or phrase from another language. For code-switching speech recognition, a large-scale code-switching speech database is essential for model training, but such data are sparse. To ease the negative effect of this data sparseness on training code-switching speech recognizers, this study proposes a data-driven approach to phone set construction that integrates acoustic features and cross-lingual, context-sensitive articulatory features into the distance measure between phone units. KL-divergence and a hierarchical phone unit clustering algorithm are used to cluster similar phone units, which reduces the amount of training data needed for model construction. The experimental results show that the proposed method outperforms traditional phone set construction methods.
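The clustering step can be illustrated with a minimal sketch. It assumes, purely for illustration, that each phone unit is modeled by a single diagonal-covariance Gaussian over acoustic features. It uses the symmetric KL-divergence between these Gaussians as the distance and applies average-linkage agglomerative clustering until a target number of units remains. The phone names, the `cluster_phones` helper, and the choice of average linkage are hypothetical. The cross-lingual articulatory-feature term described above is not modeled here. The `dist` parameter is only a placeholder where a combined acoustic/articulatory distance could be supplied.

```python
import math
from itertools import combinations


def kl_diag_gauss(mu_p, var_p, mu_q, var_q):
    """KL(P || Q) for two diagonal-covariance Gaussians P and Q."""
    kl = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        kl += 0.5 * (math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0)
    return kl


def symmetric_kl(p, q):
    """Symmetric KL-divergence: KL(P || Q) + KL(Q || P)."""
    return (kl_diag_gauss(p["mu"], p["var"], q["mu"], q["var"])
            + kl_diag_gauss(q["mu"], q["var"], p["mu"], p["var"]))


def cluster_phones(models, num_clusters, dist=symmetric_kl):
    """Hypothetical average-linkage agglomerative clustering of phone units.

    `models` maps a phone name to {"mu": [...], "var": [...]}.
    `dist` is a placeholder for any phone-to-phone distance; the paper's
    articulatory-feature term is NOT included in the default.
    """
    names = sorted(models)
    clusters = [[name] for name in names]

    # Cache the distance between every pair of individual phone units.
    pair = {}
    for a, b in combinations(names, 2):
        d = dist(models[a], models[b])
        pair[(a, b)] = pair[(b, a)] = d

    def linkage(c1, c2):
        # Average pairwise distance between members of two clusters.
        return sum(pair[(a, b)] for a in c1 for b in c2) / (len(c1) * len(c2))

    # Repeatedly merge the closest pair of clusters.
    while len(clusters) > num_clusters:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters


# Toy example with made-up Mandarin ("_zh") and English ("_en") phone models.
phones = {
    "a_zh": {"mu": [1.0, 0.0], "var": [1.0, 1.0]},
    "a_en": {"mu": [1.1, 0.0], "var": [1.0, 1.0]},
    "i_zh": {"mu": [5.0, 3.0], "var": [1.0, 1.0]},
    "i_en": {"mu": [5.2, 3.1], "var": [1.0, 1.0]},
}
clusters = cluster_phones(phones, num_clusters=2)
print(clusters)
```

In this toy setup, acoustically similar units from the two languages are merged into shared units. A shared unit can then be trained on pooled data from both languages, which is the intuition behind reducing data requirements.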