Method and system for creating frugal speech corpus using internet resources and conventional speech corpus
Abstract
A speech corpus creation method and system are disclosed. The method comprises: identifying a publicly accessible first source of first speech data and its corresponding first text transcription; extracting second speech data in an accessible encoding format from the first speech data; extracting second text transcription data in at least one encoding format from the first text transcription; matching and aligning the transcription to the extracted second speech data at the sentence, word, or phoneme level, or a combination thereof, to form a first and a second speech corpus; analyzing the text transcriptions in the second speech corpus to identify short speech segments and produce a phonetically balanced, segmented, text-aligned third speech corpus; and conditioning the third speech corpus by inserting into it a corpus richer in context and associated environment, drawn from at least one second source, to form the final speech corpus.
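The abstract does not say how the short segments are chosen for phonetic balance. As a rough illustration only, the sketch below uses greedy phoneme coverage as a simple proxy for balance. The `Segment` structure, the `select_balanced` function, the 5-second cutoff, and the phoneme labels are all hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A short, text-aligned speech segment (hypothetical structure)."""
    text: str
    duration_s: float
    phonemes: tuple  # phoneme labels, assumed to come from an earlier alignment step


def select_balanced(segments, max_duration_s=5.0):
    """Greedily pick short segments until no segment adds a new phoneme.

    This is ordinary greedy set cover, used as a stand-in for whatever
    balancing criterion the disclosed method actually applies.
    """
    # Keep only short segments; the cutoff value is an assumption.
    candidates = [s for s in segments if s.duration_s <= max_duration_s]
    covered = set()
    chosen = []
    while True:
        # Pick the candidate contributing the most not-yet-covered phonemes.
        best = max(
            candidates,
            key=lambda s: len(set(s.phonemes) - covered),
            default=None,
        )
        if best is None or not set(best.phonemes) - covered:
            break
        chosen.append(best)
        covered |= set(best.phonemes)
    return chosen, covered


# Toy data with made-up phoneme labels.
segments = [
    Segment("see", 0.6, ("s", "iy")),
    Segment("tea", 0.5, ("t", "iy")),
    Segment("a long recording", 42.0, ("ax", "l", "ao", "ng")),
    Segment("seat", 0.7, ("s", "iy", "t")),
]
chosen, covered = select_balanced(segments)
print([s.text for s in chosen], sorted(covered))
```

Coverage is only a proxy: a truly balanced corpus would also match phoneme frequencies to a target distribution, which this sketch does not attempt.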