Speech data is an extremely rich and important source of information. However, we lack suitable methods for the semantic annotation of speech data. For instance, semantic role labeling (SRL) of speech that has been transcribed by an automated speech recognition (ASR) system is still an unsolved problem. SRL of ASR data is difficult and complex due to the absence of sentence boundaries, punctuation, grammar errors, words that are wrongly transcribed, and word deletions and insertions. In this paper we propose a novel approach to SRL of ASR data based on the following idea: (1) train the SRL system on data segmented into frames, where each frame consists of a predicate and its semantic roles without considering sentence boundaries; (2) label it with the semantics of PropBank roles; and to assist the above (3) train a part-of-speech (POS) tagger to work on noisy and error prone ASR data. Experiments with the OntoNotes corpus show improvements compared to the state-of-the-art SRL applied on ASR data.
展开▼