A method of visually correlating text and speech includes receiving a source file; generating, based on the source file, a page display image including a series of text segments, the generating including rendering the series of text segments with a first set of display characteristics; receiving an input signal representing an utterance; processing the received input signal to determine whether at least a portion of a text segment included within the generated page display image has been uttered; identifying the text segment determined to have been at least partially uttered; rendering the identified text segment with a second set of display characteristics; and enabling the generated page display image to be visually represented on an output device, wherein the identified text segment is rendered with the second set of display characteristics substantially simultaneously upon receiving the input signal.
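The steps recited above can be illustrated with a minimal sketch. It is not the claimed implementation. It assumes a speech recognizer (not shown) has already turned the input signal into recognized text, treats each word as one text segment, and stands in for the "second set of display characteristics" by wrapping uttered segments in square brackets. All names (`TextSegment`, `build_page`, `mark_uttered`, `render`) are hypothetical and chosen for illustration only.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TextSegment:
    text: str
    uttered: bool = False  # True once the segment has been matched to speech


def build_page(source_text: str) -> list[TextSegment]:
    """Split the source text into a series of text segments (assumption: one per word)."""
    return [TextSegment(word) for word in source_text.split()]


def normalize(token: str) -> str:
    """Lowercase and drop punctuation so that 'Fox,' matches 'fox'."""
    return "".join(ch for ch in token.lower() if ch.isalnum())


def mark_uttered(page: list[TextSegment], recognized: str, cursor: int = 0) -> int:
    """Mark segments matched by the recognized words, in reading order.

    A segment counts as "at least partially uttered" when a recognized token
    is a non-empty prefix of the normalized segment text (an assumption of
    this sketch). Returns the index of the next unread segment.
    """
    for token in recognized.split():
        spoken = normalize(token)
        if not spoken or cursor >= len(page):
            continue
        if normalize(page[cursor].text).startswith(spoken):
            page[cursor].uttered = True
            cursor += 1
    return cursor


def render(page: list[TextSegment]) -> str:
    """First display characteristics: plain text. Second: wrapped in [ ]."""
    return " ".join(f"[{s.text}]" if s.uttered else s.text for s in page)


# Usage: the partial utterance "qui" is enough to highlight "quick".
page = build_page("The quick brown fox")
cursor = mark_uttered(page, "the qui")
print(render(page))  # prints "[The] [quick] brown fox"
```

In a live system, `mark_uttered` would be called each time the recognizer emits a partial result, and the page re-rendered immediately afterward, which approximates the "substantially simultaneously" behavior described above.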