Object-based audio is an emerging representation for audio content, where content is represented in a reproduction-format-agnostic way and thus produced once for consumption on many different kinds of devices. This affords new opportunities for immersive, personalized, and interactive listening experiences. This article introduces an end-to-end object-based spatial audio pipeline, from sound recording to listening. A high-level system architecture is proposed, which includes novel audiovisual interfaces to support object-based capture and listener-tracked rendering, and incorporates a proposed component for objectification, i.e., recording content directly into an object-based form. Text-based and extensible metadata enable communication between the system components. An open architecture for object rendering is also proposed. The system's capabilities are evaluated in two parts. First, listener-tracked reproduction of metadata automatically estimated from two moving talkers is evaluated using an objective binaural localization model. Second, object-based scene capture with audio extracted using blind source separation (to remix between two talkers) and beamforming (to remix a recording of a jazz group) is evaluated with perceptually motivated objective and subjective experiments. These experiments demonstrate that the novel components of the system add capabilities beyond the state of the art. Finally, we discuss challenges and future perspectives for object-based audio workflows.
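To make the role of the text-based, extensible metadata concrete, the sketch below shows one plausible shape such per-object metadata could take. The abstract does not specify a schema, so the field names, the JSON encoding, and the helper function here are illustrative assumptions (standards such as the ITU-R Audio Definition Model carry comparable information in XML), not the system's actual format.

```python
# Illustrative sketch only: the schema and field names are assumptions,
# not the metadata format used by the system described above.
import json

def make_object_metadata(obj_id, azimuth_deg, elevation_deg, radius_m,
                         gain=1.0, label=None):
    """Build a metadata record for one audio object at one instant."""
    record = {
        "id": obj_id,                    # unique object identifier
        "position": {                    # source position, spherical coords
            "azimuth": azimuth_deg,      # degrees; 0 = front
            "elevation": elevation_deg,  # degrees above the horizontal
            "radius": radius_m,          # metres from the listening position
        },
        "gain": gain,                    # linear playback gain
    }
    if label is not None:
        record["label"] = label          # optional, e.g. "talker 1"
    return record

# Two talkers, as in the evaluation scenario described in the abstract.
scene = [
    make_object_metadata(0, azimuth_deg=-30.0, elevation_deg=0.0,
                         radius_m=2.0, label="talker 1"),
    make_object_metadata(1, azimuth_deg=30.0, elevation_deg=0.0,
                         radius_m=2.0, label="talker 2"),
]
# Text form that could be passed between capture, objectification,
# and rendering components; new fields can be added without breaking readers.
print(json.dumps(scene, indent=2))
```

Because the records are plain text and key-value structured, components can ignore fields they do not understand, which is one simple way a metadata format can remain extensible across the pipeline.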