The present disclosure provides a method and system for sending media data in telepresence technology. Audio or video data corresponding to a spatial area is collected, and the collected data is processed to obtain multiple streams of data, distinguished by collecting spatial area, together with the corresponding collecting spatial information. The multiple streams of data are encoded; the encoded multiple streams of audio data are packed together with the collecting spatial information, and/or the encoded multiple streams of video data are packed together with the collecting spatial information; and a packet including the multiple streams of audio data or of video data is sent. The present disclosure also provides a method and system for playing media data in telepresence technology. With these methods and systems, the collecting spatial area corresponding to each stream, i.e., its playing location, can be identified directly during data transmission, which enables a conferee at the receiving end to experience auditory positioning and an immersive sensation. Further, the problem of synchronization among the multiple streams of audio or video data is solved effectively.
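
The packing step described above can be illustrated with a minimal sketch. The abstract does not specify a packet layout, so the byte layout, the field widths, and the names `pack_streams` and `unpack_streams` below are assumptions made purely for illustration. The sketch shows one way several encoded streams could travel in a single packet, each tagged with an identifier of its collecting spatial area.

```python
import struct

# Hypothetical layout (NOT specified in the source):
#   1 byte   number of streams in the packet
#   then, for each stream:
#     1 byte   collecting spatial area identifier
#     4 bytes  payload length, big-endian
#     N bytes  encoded audio or video payload


def pack_streams(streams):
    """Pack (area_id, encoded_payload) pairs into one packet."""
    if len(streams) > 255:
        raise ValueError("at most 255 streams fit in the 1-byte count field")
    parts = [struct.pack("!B", len(streams))]
    for area_id, payload in streams:
        parts.append(struct.pack("!BI", area_id, len(payload)))
        parts.append(payload)
    return b"".join(parts)


def unpack_streams(packet):
    """Recover the (area_id, encoded_payload) pairs from a packet."""
    (count,) = struct.unpack_from("!B", packet, 0)
    offset = 1
    streams = []
    for _ in range(count):
        area_id, length = struct.unpack_from("!BI", packet, offset)
        offset += 5  # 1-byte area id + 4-byte length
        streams.append((area_id, packet[offset:offset + length]))
        offset += length
    return streams
```

Because all streams collected at the same moment share one packet in this sketch, a receiver could route each payload to the playing location matching its area identifier without separately aligning independent packets. This is one possible reading of how joint packing helps synchronization, not a statement of the disclosed implementation.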