Embodiments of the present disclosure may provide techniques for adjusting audio and/or video information of a video clip based at least in part on facial feature characteristics and/or voice feature characteristics extracted from hardware components. For example, in response to detecting a request to generate an avatar video clip of a virtual avatar, a video signal associated with a face within a camera's field of view and an audio signal may be captured. Voice feature characteristics and facial feature characteristics may be extracted from the audio signal and the video signal, respectively. In some examples, in response to detecting a request to preview the avatar video clip, an adjusted audio signal may be generated based at least in part on the facial feature characteristics and the voice feature characteristics, and a preview of the video clip of the virtual avatar using the adjusted audio signal may be displayed.
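The flow described above (capture, feature extraction, audio adjustment) might be sketched as follows. This is only an illustrative sketch under stated assumptions, not the disclosed method: the names `FacialFeatures`, `VoiceFeatures`, `extract_facial_features`, `extract_voice_features`, and `adjust_audio` are hypothetical, the "mouth openness" and RMS-loudness features are stand-ins for whatever characteristics an embodiment may actually extract, and the gain rule is an assumed placeholder for the adjustment.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FacialFeatures:
    # Hypothetical facial characteristic: average mouth openness in [0, 1].
    mouth_openness: float


@dataclass
class VoiceFeatures:
    # Hypothetical voice characteristic: root-mean-square loudness.
    rms: float


def extract_voice_features(samples: List[float]) -> VoiceFeatures:
    """Compute the RMS loudness of the captured audio samples."""
    if not samples:
        return VoiceFeatures(rms=0.0)
    mean_square = sum(s * s for s in samples) / len(samples)
    return VoiceFeatures(rms=mean_square ** 0.5)


def extract_facial_features(openness_per_frame: List[float]) -> FacialFeatures:
    """Average per-frame mouth openness (assumed to be measured already)."""
    if not openness_per_frame:
        return FacialFeatures(mouth_openness=0.0)
    avg = sum(openness_per_frame) / len(openness_per_frame)
    # Clamp to the assumed [0, 1] range.
    return FacialFeatures(mouth_openness=min(max(avg, 0.0), 1.0))


def adjust_audio(
    samples: List[float], face: FacialFeatures, voice: VoiceFeatures
) -> List[float]:
    """Apply an assumed gain rule driven by both feature sets.

    Silent input is returned unchanged; otherwise the gain ranges from
    0.5 (mouth closed) to 1.5 (mouth fully open), and the result is
    clipped to [-1, 1].
    """
    if voice.rms == 0.0:
        return list(samples)
    gain = 0.5 + face.mouth_openness
    return [max(-1.0, min(1.0, s * gain)) for s in samples]


def preview_audio(samples: List[float], openness_per_frame: List[float]) -> List[float]:
    """On a preview request: extract both feature sets, then adjust the audio."""
    voice = extract_voice_features(samples)
    face = extract_facial_features(openness_per_frame)
    return adjust_audio(samples, face, voice)
```

In this sketch, a preview request would call `preview_audio` with the captured samples and the per-frame measurements, and the returned samples would accompany the rendered avatar clip. Rendering and display are left out, since the abstract does not specify them.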