Research in social psychology has shown extensively that individuals in cohesive groups often mirror each other's prosody, facial expressions, and body movements. This mirroring can indicate the level of comfort, engagement, and genuine interest between two or more interlocutors. In this work, we use an annotated dataset of videos of three-person conversations to analyze the extent of rapport in each triadic group. We generate behavioral curves, that is, sampled time-series signals of multimodal features extracted from the participants' face and body movements. We then measure synchrony by aligning the behavioral curves of pairs of participants. The alignment tests show that basic correlation coefficient measures outperform more advanced curve-matching techniques at estimating the similarity between multidimensional behavioral curves. They also show that, in this dataset, synchrony is more readily observed in facial expressions than in body movements. For this reason, we use facial action units as input and show that an end-to-end recurrent neural network (RNN) trained with a regression loss yields good results in predicting the extent of synchrony in small groups.
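As an illustration of the correlation-based comparison of behavioral curves described above, the following is a minimal sketch and not the authors' implementation. It assumes each participant's curve is an array of shape (T, D), with T time samples of D features, and it scores a pair by averaging the per-feature Pearson correlation; that averaging rule, and the names `curve_correlation` and `pairwise_synchrony`, are hypothetical choices made for this example.

```python
from itertools import combinations

import numpy as np


def curve_correlation(a, b):
    """Mean per-feature Pearson correlation between two behavioral curves.

    a, b: array-likes of shape (T, D). Averaging over features is an
    assumption of this sketch, not a detail taken from the paper.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.ndim != 2 or a.shape != b.shape:
        raise ValueError("curves must both have the same (T, D) shape")

    # Center each feature over time.
    a_c = a - a.mean(axis=0)
    b_c = b - b.mean(axis=0)

    # Pearson numerator and denominator, computed per feature.
    num = (a_c * b_c).sum(axis=0)
    den = np.sqrt((a_c ** 2).sum(axis=0) * (b_c ** 2).sum(axis=0))

    # A constant feature has undefined correlation, so it is skipped.
    valid = den > 0
    if not valid.any():
        return float("nan")
    return float((num[valid] / den[valid]).mean())


def pairwise_synchrony(curves):
    """Score every pair of participants in a group.

    curves: dict mapping a participant label to a (T, D) array.
    Returns a dict mapping (label_1, label_2) to a correlation score.
    """
    return {
        (p, q): curve_correlation(curves[p], curves[q])
        for p, q in combinations(sorted(curves), 2)
    }
```

For a triad, `pairwise_synchrony({"A": a, "B": b, "C": c})` returns one score for each of the three pairs. Scores near 1 indicate curves that rise and fall together, and scores near -1 indicate curves that move in opposition.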