Dialogue systems are artefacts that converse with human users in order to achieve some task. Each step of the dialogue requires understanding the user's input, deciding on what to reply, and generating an output utterance. Although there are many ways to express any given content, most dialogue systems do not take linguistic variation into account in both the understanding and generation phases, i.e. the user's linguistic style is typically ignored, and the style conveyed by the system is chosen once for all interactions at development time. We believe that modelling linguistic variation can greatly improve the interaction in dialogue systems, such as in intelligent tutoring systems, video games, or information retrieval systems, which all require specific linguistic styles. Previous work has shown that linguistic style affects many aspects of users' perceptions, even when the dialogue is task-oriented. Moreover, users attribute a consistent personality to machines, even when exposed to a limited set of cues; thus dialogue systems manifest personality whether it is designed into the system or not. Over the past few years, psychologists have identified the main dimensions of individual differences in human behaviour: the Big Five personality traits. We hypothesise that the Big Five provide a useful computational framework for modelling important aspects of linguistic variation. This thesis first explores the possibility of recognising the user's personality using data-driven models trained on essays and conversational data. We then test whether it is possible to generate language varying consistently along each personality dimension in the information presentation domain. We present PERSONAGE: a language generator modelling findings from psychological studies to project various personality traits. 
We use PERSONAGE to compare various generation paradigms: (1) rule-based generation, (2) overgenerate and select, and (3) generation using parameter estimation models, a novel approach that learns to produce recognisable variation along meaningful stylistic dimensions without the computational cost incurred by overgeneration techniques. We also present the first human evaluation of a data-driven generation method that projects multiple stylistic dimensions simultaneously and on a continuous scale.