Grounded semantics is typically learnt from utterance-level meaning representations (e.g., successful database retrievals, denoted objects in images, moves in a game). We explore learning word and utterance meanings by continuous observation of the actions of an instruction follower (IF). While an instruction giver (IG) provided a verbal description of a configuration of objects, the IF recreated it using a GUI. Aligning these GUI actions to sub-utterance chunks allows a simple maximum entropy model to associate them as chunk meaning better than just providing it with the utterance-final configuration. This shows that semantics useful for incremental (word-by-word) application, as required in natural dialogue, might also be better acquired from incremental settings.