This paper presents PoG, a system that recognises and interprets natural pointing gestures in the context of multimodal interaction. A camera mounted above a building plan tracks the user's hand gestures. The user points to a room on the plan with the index finger while asking the system for information by speech. PoG comprises three processes: an extraction process, which computes visual primitives; a recognition process, which provides the name of the gesture; and a localisation process, which computes the coordinates of the index fingertip.