In this paper we report our efforts in data collection and performance evaluation in support of spoken dialogue system development. We describe two understanding metrics, query density and concept efficiency, which can be interpreted on a per-utterance basis but are measured over the course of a dialogue. We also describe the evaluation infrastructure we have developed to support off-line processing using our GALAXY client-server architecture [8]. We show how we have used these metrics and mechanisms in developing a spoken dialogue system for air-travel information.