We describe the evaluation of a system for automatic humor generation in Japanese. First, we used the traditional method of having evaluators rate jokes, which gave a score of 3.3 on a scale from 1 (boring) to 5 (funny). To complement this evaluation and to see whether 3.3 is "good enough", we entered a system-generated performance in a competition with a ¥500,000 prize. We also generated responses to arbitrary requests from the audience at a live event. While we did not win the ¥500,000, we did reach the final, and both the performance and the real-time generation were well received. Sentiment analysis of blogs covering the event also showed that our system compared well to the other teams. That the system could compete successfully against human-made contributions indicates that a score of 3.3 is "good enough" for real-world applications of the system.