This paper describes a new shared task for humor understanding that attempts to eschew the ubiquitous binary approach to humor detection and focus on comparative humor ranking instead. The task is based on a new dataset of funny tweets posted in response to shared hashtags, collected from the 'Hashtag Wars' segment of the TV show @midnight. The results are evaluated in two subtasks that require the participants to generate either the correct pair-wise comparisons of tweets (subtask A), or the correct ranking of the tweets (subtask B) in terms of how funny they are. 7 teams participated in subtask A, and 5 teams participated in subtask B. The best accuracy in subtask A was 0.675. The best (lowest) rank edit distance for subtask B was 0.872.
展开▼