The rise of deep learning (DL) might seem initially to mark a low point for linguists hoping to learn from, and contribute to, the field of statistical NLP. In building DL systems, the decisive factors tend to be data, computational resources, and optimization techniques, with domain expertise in a supporting role. Nonetheless, at least for semantics and pragmatics, I argue that DL models are potentially the best computational implementations of linguists' ideas and theories that we've ever seen. At the lexical level, symbolic representations are inevitably incomplete, whereas learned distributed representations have the potential to capture the dense interconnections that exist between words, and DL methods allow us to infuse these representations with information from contexts of use and from structured lexical resources. For semantic composition, previous approaches tended to represent phrases and sentences in partial, idiosyncratic ways; DL models support comprehensive representations and might yield insights into flexible modes of semantic composition that would be unexpected from the point of view of traditional logical theories. And when it comes to pragmatics, DL is arguably what the field has been looking for all along: a flexible set of tools for representing language and context together, and for capturing the nuanced, fallible ways in which language users reason about each other's intentions. Thus, while linguists might find it dispiriting that the day-to-day work of DL involves mainly fund-raising to support hyperparameter tuning on expensive machines, I argue that it is worth the tedium for the insights into language that this can (unexpectedly) deliver.