Language technology is biased toward English newswire. In POS tagging, we get 97-98 words right out of a 100 in English newswire, but results drop to about 8 out of 10 when running the same technology on Twitter data. In dependency parsing, we are able to identify the syntactic head of 9 out of 10 words in English newswire, but only 6-7 out of 10 in tweets. Replace references to Twitter with references to a low-resource language of your choice, and the above sentence is still likely to hold true.
展开▼