This paper describes Faheem (adj. of understand), our submission to the NADI (Nuanced Arabic Dialect Identification) shared task. With many Arabic dialects understudied due to the scarcity of resources, the objective is to identify, at the country level, the Arabic dialect used in a tweet. We propose a machine learning approach in which we extract word-level n-gram (n = 1 to 3) and tf-idf features and feed them to six different classifiers. We train the system on a data set of 21,000 tweets, provided by the organizers, covering twenty-one Arab countries. Our top-performing classifiers are Logistic Regression, Support Vector Machines, and Multinomial Naive Bayes (MNB). We achieved our best result, macro-F_1 = 0.151, using the MNB classifier.
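
As a rough illustration of the feature extraction described above, the following minimal sketch builds word-level n-grams (n = 1 to 3) and weights them with tf-idf. It is not the submitted system: the function names `word_ngrams` and `tfidf_features`, the whitespace tokenization, and the smoothed idf variant are all assumptions made for this example, since the abstract does not state those details.

```python
import math
from collections import Counter


def word_ngrams(text, n_min=1, n_max=3):
    """Return all word-level n-grams of `text` for n in [n_min, n_max]."""
    # Assumption: plain whitespace tokenization; the actual preprocessing
    # used for the tweets is not described in the abstract.
    tokens = text.split()
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tfidf_features(docs, n_min=1, n_max=3):
    """Map each document to a dict {ngram: tf-idf weight}."""
    counts = [Counter(word_ngrams(d, n_min, n_max)) for d in docs]

    # Document frequency: number of documents that contain each n-gram.
    df = Counter()
    for c in counts:
        df.update(c.keys())

    n_docs = len(docs)
    features = []
    for c in counts:
        # Assumption: smoothed idf, log((1 + N) / (1 + df)) + 1, which is
        # one common variant and not necessarily the one used in the paper.
        features.append({
            g: tf * (math.log((1 + n_docs) / (1 + df[g])) + 1.0)
            for g, tf in c.items()
        })
    return features
```

The resulting sparse dictionaries could then be passed to any classifier that accepts non-negative weighted counts, such as a multinomial naive Bayes model.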