Despite being one of the most popular tasks in lexical semantics, word similarity has often been limited to the English language. Other languages, even those that are widely spoken such as Spanish, do not have a reliable word similarity evaluation framework. We put forward robust methodologies for the extension of existing English datasets to other languages, both at monolingual and cross-lingual levels. We propose an automatic standardization for the construction of cross-lingual similarity datasets, and provide an evaluation, demonstrating its reliability and robustness. Based on our procedure and taking the RG-65 word similarity dataset as a reference, we release two high-quality Spanish and Farsi (Persian) monolingual datasets, and fifteen cross-lingual datasets for six languages: English, Spanish, French, German, Portuguese, and Farsi.