Combining Deep Learning and String Kernels for the Localization of Swiss German Tweets

Abstract

In this work, we present the methods proposed by the UnibucKernel team for the Social Media Variety Geolocation task of the 2020 VarDial Evaluation Campaign. We address only the second subtask, which targets a data set of nearly 30,000 Swiss German Jodels. This dialect identification task consists of predicting the latitude and longitude of each test sample as accurately as possible. We frame it as a double regression problem and employ a variety of machine learning approaches to predict both coordinates. To minimize the prediction error, we approach the problem from several perspectives: simple regression models, such as Support Vector Regression; deep neural networks, such as Long Short-Term Memory networks and character-level convolutional neural networks; and, finally, ensemble models based on meta-learners, such as XGBoost. With the same goal in mind, we also consider many types of features, from high-level features, such as BERT embeddings, to low-level features, such as character n-grams, which are known to provide good results in dialect identification. Our empirical results indicate that the handcrafted model based on string kernels outperforms the deep learning approaches. Nevertheless, our best performance is obtained by the ensemble model that combines both handcrafted and deep learning models.
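The abstract frames geolocation as a double regression over character n-gram features, with string kernels as the strongest handcrafted model. The sketch below illustrates that idea in plain Python under several stated assumptions: the kernel is a cosine-normalized p-spectrum kernel summed over n-gram lengths 3 to 5 (a hypothetical choice), Support Vector Regression is swapped for a simpler Nadaraya-Watson kernel-weighted average so the example needs no dependencies, and the prediction error is assumed to be the great-circle (haversine) distance in kilometers. It is not the UnibucKernel team's implementation, and all function names are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count every character n-gram of length n in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def spectrum_kernel(a: str, b: str, ns=(3, 4, 5)) -> float:
    """p-spectrum string kernel summed over several n-gram lengths: the
    inner product of the two texts' character n-gram count vectors.
    The default lengths (3, 4, 5) are an illustrative assumption."""
    total = 0
    for n in ns:
        counts_a, counts_b = char_ngrams(a, n), char_ngrams(b, n)
        # Counter returns 0 for n-grams missing from counts_b.
        total += sum(c * counts_b[g] for g, c in counts_a.items())
    return float(total)


def normalized_kernel(a: str, b: str, ns=(3, 4, 5)) -> float:
    """Cosine-normalized kernel: k(a, b) / sqrt(k(a, a) * k(b, b))."""
    k_aa, k_bb = spectrum_kernel(a, a, ns), spectrum_kernel(b, b, ns)
    if k_aa == 0 or k_bb == 0:
        return 0.0
    return spectrum_kernel(a, b, ns) / math.sqrt(k_aa * k_bb)


def predict_coordinates(query, train_texts, train_coords, ns=(3, 4, 5)):
    """Double regression: estimate latitude and longitude as kernel-weighted
    averages of the training coordinates (Nadaraya-Watson regression).
    This stands in for SVR; it is NOT the model described in the paper."""
    weights = [normalized_kernel(query, t, ns) for t in train_texts]
    total = sum(weights)
    if total == 0:
        # No shared n-grams with any training text: use uniform weights,
        # i.e. fall back to the mean training location.
        weights, total = [1.0] * len(train_coords), float(len(train_coords))
    lat = sum(w * c[0] for w, c in zip(weights, train_coords)) / total
    lon = sum(w * c[1] for w, c in zip(weights, train_coords)) / total
    return lat, lon


def haversine_km(p, q, radius_km=6371.0):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(h))


if __name__ == "__main__":
    # Toy, made-up posts; coordinates are approximate city centers of
    # Zurich and Bern, used only to illustrate the interface.
    texts = ["grüezi mitenand, wie gahts?", "gäng wie gäng, merci viumau"]
    coords = [(47.37, 8.54), (46.95, 7.45)]
    pred = predict_coordinates("grüezi zäme", texts, coords)
    print(pred, haversine_km(pred, coords[0]))
```

Here both coordinates share one set of kernel weights. An SVR-based double regression along the lines of the abstract would more typically train one regressor per coordinate on a precomputed kernel matrix (for example, scikit-learn's `SVR(kernel="precomputed")`), which is where the regularization and epsilon-insensitive loss of SVR would come into play.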

Bibliographic Record

Source
Workshop on NLP for Similar Languages, Varieties and Dialects | 2020 | pp. 242-253 | 12 pages
Authors
Mihaela Gaman; Radu Tudor Ionescu;
Original format: PDF

Similar Literature

1. Combining the Kernel Collaboration Representation and Deep Subspace Learning for Facial Expression Recognition [J]. Sun Zhe, Hu Zheng-Ping, Chiong Raymond. Journal of Circuits, Systems, and Computers, 2018, no. 8.
2. Enhancing PIR-Based Multi-Person Localization Through Combining Deep Learning With Domain Knowledge [J]. Tianye Yang, Peng Guo, Wenyu Liu. IEEE Sensors Journal, 2021, no. 4.
3. Combining deep learning with geometric features for image-based localization in the Gastrointestinal tract [J]. Song Jingwei, Patel Mitesh, Girgensohn Andreas. Expert Systems with Applications, 2021, no. Deca.
4. Combining Topic Models and String Kernel for Deep Web Categorization [C]. Guangyue Xu, Weimin Zheng, Haiping Wu. International Conference on Fuzzy Systems and Knowledge Discovery, 2010.
5. Disaster Tweet Text and Image Analysis Using Deep Learning Approaches [D]. Li, Xukun. 2020.
6. Improving Sepsis Treatment Strategies by Combining Deep and Kernel-Based Reinforcement Learning [O]. Xuefeng Peng, Yi Ding, David Wihl. 2018.
7. Multi-stage Jamming Attacks Detection using Deep Learning Combined with Kernelized Support Vector Machine in 5G Cloud Radio Access Networks [O]. Marouane Hachimi, Georges Kaddoum, Ghyslain Gagnon. 2020.