Deep Text Mining of Instagram Data without Strong Supervision


Abstract

With the advent of social media, our online feeds increasingly consist of short, informal, and unstructured text. This textual data can be analyzed to improve user recommendations and detect trends. Instagram is one of the largest social media platforms, containing both text and images. However, most prior research on text processing in social media focuses on Twitter data, and little attention has been paid to text mining of Instagram data. Moreover, many text mining methods rely on annotated training data, which in practice is both difficult and expensive to obtain. In this paper, we present methods for unsupervised mining of fashion attributes from Instagram text, which can enable a new kind of user recommendation in the fashion domain. In this context, we analyze a corpus of Instagram posts from the fashion domain, introduce a system for extracting fashion attributes from Instagram, and train a deep clothing classifier with weak supervision to classify Instagram posts based on the associated text. Our experiments confirm that word embeddings are a useful asset for information extraction: information extraction using word embeddings outperforms a baseline that uses Levenshtein distance. The results also show the benefit of combining weak supervision signals using generative models instead of majority voting. Using weak supervision and generative modeling, an F₁ score of 0.61 is achieved on the task of classifying the image contents of Instagram posts based solely on the associated text, which is on par with human performance. Finally, our empirical study provides one of the few available studies on Instagram text and shows that the text is noisy, that the text distribution exhibits the long-tail phenomenon, and that comment sections on Instagram are multilingual.
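
The abstract names two baselines that the proposed methods are compared against: Levenshtein-distance matching for information extraction and majority voting for combining weak supervision signals. The following is a minimal Python sketch of those two baselines only, not the authors' implementation. The function names, the example vocabulary, the `max_distance` threshold, and the tie-handling rule are all assumptions made for illustration.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def closest_term(token, vocabulary, max_distance=2):
    """Map a noisy token to the nearest vocabulary term (hypothetical helper).

    Returns None when even the nearest term is more than max_distance edits away.
    """
    token = token.lower()
    best = min(vocabulary, key=lambda term: levenshtein(token, term))
    return best if levenshtein(token, best) <= max_distance else None


def majority_vote(votes):
    """Combine weak labels by majority vote (hypothetical helper).

    None means a labeling source abstained. Ties and all-abstain cases
    return None; this tie rule is an assumption of the sketch.
    """
    counts = Counter(v for v in votes if v is not None)
    if not counts:
        return None
    ranked = counts.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


if __name__ == "__main__":
    vocabulary = ["jeans", "dress", "jacket"]  # illustrative, not from the paper
    print(closest_term("jeanz", vocabulary))
    print(majority_vote(["dress", "dress", None, "jeans"]))
```

String distance can only recover misspellings that stay close to a known term, and majority voting treats every weak signal as equally reliable. Those limitations are plausibly where word embeddings and a generative label model help, which is consistent with the results the abstract reports.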


Bibliographic details

Source
IEEE/WIC/ACM International Conference on Web Intelligence | 2018 | pp. 158-165 | 8 pages
Authors
Kim Hammar; Shatha Jaradat; Nima Dokoohaki; Mihhail Matskin
Original format
PDF
Keywords
Twitter; Ontologies; Task analysis; Clothing; Text mining

Similar literature

1. Deep text classification of Instagram data using word embeddings and weak supervision [J]. Hammar Kim, Jaradat Shatha, Dokoohaki Nima. Web Intelligence, 2020, No. 1.
2. Mining relational data from text: From strictly supervised to weakly supervised learning [J]. Zhu Zhang. Information Systems, 2008, No. 3.
3. Interleaved Text/Image Deep Mining on a Large-Scale Radiology Database for Automated Image Interpretation [J]. Hoo-Chang Shin, Le Lu, Lauren Kim. Journal of Machine Learning Research, 2016, No. 107.
4. Self-Supervised Physics-Based Deep Learning MRI Reconstruction Without Fully-Sampled Data [C]. Burhaneddin Yaman, Seyed Amir Hossein Hosseini, Steen Moeller. IEEE International Symposium on Biomedical Imaging, 2020.
5. Developing a Data Mining Framework to Identify a Sense of Gentrification through Social Media Data: A Case Study Using Instagram Posts in Salt Lake City, Utah [D]. Huang, Cheng-Chia. 2017.
6. Deep echocardiography: data-efficient supervised and semi-supervised deep learning towards automated diagnosis of cardiac disease [O]. Ali Madani, Jia Rui Ong, Anshul Tibrewal. 2018.
7. Data augmentation and semi-supervised learning for deep neural networks-based text classifier [O]. Heereen Shim, Stijn Luca, Dietwig Lowet. 2020.
