International Natural Language Generation Conference

What Goes Into A Word: Generating Image Descriptions With Top-Down Spatial Knowledge



Abstract

Generating grounded image descriptions requires associating linguistic units with their corresponding visual clues. A common method is to train a decoder language model with an attention mechanism over convolutional visual features. Attention weights align the stratified visual features, arranged by their location, with tokens (most commonly words) in the target description. However, words expressing spatial relations (e.g. "next to" and "under") do not refer directly to geometric arrangements of pixels but to complex geometric and conceptual representations. The aim of this paper is to evaluate which representations facilitate generating image descriptions with spatial relations and lead to better-grounded language generation. In particular, we investigate the contribution of four different representational modalities in generating relational referring expressions: (i) (pre-trained) convolutional visual features, (ii) spatial attention over visual features, (iii) top-down geometric relational knowledge between objects, and (iv) world knowledge captured by contextual embeddings in language models.
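
To make the common method concrete, below is a minimal sketch of additive (Bahdanau-style) spatial attention over a flattened grid of convolutional features at one decoding step. It is written in PyTorch; the module name, dimensions, and layer choices are illustrative assumptions, not the paper's actual implementation.

import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    # Additive attention over a grid of convolutional visual features;
    # returns a context vector and the per-region attention weights.
    def __init__(self, feat_dim, hidden_dim, attn_dim):
        super().__init__()
        self.feat_proj = nn.Linear(feat_dim, attn_dim)      # project visual features
        self.hidden_proj = nn.Linear(hidden_dim, attn_dim)  # project decoder state
        self.score = nn.Linear(attn_dim, 1)                 # scalar score per region

    def forward(self, features, hidden):
        # features: (batch, regions, feat_dim), e.g. a 7x7 CNN grid -> 49 regions
        # hidden:   (batch, hidden_dim), decoder state before emitting a word
        scores = self.score(torch.tanh(
            self.feat_proj(features) + self.hidden_proj(hidden).unsqueeze(1)
        )).squeeze(-1)                                      # (batch, regions)
        alpha = torch.softmax(scores, dim=1)                # weights over regions
        context = (alpha.unsqueeze(-1) * features).sum(1)   # weighted feature sum
        return context, alpha

attn = SpatialAttention(feat_dim=2048, hidden_dim=512, attn_dim=256)
feats = torch.randn(4, 49, 2048)  # 4 images, 49 spatial regions
h = torch.randn(4, 512)           # decoder hidden state at the current step
context, alpha = attn(feats, h)   # alpha grounds the next word in image regions

The weights alpha are what aligns each generated token with image locations; the abstract's point is that for relation words this pixel-level alignment alone may be insufficient.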
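
Modality (iii), top-down geometric relational knowledge, is often encoded as pairwise features computed from object bounding boxes. The sketch below shows one common such encoding (normalized center offsets and log size ratios); the function name and the specific feature set are assumptions for illustration, not the paper's exact formulation.

import torch

def box_relation_features(boxes):
    # boxes: (N, 4) as (x, y, w, h) in image coordinates.
    # Returns (N, N, 4) pairwise geometric features:
    # scaled center offsets plus log width/height ratios.
    x, y, w, h = boxes.unbind(-1)
    cx, cy = x + w / 2, y + h / 2                              # box centers
    dx = (cx.unsqueeze(1) - cx.unsqueeze(0)) / w.unsqueeze(0)  # x-offset, scaled
    dy = (cy.unsqueeze(1) - cy.unsqueeze(0)) / h.unsqueeze(0)  # y-offset, scaled
    dw = torch.log(w.unsqueeze(1) / w.unsqueeze(0))            # log width ratio
    dh = torch.log(h.unsqueeze(1) / h.unsqueeze(0))            # log height ratio
    return torch.stack([dx, dy, dw, dh], dim=-1)

boxes = torch.tensor([[60., 40., 40., 40.],    # e.g. a cup
                      [20., 90., 140., 30.]])  # e.g. a table under it
rel = box_relation_features(boxes)  # rel[0, 1] encodes the cup relative to the table

Such explicit geometric features give the generator direct access to the spatial configurations that relation words like "under" describe, rather than leaving them implicit in the pixel grid.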
