Journal: International Journal on Document Analysis and Recognition

Knowledge-driven description synthesis for floor plan interpretation

Abstract

Image captioning is a widely known problem in AI. Caption generation from floor plan images has applications in indoor path planning, real estate, and architectural design. Several methods have been explored in the literature for generating captions or semi-structured descriptions from floor plan images. Because a caption alone is insufficient to capture fine-grained details, researchers have also proposed generating descriptive paragraphs from images. However, these descriptions have a rigid structure and lack flexibility, which makes them difficult to use in real-time scenarios. This paper offers two models for text generation from floor plan images: description synthesis from image cue (DSIC) and transformer-based description generation (TBDG). Both models use modern deep neural networks for visual feature extraction and text generation. They differ in how they take input from the floor plan image. The DSIC model takes only visual features automatically extracted by a deep neural network, while the TBDG model learns from textual captions extracted from the input floor plan images together with paragraphs. The specific keywords generated in TBDG, and learning them together with paragraphs, make it more robust on general floor plan images. Experiments were carried out on a large-scale publicly available dataset, and the results were compared with state-of-the-art techniques to show the superiority of the proposed models.
