IEEE Transactions on Pattern Analysis and Machine Intelligence

Look into Person: Joint Body Parsing & Pose Estimation Network and a New Benchmark

Abstract

Human parsing and pose estimation have recently received considerable interest due to their substantial application potential. However, existing datasets contain limited numbers of images and annotations, and lack variety in human appearance and coverage of challenging cases in unconstrained environments. In this paper, we introduce a new benchmark named "Look into Person (LIP)" that provides a significant advance in scalability, diversity, and difficulty, which are crucial for future developments in human-centric analysis. This comprehensive dataset contains over 50,000 elaborately annotated images with 19 semantic part labels and 16 body joints, captured from a broad range of viewpoints, occlusions, and background complexities. Using these rich annotations, we perform detailed analyses of the leading human parsing and pose estimation approaches, thereby gaining insight into the successes and failures of these methods. To further explore and take advantage of the semantic correlation between these two tasks, we propose a novel joint human parsing and pose estimation network with efficient context modeling, which can simultaneously predict parsing and pose with extremely high quality. Furthermore, we simplify the network to solve human parsing alone by exploring a novel self-supervised structure-sensitive learning approach, which imposes human pose structures on the parsing results without resorting to extra supervision. The datasets, code, and models are available at http://www.sysu-hcp.net/lip/.
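The self-supervised structure-sensitive idea in the abstract can be read as follows: derive joint-like keypoints from the parsing maps themselves, compare them with keypoints derived the same way from the ground-truth mask, and use the disagreement to reweight the parsing loss, so pose structure constrains parsing without any extra pose labels. Below is a minimal PyTorch sketch under that reading; the per-part soft centroids, the diagonal normalization, and the (1 + distance) weighting are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def part_centroids(prob, eps=1e-6):
    """Soft (x, y) centroid of each part channel in a (C, H, W) probability map."""
    C, H, W = prob.shape
    ys = torch.arange(H, dtype=prob.dtype, device=prob.device).view(1, H, 1)
    xs = torch.arange(W, dtype=prob.dtype, device=prob.device).view(1, 1, W)
    mass = prob.sum(dim=(1, 2)) + eps            # total mass per part, (C,)
    cy = (prob * ys).sum(dim=(1, 2)) / mass      # mean y per part, (C,)
    cx = (prob * xs).sum(dim=(1, 2)) / mass      # mean x per part, (C,)
    return torch.stack([cx, cy], dim=1)          # (C, 2) centroids

def structure_sensitive_loss(logits, target, num_parts):
    """Weight the parsing loss by how far predicted part centroids drift from
    centroids derived from the ground-truth mask (no extra pose supervision).

    logits: (C, H, W) raw scores; target: (H, W) long tensor of part labels.
    """
    H, W = target.shape
    ce = F.cross_entropy(logits.unsqueeze(0), target.unsqueeze(0))
    prob = logits.softmax(dim=0)                                  # (C, H, W)
    gt_onehot = F.one_hot(target, num_parts).permute(2, 0, 1).float()
    pred_c = part_centroids(prob)
    gt_c = part_centroids(gt_onehot)
    # Mean centroid drift, normalized by the image diagonal to stay scale-free.
    joint_dist = (pred_c - gt_c).norm(dim=1).mean() / (H**2 + W**2) ** 0.5
    return ce * (1.0 + joint_dist)               # structure-weighted parsing loss
```

A hypothetical usage, assuming 19 LIP part labels plus a background class:

```python
logits = torch.randn(20, 64, 64)             # network output, 20 channels
target = torch.randint(0, 20, (64, 64))      # ground-truth parsing mask
loss = structure_sensitive_loss(logits, target, num_parts=20)
```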
