To the best of our knowledge, we are the first to propose the Multi-Human Parsing task, together with corresponding datasets and baseline methods.
Multi-Human Parsing refers to partitioning a crowd scene image into semantically consistent regions belonging to body parts or clothing items while differentiating between identities, such that each pixel in the image is assigned both a semantic part label and the identity it belongs to. Many higher-level applications can be built upon Multi-Human Parsing, such as group behavior analysis, person re-identification, image editing, video surveillance, autonomous driving, and virtual reality.
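One way to picture this per-pixel (part label, identity) output is as two aligned label maps: a category map and an instance map. The sketch below, with hypothetical part labels (not the official MHP label set) and a made-up helper `per_person_part_masks`, shows how the two maps combine into per-person part masks:

```python
import numpy as np

# Illustrative part labels only -- the real MHP datasets define their own
# label sets (7 categories in v1.0, 58 in v2.0).
BACKGROUND = 0
HAIR, FACE, TORSO = 1, 2, 3


def per_person_part_masks(category_map, instance_map):
    """Split a parsed scene into {person_id: {part_label: binary mask}}.

    category_map: per-pixel semantic part label (0 = background).
    instance_map: per-pixel person identity (0 = no person).
    """
    result = {}
    for pid in np.unique(instance_map):
        if pid == 0:  # skip pixels not belonging to any person
            continue
        person = instance_map == pid
        parts = {}
        for label in np.unique(category_map[person]):
            if label == BACKGROUND:
                continue
            parts[int(label)] = person & (category_map == label)
        result[int(pid)] = parts
    return result


# Toy 2x4 "image" containing two people standing side by side.
category = np.array([[HAIR,  FACE,  HAIR,  FACE],
                     [TORSO, TORSO, TORSO, TORSO]])
instance = np.array([[1, 1, 2, 2],
                     [1, 1, 2, 2]])

masks = per_person_part_masks(category, instance)
# masks[1] holds the hair/face/torso masks of person 1, masks[2] those of person 2.
```

Representing the annotation as two maps rather than one keeps the semantic and identity predictions decoupled, which matches how the task is defined above.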
The Multi-Human Parsing project of the Learning and Vision (LV) Group, National University of Singapore (NUS), aims to push the frontiers of fine-grained visual understanding of humans in crowd scenes. Multi-Human Parsing is significantly different from traditional well-defined object recognition tasks: object detection only provides coarse-level predictions of object locations (bounding boxes); instance segmentation only predicts instance-level masks without any detailed information on body parts and fashion categories; human parsing performs category-level pixel-wise prediction without differentiating between identities. In real-world scenarios, settings with multiple interacting persons are more realistic and common. Thus, a task, corresponding datasets, and baseline methods that consider both the fine-grained semantic information of each individual person and the relationships and interactions of the whole group of people are highly desired.
Please consider citing relevant papers:
"Understanding Humans in Crowded Scenes: Deep Nested Adversarial Learning and A New Benchmark for Multi-Human Parsing"
Jian Zhao*, Jianshu Li*, Yu Cheng*, Li Zhou, Terence Sim, Shuicheng Yan, Jiashi Feng;
arXiv:1804.03287 (* indicates equal contribution)
"Multi-Human Parsing in the Wild"
Jianshu Li*, Jian Zhao*, Yunchao Wei, Congyan Lang, Yidong Li, Terence Sim, Shuicheng Yan, Jiashi Feng;
arXiv:1705.07206 (* indicates equal contribution)
"Generative Partition Networks for Multi-Person Pose Estimation"
Xuecheng Nie, Jiashi Feng, Junliang Xing, Shuicheng Yan;
arXiv:1705.07422
The MHP v1.0 and v2.0 datasets are made freely available to academic and non-academic entities for non-commercial purposes, such as academic research, teaching, scientific publications, or personal experimentation. Permission is granted to use the data provided that you agree to our license terms.