Multi-Human Parsing


To the best of our knowledge, we are the first to propose the Multi-Human Parsing task, together with corresponding datasets and baseline methods.

Task Definition

Multi-Human Parsing refers to partitioning an image of a crowded scene into semantically consistent regions corresponding to body parts or clothing items while distinguishing between individual identities, so that each pixel in the image is assigned both a semantic part label and the identity of the person it belongs to. Many higher-level applications can be built upon Multi-Human Parsing, such as group behavior analysis, person re-identification, image editing, video surveillance, autonomous driving and virtual reality.
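As a minimal sketch of what this per-pixel output could look like, assume a prediction is stored as two maps of the same size: `part_labels`, holding a semantic part or clothing category for each pixel, and `person_ids`, holding the identity each pixel belongs to, with 0 meaning background in both. These names, the 0-as-background convention and the helper `split_by_person` are hypothetical illustrations, not the official annotation format of the MHP datasets. The helper recovers one part-level parsing map per person:

```python
from typing import Dict, List

# Hypothetical representation: a 2D grid of integer labels, indexed [y][x].
Grid = List[List[int]]


def split_by_person(part_labels: Grid, person_ids: Grid) -> Dict[int, Grid]:
    """Split a multi-human parsing result into one part-label map per person.

    Assumes both maps are rectangular, have the same shape, and use 0 for
    background (an illustrative convention, not the official MHP format).
    """
    height = len(part_labels)
    width = len(part_labels[0]) if height else 0
    if len(person_ids) != height or any(
        len(row) != width for row in part_labels + person_ids
    ):
        raise ValueError("part_labels and person_ids must have the same rectangular shape")

    per_person: Dict[int, Grid] = {}
    for y in range(height):
        for x in range(width):
            pid = person_ids[y][x]
            if pid == 0:
                continue  # background pixels belong to no person
            if pid not in per_person:
                # Start each person with an all-background map of the full image size.
                per_person[pid] = [[0] * width for _ in range(height)]
            # Copy this pixel's part label into the map of the person who owns it.
            per_person[pid][y][x] = part_labels[y][x]
    return per_person


# Two people side by side in a 2x4 image; the label values are arbitrary examples.
part_labels = [[1, 2, 0, 1],
               [3, 3, 0, 3]]
person_ids = [[1, 1, 0, 2],
              [1, 1, 0, 2]]
print(split_by_person(part_labels, person_ids))
# {1: [[1, 2, 0, 0], [3, 3, 0, 0]], 2: [[0, 0, 0, 1], [0, 0, 0, 3]]}
```

Keeping identity in a separate map, rather than encoding it into the label values, keeps the part vocabulary fixed no matter how many people appear in the image.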


The Multi-Human Parsing project of the Learning and Vision (LV) Group at the National University of Singapore (NUS) aims to push the frontiers of fine-grained visual understanding of humans in crowded scenes. Multi-Human Parsing differs significantly from traditional, well-defined object recognition tasks: object detection only provides coarse predictions of object locations (bounding boxes); instance segmentation only predicts instance-level masks, without any detailed information on body parts or fashion categories; and human parsing makes category-level pixel-wise predictions without distinguishing between identities. In real-world scenarios, multiple people interacting with one another is the more realistic and common setting. A task, together with corresponding datasets and baseline methods, that considers both the fine-grained semantic information of each individual and the relationships and interactions within the whole group is therefore highly desirable.
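The contrast with detection, instance segmentation and human parsing can be made concrete. Suppose, hypothetically, that a Multi-Human Parsing prediction is stored as a part-label map plus a person-identity map (0 = background in both; an illustrative assumption, not the official MHP annotation format). Each coarser task's output can then be derived by discarding information, whereas no single one of those outputs can be turned back into a Multi-Human Parsing result. The helper names below are illustrative only:

```python
from typing import Dict, List, Tuple

# Hypothetical representation: 2D integer grids indexed [y][x], 0 = background.
Grid = List[List[int]]
Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive


def to_human_parsing(part_labels: Grid) -> Grid:
    """Category-level human parsing: keep part labels, drop identities."""
    return [row[:] for row in part_labels]


def to_instance_masks(person_ids: Grid) -> Dict[int, Grid]:
    """Instance segmentation: one binary mask per person, no part labels."""
    masks: Dict[int, Grid] = {}
    for y, row in enumerate(person_ids):
        for x, pid in enumerate(row):
            if pid == 0:
                continue  # skip background
            if pid not in masks:
                masks[pid] = [[0] * len(r) for r in person_ids]
            masks[pid][y][x] = 1
    return masks


def to_bounding_boxes(person_ids: Grid) -> Dict[int, Box]:
    """Detection: the tightest axis-aligned box around each person's pixels."""
    boxes: Dict[int, Box] = {}
    for y, row in enumerate(person_ids):
        for x, pid in enumerate(row):
            if pid == 0:
                continue  # skip background
            if pid not in boxes:
                boxes[pid] = (x, y, x, y)
            else:
                x0, y0, x1, y1 = boxes[pid]
                boxes[pid] = (min(x0, x), min(y0, y), max(x1, x), max(y1, y))
    return boxes


part_labels = [[1, 2, 0, 1],
               [3, 3, 0, 3]]
person_ids = [[1, 1, 0, 2],
              [1, 1, 0, 2]]
print(to_human_parsing(part_labels))     # [[1, 2, 0, 1], [3, 3, 0, 3]]
print(to_instance_masks(person_ids)[2])  # [[0, 0, 0, 1], [0, 0, 0, 1]]
print(to_bounding_boxes(person_ids))     # {1: (0, 0, 1, 1), 2: (3, 0, 3, 1)}
```

Each function throws away either the identity map, the part labels, or the pixel-level detail, which mirrors the coarser predictions described in the paragraph above.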


Please consider citing the relevant papers:

"Understanding Humans in Crowded Scenes: Deep Nested Adversarial Learning and A New Benchmark for Multi-Human Parsing"  
Jian Zhao*, Jianshu Li*, Yu Cheng*, Li Zhou, Terence Sim, Shuicheng Yan, Jiashi Feng;
arXiv:1804.03287 (* indicates equal contribution)

"Multi-Human Parsing in the Wild"  
Jianshu Li*, Jian Zhao*, Yunchao Wei, Congyan Lang, Yidong Li, Terence Sim, Shuicheng Yan, Jiashi Feng;
arXiv:1705.07206 (* indicates equal contribution)

"Generative Partition Networks for Multi-Person Pose Estimation"  
Xuecheng Nie, Jiashi Feng, Junliang Xing, Shuicheng Yan;


  • April 03, 2018 Welcome to our CVPR'18 workshop on Visual Understanding of Humans in Crowd Scene and the 2nd Look Into Person (LIP) Challenge.
  • April 02, 2018 The Multi-Human Parsing and Pose Estimation Challenges are now open for submission.
  • April 01, 2018 The NUS LV Multiple-Human Parsing Dataset v2.0 is released!
  • March 31, 2018 The NUS LV Multiple-Human Parsing Dataset v1.0 is released!
  • March 29, 2018 We will organize a workshop at CVPR 2018.
  • March 27, 2018 The Multi-Human Parsing website is online!


The MHP v1.0 and v2.0 datasets are made freely available to academic and non-academic entities for non-commercial purposes such as academic research, teaching, scientific publications, or personal experimentation. Permission is granted to use the data provided that you agree to our license terms.