Multi-Human Parsing

Originality

To the best of our knowledge, we are the first to propose the Multi-Human Parsing task, together with corresponding datasets and baseline methods.

Task Definition

Multi-Human Parsing refers to partitioning a crowd scene image into semantically consistent regions belonging to body parts or clothing items while distinguishing between identities, such that each pixel in the image is assigned both a semantic part label and the identity it belongs to. Many higher-level applications can be built upon Multi-Human Parsing, such as group behavior analysis, person re-identification, image editing, video surveillance, autonomous driving and virtual reality.
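
As an illustration of this definition, the output for one image can be thought of as two aligned per-pixel maps, one holding part labels and one holding person identities. The sketch below is a minimal example under stated assumptions: the names part_map, id_map, split_by_person and PART_NAMES, the label vocabulary, and the convention that 0 means background / no person are all hypothetical and are not taken from the MHP datasets' actual annotation format.

```python
import numpy as np

# Hypothetical part vocabulary, for illustration only.
# The real MHP label sets are not reproduced here.
PART_NAMES = {0: "background", 1: "face", 2: "upper-clothes"}


def split_by_person(part_map, id_map):
    """Group pixels by (person identity, part label).

    Assumes both maps have the same shape and that 0 means
    background in part_map and "no person" in id_map.
    Returns {person_id: {part_label: boolean mask}}.
    """
    if part_map.shape != id_map.shape:
        raise ValueError("part_map and id_map must have the same shape")

    result = {}
    for pid in np.unique(id_map):
        if pid == 0:  # skip pixels that belong to no person
            continue
        person = id_map == pid
        parts = {}
        for label in np.unique(part_map[person]):
            if label == 0:  # skip background inside the person region
                continue
            parts[int(label)] = person & (part_map == label)
        result[int(pid)] = parts
    return result


# Toy 3x4 image with two people standing side by side.
part_map = np.array([[0, 1, 1, 0],
                     [2, 2, 2, 2],
                     [0, 2, 2, 0]])
id_map = np.array([[0, 1, 2, 0],
                   [1, 1, 2, 2],
                   [0, 1, 2, 0]])

masks = split_by_person(part_map, id_map)
for pid, parts in masks.items():
    for label, mask in parts.items():
        print(pid, PART_NAMES[label], int(mask.sum()))
```

Keeping the two maps separate makes the contrast with related tasks easy to see: part_map alone corresponds to human parsing, and id_map alone corresponds to instance segmentation restricted to people.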

Motivation

The Multi-Human Parsing project of the Learning and Vision (LV) Group, National University of Singapore (NUS), aims to push the frontiers of fine-grained visual understanding of humans in crowd scenes. Multi-Human Parsing differs significantly from traditional, well-defined object recognition tasks: object detection provides only coarse-level predictions of object locations (bounding boxes); instance segmentation predicts only instance-level masks, without any detailed information on body parts and fashion categories; and human parsing operates on category-level pixel-wise prediction without distinguishing between identities. In real-world scenarios, scenes with multiple interacting persons are more realistic and more common. A task, corresponding datasets and baseline methods that consider both the fine-grained semantic information of each individual person and the relationships and interactions of the whole group are therefore highly desirable.

Citation

Please consider citing the relevant papers:

"Understanding Humans in Crowded Scenes: Deep Nested Adversarial Learning and A New Benchmark for Multi-Human Parsing"
Jian Zhao*, Jianshu Li*, Yu Cheng*, Li Zhou, Terence Sim, Shuicheng Yan, Jiashi Feng;
arXiv:1804.03287 (* indicates equal contribution)

"Multi-Human Parsing in the Wild"
Jianshu Li*, Jian Zhao*, Yunchao Wei, Congyan Lang, Yidong Li, Terence Sim, Shuicheng Yan, Jiashi Feng;
arXiv:1705.07206 (* indicates equal contribution)

"Generative Partition Networks for Multi-Person Pose Estimation"
Xuecheng Nie, Jiashi Feng, Junliang Xing, Shuicheng Yan;
arXiv:1705.07422

News

April 03, 2018 Welcome to our CVPR'18 workshop on Visual Understanding of Humans in Crowd Scene and the 2nd Look Into Person (LIP) Challenge.
April 02, 2018 The Multi-Human Parsing and Pose Estimation Challenges are now open for submission.
April 01, 2018 The NUS LV Multiple-Human Parsing Dataset v2.0 is released!
March 31, 2018 The NUS LV Multiple-Human Parsing Dataset v1.0 is released!
March 29, 2018 We will organize a workshop at CVPR 2018.
March 27, 2018 The Multi-Human Parsing website is online!

License

The MHP v1.0 and v2.0 datasets are made freely available to academic and non-academic entities for non-commercial purposes such as academic research, teaching, scientific publications, or personal experimentation. Permission to use the data is granted provided that you agree to our license terms.