The Hong Kong Polytechnic University, Hong Kong.
Yanghong Zhou is Ph.D. student studying at the Institute of Textiles and Clothing of The Hong Kong Polytechnic University. She graduated with an outstanding undergraduate student honour Bachelor degree (majoring in Mathematics and applied Mathematics) from Fujian Normal University in 2011, and a Master degree (majoring in Information and Communication Engineering) from University of Electronic Science and Technology of China in 2014. Her current research interests include object categorization, object detection and image parsing.
Human parsing, namely the decomposition of an image of human subject into semantic body/clothing regions, is important for general human-centric analysis, which is also an essential process enabling high-level applications, including fashion style reconginition and retrievals, human identifications, and human behaviour analysis [1-3]. The existing methods for human parsing using deep neural networks have a number of known drawbacks, e.g. not taking into account the limited capacity of deep learning techniques to delineate visual objects, labels confusions, very coarse output boundary, and so forth. In this paper, we propose a part-detection based and conditional random fields (CRFs) embedded deep neural network to address the problem. Firstly, a coarse semantic segmentation is conducted by utilizing a deep neural network. Secondly, a part detector is trained to produce class-specific scores for human parts and/or clothing item regions. Then, the outputs of the part detector are intergated to the deep neural network in order to optimize the feature learning in the deep neural network. Finally, to sharpen the boundaries and refine the segmentation results, CRFs-based probabilistic graphical modelling is incorporated into the deep neural network. In the meantime, the outputs from the part detector define the explicit higher order potentials that can in turn improve the CRFs. We comprehensively evaluation our method with two public datasets. The results demonstrate the effectiveness of our proposed framework in comparison to the state-of-the-art methods.