¹Shanghai AI Laboratory
²NetEase Games AI Lab
³Beihang University
⁴Zhejiang University
This paper addresses the task of 3D clothed human generation from textual descriptions. Previous works usually encode the human body and clothes as a holistic model and generate the whole model in a single-stage optimization. This makes clothes editing difficult and sacrifices fine-grained control over the generation process (e.g., specifying the inside-to-outside order of clothes). To solve this, we propose a layer-wise clothed human representation combined with a progressive optimization strategy, which produces clothes-disentangled 3D human models while providing control over the generation process. The basic idea is to progressively generate a minimal-clothed human body and then layer-wise clothes. During clothes generation, a novel stratified compositional rendering method fuses the multi-layer human models, and a new loss function helps decouple the clothes model from the human body. The proposed method, TELA, achieves high-quality disentanglement, thereby providing an effective way to generate 3D garments. Extensive experiments demonstrate that our approach achieves better 3D clothed human generation than holistic modeling methods while also supporting clothes editing applications such as virtual try-on.