TELA: Text to Layer-wise 3D Clothed Human Generation

Junting Dong¹, Qi Fang², Zehuan Huang³, Xudong Xu¹, Jingbo Wang¹, Sida Peng⁴, Bo Dai¹

¹Shanghai AI Laboratory ²NetEase Games AI Lab ³Beihang University ⁴Zhejiang University

Abstract

This paper addresses the task of 3D clothed human generation from textual descriptions. Previous works usually encode the human body and clothes as a holistic model and generate the whole model in a single-stage optimization, which makes clothes editing difficult and sacrifices fine-grained control over the generation process (e.g., specifying the inside-outside order of clothes). To solve this, we propose a layer-wise clothed human representation combined with a progressive optimization strategy, which produces clothes-disentangled 3D human models while providing control over the generation process. The basic idea is to progressively generate a minimal-clothed human body and then layer-wise clothes. During clothes generation, a novel stratified compositional rendering method is proposed to fuse multi-layer human models, and a new loss function is utilized to help decouple the clothes model from the human body. The proposed method, TELA, achieves high-quality disentanglement, which in turn provides an effective way for 3D garment generation. Extensive experiments demonstrate that our approach achieves better 3D clothed human generation than holistic modeling methods while also supporting clothes editing applications such as virtual try-on.