YouDream: Generating Anatomically Controllable Consistent Text-to-3D Animals

Additional Comparison

We compare YouDream with state-of-the-art (SOTA) text-to-3D generation methods and list, for each, (a) whether the guidance diffusion model was trained with 3D data, (b) the NeRF resolution, and (c) the time to generate a 3D object on a single NVIDIA A100 80 GB GPU. Since the baselines have no pose guidance, we append ", full body" to their prompts. YouDream generates a 3D asset in ~40 minutes without the init stage and in ~50 minutes with it. We outperform MVDream even without using any 3D objects to train the diffusion model.
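As a minimal sketch of the prompt adjustment described above, the suffix can be appended to each baseline prompt as shown here. The helper name `baseline_prompt`, the duplicate-suffix check, and the example prompts are assumptions made for illustration; only the suffix string ", full body" comes from the text.

```python
BASELINE_SUFFIX = ", full body"  # suffix quoted in the text above


def baseline_prompt(prompt: str, suffix: str = BASELINE_SUFFIX) -> str:
    """Return the prompt with the suffix appended once (hypothetical helper)."""
    base = prompt.rstrip()
    # Assumption: skip the append if the prompt already ends with the suffix.
    if base.endswith(suffix):
        return base
    return base + suffix


# Placeholder prompts for illustration, not taken from the paper.
prompts = ["a tiger", "a giraffe"]
baseline_prompts = [baseline_prompt(p) for p in prompts]
```

The pose-guided method would receive the unmodified prompts, while the baselines would receive `baseline_prompts`.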

ProlificDreamer
Trained on 3D objects: ❌ | Res.: 512 x 512 | Time: ~10 hrs (70k iters)

HiFA
Trained on 3D objects: ❌ | Res.: 512 x 512 | Time: ~6 hrs (10k iters)

MVDream
Trained on 3D objects: ✓ | Res.: 64 x 64 -> 256 x 256 | Time: ~40 mins (10k iters)
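
The times and iteration counts listed for the baselines imply rough per-iteration costs, which can be worked out as in the sketch below. The inputs are the approximate figures listed above, so the results are approximate too. Because the methods render at different resolutions and do different work per iteration, this is an illustrative calculation rather than a like-for-like benchmark. The helper name `seconds_per_iteration` is hypothetical.

```python
# Approximate figures as listed above: (wall-clock seconds, iterations).
runs = {
    "ProlificDreamer": (10 * 3600, 70_000),  # ~10 hrs, 70k iters
    "HiFA": (6 * 3600, 10_000),              # ~6 hrs, 10k iters
    "MVDream": (40 * 60, 10_000),            # ~40 mins, 10k iters
}


def seconds_per_iteration(total_seconds: float, iterations: int) -> float:
    """Average wall-clock seconds spent per optimization iteration."""
    return total_seconds / iterations


per_iter = {
    name: seconds_per_iteration(seconds, iters)
    for name, (seconds, iters) in runs.items()
}

for name, cost in per_iter.items():
    print(f"{name}: ~{cost:.2f} s/iter")
```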