YouDream: Generating Anatomically Controllable Consistent Text-to-3D Animals


Additional Comparison

We compare with state-of-the-art text-to-3D generation methods, listing for each a) whether the guidance diffusion model was trained with 3D data, b) the NeRF resolution, and c) the time to generate a 3D object on a single NVIDIA A100 80 GB GPU. Since the baselines have no pose guidance, we append ", full body" to their prompts. YouDream generates a 3D asset in ~40 minutes without the init stage and in ~50 minutes with it. We outperform MVDream even though our diffusion model is trained without any 3D objects.
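The prompt adjustment described above can be sketched as follows. This is a minimal illustration only: the helper name `prompt_for`, the `BASELINES` set, and the check that avoids appending the suffix twice are assumptions made for this example, not part of the YouDream code.

```python
# Hypothetical helper, written for illustration; not taken from the YouDream codebase.
BASELINES = {"ProlificDreamer", "HiFA", "MVDream"}  # methods without pose guidance
FULL_BODY_SUFFIX = ", full body"                    # suffix quoted in the text above


def prompt_for(method: str, prompt: str) -> str:
    """Return the text prompt as it would be passed to `method`."""
    # Baselines receive the suffix; the duplicate check is an added assumption.
    if method in BASELINES and not prompt.endswith(FULL_BODY_SUFFIX):
        return prompt + FULL_BODY_SUFFIX
    # The pose-guided method receives the prompt unchanged.
    return prompt


if __name__ == "__main__":
    for method in ["ProlificDreamer", "HiFA", "MVDream", "YouDream"]:
        print(f"{method}: {prompt_for(method, 'a giraffe with dragon wings')}")
```

Keeping the suffix in one place makes it easy to confirm that every baseline saw the same modified prompt.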

ProlificDreamer

not trained on 3D objects

Res.: 512 x 512

Time: ~10 hrs (70k iters)

HiFA

not trained on 3D objects

Res.: 512 x 512

Time: ~6 hrs (10k iters)

MVDream

trained on 3D objects

Res.: 64 x 64 -> 256 x 256

Time: ~40 mins (10k iters)

YouDream (ours)

not trained on 3D objects

Res.: 128 x 128

Time: ~40/50 mins (10k/20k iters)
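To put the listed times on a common footing, the short sketch below converts each approximate (time, iterations) pair into iterations per minute. The figures are the rounded values listed above. Pairing ~40 minutes with 10k iterations (no init stage) and ~50 minutes with 20k iterations (with init stage) is an assumption about how the YouDream entry should be read. Iterations differ in cost between methods, so the result is only a rough indication.

```python
# Approximate (minutes, iterations) pairs copied from the listing above.
# The pairing of the two YouDream entries is an assumption of this sketch.
RUNS = {
    "ProlificDreamer": (10 * 60, 70_000),
    "HiFA": (6 * 60, 10_000),
    "MVDream": (40, 10_000),
    "YouDream (no init)": (40, 10_000),
    "YouDream (with init)": (50, 20_000),
}


def iters_per_minute(minutes: float, iters: int) -> float:
    """Average optimization iterations completed per minute of wall-clock time."""
    return iters / minutes


if __name__ == "__main__":
    for name, (minutes, iters) in RUNS.items():
        print(f"{name}: ~{iters_per_minute(minutes, iters):.0f} iters/min")
```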

a toucan flying with wings spread out

a realistic mythical bird with two pairs of wings and two long thin lion-like tails

a six legged lioness, fierce beast, pouncing, ultra realistic, 4k

a tiger

a zoomed out DSLR photo of gold eagle statue

a giraffe

a giraffe with dragon wings

a brown horse

an elephant kicking a soccer ball

golden ball with wings