We compare with state-of-the-art (SOTA) text-to-3D generation methods, listing for each a) whether its guidance diffusion model was trained on 3D data, b) the NeRF resolution, and c) the time to generate one 3D object on a single NVIDIA A100 80 GB GPU. Since the baselines have no pose guidance, we append ", full body" to their prompts. YouDream generates a 3D asset in ~40 minutes without the init stage and in ~50 minutes with it. We outperform MVDream even though our guidance diffusion model is not trained on any 3D objects.
Method            | Trained on 3D objects | NeRF resolution      | Time (single A100 80 GB)
ProlificDreamer   | ❌                    | 512 x 512            | ~10 hrs (70k iters)
HiFA              | ❌                    | 512 x 512            | ~6 hrs (10k iters)
MVDream           | ✓                     | 64 x 64 -> 256 x 256 | ~40 mins (10k iters)
YouDream (ours)   | ❌                    | 128 x 128            | ~40/50 mins (10k/20k iters)
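The reported times can be turned into a rough per-iteration cost and an end-to-end comparison. The following is a minimal sketch that treats the approximate ("~") figures from the table as exact. The `Run` class and the helper names are hypothetical and exist only for illustration. Because the iteration counts differ across methods, and YouDream's 20k-iteration figure averages over the init stage and the main stage together, the per-iteration numbers are only indicative.

```python
from dataclasses import dataclass


# Approximate wall-clock times from the comparison table (single A100 80 GB).
# The "~" values are rounded, so every derived ratio is a rough estimate.
@dataclass(frozen=True)
class Run:
    method: str
    minutes: float  # approximate total generation time in minutes
    iters: int      # number of optimization iterations


RUNS = [
    Run("ProlificDreamer", 10 * 60, 70_000),
    Run("HiFA", 6 * 60, 10_000),
    Run("MVDream", 40, 10_000),
    Run("YouDream (no init)", 40, 10_000),
    Run("YouDream (with init)", 50, 20_000),
]


def seconds_per_iter(run: Run) -> float:
    """Average seconds per optimization iteration over the whole run."""
    return run.minutes * 60 / run.iters


def slowdown_vs(run: Run, reference: Run) -> float:
    """How many times longer `run` takes than `reference`, end to end."""
    return run.minutes / reference.minutes


if __name__ == "__main__":
    ref = RUNS[-1]  # YouDream with the init stage (~50 min)
    for r in RUNS:
        print(f"{r.method:22s} {seconds_per_iter(r):5.2f} s/iter  "
              f"{slowdown_vs(r, ref):5.1f}x the time of {ref.method}")
```

Under these assumptions, ProlificDreamer takes about 12x and HiFA about 7.2x as long as YouDream with the init stage.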