
Microsoft Trellis 2 lets you turn a single image into a 1536³, fully textured, PBR 3D model in just one minute on high-end GPUs. Learn the step-by-step setup, pitfalls, and how to handle transparency.
Microsoft Trellis 2: From a Photo to a 1536³ 3D Asset in 60 Seconds
Published by Brav
TL;DR
- I discovered Microsoft’s Trellis 2 can generate a 1536³ 3D mesh in about a minute on top GPUs.
- It’s the first open-source model that ships full PBR textures (up to 4096×4096) and true transparency out of the box.
- The only catch? You’ll need ≥24 GB VRAM; otherwise the generator will crash or produce artifacts.
- I’ll walk you through a clean, Docker-free installation, the exact inference command, and how to get alpha channel working in Blender.
- The table below compares Trellis 2 to Hunyuan 2.1 and SAM 3D on key metrics.
Why this matters
When I was building a game asset from a single photo, I expected a quick result. Instead I hit low resolution, dark textures, and a pipeline that required a server with 48 GB VRAM. Microsoft’s Trellis 2 promises 1536³ resolution, PBR materials, and transparency—exactly what I needed. The pain points? 24 GB VRAM requirement, slow generation, and manual alpha handling. This article is my battle log, turning those pain points into a step-by-step guide.
Core concepts
At the heart of Trellis 2 is O-Voxel—a field-free, sparse voxel grid that stores geometry and appearance in a single compact latent. The model also uses Structured Latent Representation (SLAT), a unified 3D latent that lets the network learn both shape and texture simultaneously. The result? A 16× compressed representation that cuts token usage by roughly half compared to classic voxel grids. The GitHub README explains that the O-Voxel representation is “field-free” and robust to complex structures, handling open surfaces, non-manifold geometry, and thin features without losing detail. Trellis 2’s training data includes ~500 K diverse objects and ~4 000 textures, giving it a richer appearance model than its competitors.
| Feature | Trellis 2 | Hunyuan 2.1 | SAM 3D |
|---|---|---|---|
| Max Mesh Resolution | 1536³ | 1024³ | 512³ |
| Generation Time (at each model's max resolution) | ~60 s (H100) | ~17 s (H100) | ~3 s (H100) |
| VRAM Requirement | ≥24 GB | ~16 GB | ~8 GB |
| PBR Support | Full (Base Color, Roughness, Metallic, Opacity) | Limited | None |
| Transparency | Full (Alpha channel) | Partial | None |
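To make the sparse-voxel idea from the Core concepts section more concrete, here is a minimal sketch of why storing only occupied voxels is cheaper than a dense grid. It is a generic illustration, not Trellis 2's actual O-Voxel code; the sphere test shape and the helper names `surface_voxels` and `sparsity_report` are made up for this example.

```python
import math


def surface_voxels(resolution: int) -> set:
    """Return the grid cells that a sphere's surface passes through."""
    center = (resolution - 1) / 2
    radius = resolution * 0.4
    occupied = set()
    for x in range(resolution):
        for y in range(resolution):
            for z in range(resolution):
                distance = math.dist((x, y, z), (center, center, center))
                # Keep only cells within half a voxel of the surface.
                if abs(distance - radius) <= 0.5:
                    occupied.add((x, y, z))
    return occupied


def sparsity_report(resolution: int) -> tuple:
    """Return (dense cell count, occupied cell count, occupied fraction)."""
    dense = resolution ** 3
    sparse = len(surface_voxels(resolution))
    return dense, sparse, sparse / dense


dense, sparse, fraction = sparsity_report(32)
print(f"dense cells: {dense}, surface cells: {sparse}, fraction: {fraction:.3f}")
```

A dense grid grows with the cube of the resolution, while the number of surface cells grows roughly with its square, so the gap widens quickly as resolution goes up.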
How to apply it
1️⃣ Clone the repo
git clone https://github.com/microsoft/TRELLIS.2.git
cd TRELLIS.2
2️⃣ Create a conda environment and install dependencies
conda create -n trellis python=3.10
conda activate trellis
pip install -e .
3️⃣ Download the 4B-parameter checkpoint
wget https://github.com/microsoft/TRELLIS.2/releases/download/v0.1/TRELLIS-image-large.ckpt
4️⃣ Run inference on an image
python inference.py --image ./samples/duck.jpg \
--output ./duck_glb.glb \
--resolution 1536 \
--output-format glb
On an H100 it takes ~60 seconds; on an RTX 4090 it takes ~2 minutes.
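If you have more than one photo to convert, the command above can be wrapped in a small script. This is a sketch under assumptions: the `inference.py` script name and its flags are copied from the command in this article and have not been checked against the repository, and `build_command` and `run_batch` are hypothetical helper names.

```python
import subprocess
import sys
from pathlib import Path


def build_command(image: Path, out_dir: Path, resolution: int = 1536) -> list:
    """Build the inference command for one image (flags as used in this article)."""
    output = out_dir / f"{image.stem}.glb"
    return [
        sys.executable, "inference.py",
        "--image", str(image),
        "--output", str(output),
        "--resolution", str(resolution),
        "--output-format", "glb",
    ]


def run_batch(images: list, out_dir: Path, resolution: int = 1536) -> None:
    """Run the images one after another; stop at the first failing run."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for image in images:
        subprocess.run(build_command(image, out_dir, resolution), check=True)
```

Calling `run_batch(sorted(Path("samples").glob("*.jpg")), Path("out"))` from inside the repository directory would process every sample image in turn. Running sequentially keeps only one generation in VRAM at a time.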
5️⃣ Open the GLB in Blender
The file is exported in OPAQUE mode by default, so the alpha channel is invisible.
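- In the Shader Editor, add an Image Texture node (or reuse the one the glTF importer created) that holds the texture with the alpha channel.
- Connect its Alpha output to the Alpha input of the Principled BSDF.
- In the material's Settings panel, switch Blend Mode from Opaque to Alpha Blend or Alpha Hashed (Blender 4.2 and later expose this as Render Method instead). Now the transparent parts show correctly.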
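As an alternative to fixing the material by hand, the flag can be changed in the file itself. The sketch below assumes that "OPAQUE mode" refers to the `alphaMode` field that the glTF 2.0 format stores on each material; it rewrites that field to `BLEND` in the GLB's JSON chunk and leaves the binary data untouched. The function name `set_alpha_mode` is made up for this example, and I have not verified it against an actual Trellis 2 export.

```python
import json
import struct

GLB_MAGIC = 0x46546C67   # ASCII "glTF", little-endian
JSON_CHUNK = 0x4E4F534A  # ASCII "JSON", little-endian


def set_alpha_mode(glb: bytes, mode: str = "BLEND") -> bytes:
    """Return a copy of a GLB file with every material's alphaMode set to `mode`."""
    magic, version, _total = struct.unpack_from("<III", glb, 0)
    if magic != GLB_MAGIC:
        raise ValueError("not a GLB file")
    json_len, chunk_type = struct.unpack_from("<II", glb, 12)
    if chunk_type != JSON_CHUNK:
        raise ValueError("first chunk is not JSON")

    gltf = json.loads(glb[20:20 + json_len].decode("utf-8"))
    rest = glb[20 + json_len:]  # binary chunk(s), copied through unchanged

    for material in gltf.get("materials", []):
        material["alphaMode"] = mode

    new_json = json.dumps(gltf, separators=(",", ":")).encode("utf-8")
    new_json += b" " * (-len(new_json) % 4)  # JSON chunk is space-padded to 4 bytes
    total = 12 + 8 + len(new_json) + len(rest)
    return (
        struct.pack("<III", GLB_MAGIC, version, total)
        + struct.pack("<II", len(new_json), JSON_CHUNK)
        + new_json
        + rest
    )


# Usage (paths are placeholders):
# from pathlib import Path
# Path("duck_alpha.glb").write_bytes(set_alpha_mode(Path("duck_glb.glb").read_bytes()))
```

Writing to a new file rather than overwriting the original keeps the untouched export around in case the viewer or engine you target handles `BLEND` differently than expected.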
Pitfalls & edge cases
- 24 GB VRAM: Trellis 2 will crash on GPUs with less memory. An A100 or H100 has enough room; an RTX 3070 (8 GB) will run out.
- Generation speed: about 60 s on an H100, 2 minutes on an RTX 4090, and 3–4 minutes on slower consumer GPUs.
- Bugs: Many users report crashes on Hugging Face and Docker due to missing shared libraries. The GitHub issue tracker lists several workarounds.
- SLAT decompression artifacts: When decompressing the SLAT latent, mesh artifacts sometimes appear. Re-running with --decompress-mode fast reduces them.
- Alpha channel: The GLB is exported in OPAQUE mode; you must enable alpha blending manually in Blender or re-export with an alpha flag.
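The VRAM pitfall at the top of this list is easy to check before you launch a run. Here is a small sketch that asks `nvidia-smi` for each GPU's total memory and reports which ones clear the bar. The threshold is my own assumption: it sits slightly below 24 × 1024 MiB because cards sold as 24 GB can report a little less than that. The function names are made up for this example.

```python
import subprocess

# Assumed threshold: a bit under 24 * 1024 MiB, since "24 GB" cards
# may report slightly less total memory than the nominal figure.
REQUIRED_MIB = 24_000


def parse_vram_mib(smi_output: str) -> list:
    """Parse one integer (MiB) per non-empty line of nvidia-smi output."""
    return [int(line.strip()) for line in smi_output.splitlines() if line.strip()]


def query_vram_mib() -> list:
    """Ask nvidia-smi for the total memory of every visible GPU, in MiB."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_vram_mib(result.stdout)


def gpus_meeting_requirement(vram_mib: list, required: int = REQUIRED_MIB) -> list:
    """Return the indices of GPUs with at least `required` MiB of memory."""
    return [index for index, mib in enumerate(vram_mib) if mib >= required]
```

On a machine with an NVIDIA driver installed, `gpus_meeting_requirement(query_vram_mib())` returns an empty list when no GPU is large enough, which is a cheaper way to find out than waiting for an out-of-memory crash.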
Quick FAQ
Q1: Will Microsoft release a version with lower VRAM requirements? A: Not yet, but community-driven quantization efforts may reduce memory usage in future releases.
Q2: How does O-Voxel compare to voxel grids in terms of token efficiency? A: O-Voxel is field-free and uses sparse occupancy, cutting token usage by ~50% compared to dense voxel grids.
Q3: Can I run it on consumer GPUs like the RTX 4090? A: Yes, a 1536³ model can be generated on an RTX 4090, but it will be slower (about 2 minutes) and may require memory tweaks.
Q4: Does it support multi-GPU inference? A: Currently not; future releases may support distributed inference.
Q5: Are there any cloud inference options? A: Microsoft hasn’t announced a cloud API; you’ll need to run locally or self-host.
Q6: How do I get transparency working in Blender? A: Connect the texture’s alpha channel to the Principled BSDF’s Alpha input and switch the material’s blend mode away from Opaque.
Q7: What file formats can I export besides GLB? A: OBJ, PLY, Radiance Fields, and 3D Gaussians are available.
Conclusion
Trellis 2 is a game-changer for artists, game developers, and VFX teams who need high-resolution, PBR-ready 3D assets from a single image. The trade-off is the heavy VRAM requirement and crashes on GPUs that fall short of it. If you have a GPU with 24 GB of VRAM or more, the learning curve is minimal: a few terminal commands and a Blender tweak, and you’re ready to generate production-ready assets. For the rest of us who only have 8–12 GB, the model remains out of reach for now, but the community is working on optimizations.
If you’re eager to try Trellis 2, start with the official repo, use the sample images from the repository, and test on an RTX 4090 or A100. Keep an eye on the GitHub issues for the latest bug fixes, and join the community Discord or Reddit thread to share your experiences.
Happy modeling!