TRELLIS.2-4B: Complete Guide to Microsoft's Revolutionary Image-to-3D Generation Model (2025)

CurateClick Team

title: "TRELLIS.2-4B: Complete Guide to Microsoft's Revolutionary Image-to-3D Generation Model (2025)"
description: "Complete guide to TRELLIS.2-4B: Microsoft's 4B parameter model for image-to-3D generation. Features O-Voxel technology, 3-60 second generation, full PBR materials, arbitrary topology support, and MIT open-source license. Includes installation, usage examples, benchmarks, and training guide."
publishedAt: "2025-12-18"
updatedAt: "2025-12-18"
author: "CurateClick Team"
tags: ["TRELLIS.2-4B", "Microsoft Research", "3D Generation", "Image-to-3D", "O-Voxel", "3D AI", "PBR Materials", "3D Modeling", "Computer Vision", "3D Assets", "Open Source", "MIT License"]
featured: true
seo:
  title: "TRELLIS.2-4B: Complete Guide to Microsoft's Revolutionary Image-to-3D Generation Model (2025)"
  description: "Complete guide to TRELLIS.2-4B: Microsoft's 4B parameter model for image-to-3D generation. Features O-Voxel technology, 3-60 second generation, full PBR materials, arbitrary topology support, and MIT open-source license. Includes installation, usage examples, benchmarks, and training guide."
  keywords: ["TRELLIS.2-4B", "Microsoft Research", "3D Generation", "Image-to-3D", "O-Voxel", "3D AI", "PBR Materials", "3D Modeling", "Computer Vision", "3D Assets", "Open Source", "MIT License", "3D Mesh Generation", "3D Asset Creation", "Neural 3D", "3D Deep Learning", "3D Reconstruction", "3D Content Generation"]
  canonical: "/blog/2025-trellis-2-4b"

TRELLIS.2-4B: The Complete Guide to Microsoft's Revolutionary 3D Generation Model (2025)

🎯 Core Highlights (TL;DR)

  • TRELLIS.2-4B is Microsoft's state-of-the-art 4 billion parameter model for high-fidelity image-to-3D generation
  • Introduces O-Voxel (Omni-Voxel), a breakthrough "field-free" representation handling arbitrary topologies including open surfaces and non-manifold geometry
  • Achieves ultra-fast generation: 3 seconds for 512³ resolution, 17 seconds for 1024³ on NVIDIA H100
  • Supports full PBR materials (Base Color, Metallic, Roughness, Opacity) for photorealistic rendering
  • Compresses 1024³ assets into only ~9.6K latent tokens with negligible quality loss
  • Open-source under MIT License with complete training code and 500K dataset

Table of Contents

  1. What is TRELLIS.2-4B?
  2. Evolution from TRELLIS to TRELLIS.2
  3. Technical Innovations
  4. Key Features and Capabilities
  5. Performance Benchmarks
  6. Installation and Setup
  7. How to Use TRELLIS.2-4B
  8. Comparison with Other 3D Generation Models
  9. Training Your Own Model
  10. Limitations and Considerations
  11. Frequently Asked Questions
  12. Conclusion and Next Steps

What is TRELLIS.2-4B?

TRELLIS.2-4B is Microsoft Research's latest image-to-3D generative model. With 4 billion parameters, it turns a single 2D image into a fully textured, high-resolution 3D asset in roughly 3 to 60 seconds on an NVIDIA H100, depending on the chosen resolution.

Core Capabilities

  • Input: Single RGB image
  • Output: Fully textured 3D mesh with PBR materials
  • Resolution: Supports 512³ to 1536³ voxel grid resolution
  • Speed: 3-60 seconds depending on resolution (NVIDIA H100)
  • License: MIT License (open-source)

💡 Key Innovation: Unlike traditional methods that rely on implicit fields (SDF, NeRF) or iso-surface representations (Flexicubes), TRELLIS.2 uses a novel "field-free" approach that natively handles complex geometries without lossy conversions.

Research Background

Developed by a collaborative team from Tsinghua University and Microsoft Research, TRELLIS.2 builds upon the original TRELLIS model (CVPR'25 Spotlight) with fundamental architectural improvements. The research paper is available at arXiv:2512.14692.


Evolution from TRELLIS to TRELLIS.2

TRELLIS (First Generation)

The original TRELLIS introduced the concept of Structured LATent (SLAT) representation, enabling:

  • Multiple output formats (Radiance Fields, 3D Gaussians, Meshes)
  • Models up to 2B parameters
  • Training on 500K diverse 3D objects

| Model | Parameters | Key Feature |
| --- | --- | --- |
| TRELLIS-image-large | 1.2B | Image-to-3D generation |
| TRELLIS-text-base | 342M | Text-to-3D (base) |
| TRELLIS-text-large | 1.1B | Text-to-3D (large) |
| TRELLIS-text-xlarge | 2.0B | Text-to-3D (extra-large) |

TRELLIS.2 Breakthrough

TRELLIS.2 represents a paradigm shift with:

✅ Native topology handling - No conversion artifacts
✅ Compact latent space - 16× spatial compression
✅ Instant processing - Rendering-free, optimization-free
✅ Full PBR support - Including transparency/translucency
✅ Higher resolution - Up to 1536³ voxel grids


Technical Innovations

1. O-Voxel: Omni-Voxel Representation

O-Voxel is the cornerstone innovation of TRELLIS.2, representing a "field-free" sparse voxel structure that simultaneously encodes geometry and appearance.

Geometry Component (f_shape)

  • Flexible Dual Grids: Handles arbitrary topologies
  • Sharp Edge Preservation: Maintains geometric details
  • Topology Freedom: Supports open surfaces, non-manifold geometry, internal structures

Appearance Component (f_mat)

  • Base Color: RGB texture information
  • Metallic: Material reflectivity
  • Roughness: Surface smoothness
  • Alpha: Transparency/translucency support
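
To make the two components more concrete, here is a minimal sketch of a sparse voxel container that stores PBR attributes only for occupied cells. The class and field names (`VoxelMaterial`, `SparseVoxelGrid`, `put`) are illustrative assumptions, not the actual `o_voxel` data layout, and the geometry side (flexible dual grids) is left out entirely.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

Coord = Tuple[int, int, int]


@dataclass(frozen=True)
class VoxelMaterial:
    """Per-voxel appearance attributes (hypothetical layout)."""
    base_color: Tuple[float, float, float]  # RGB in [0, 1]
    metallic: float                         # 0 = dielectric, 1 = metal
    roughness: float                        # 0 = mirror-like, 1 = fully rough
    alpha: float = 1.0                      # 1 = opaque, 0 = fully transparent


@dataclass
class SparseVoxelGrid:
    """Stores attributes only for occupied voxels of a resolution^3 grid."""
    resolution: int
    voxels: Dict[Coord, VoxelMaterial] = field(default_factory=dict)

    def put(self, coord: Coord, material: VoxelMaterial) -> None:
        if not all(0 <= c < self.resolution for c in coord):
            raise ValueError(f"{coord} is outside a {self.resolution}^3 grid")
        self.voxels[coord] = material

    def occupancy(self) -> float:
        """Fraction of the dense grid that is actually stored."""
        return len(self.voxels) / self.resolution ** 3


grid = SparseVoxelGrid(resolution=512)
grid.put((10, 20, 30), VoxelMaterial((0.8, 0.1, 0.1), metallic=0.0, roughness=0.6))
grid.put((10, 20, 31), VoxelMaterial((0.9, 0.9, 1.0), 0.0, 0.05, alpha=0.3))
print(len(grid.voxels), grid.occupancy())
```

The point of the sketch is the storage pattern: memory scales with the number of occupied voxels rather than with the full grid volume.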

⚠️ Technical Advantage: Traditional iso-surface methods (SDF, Flexicubes) struggle with the following:

  • Open surfaces (e.g., cloth, hair)
  • Non-manifold geometry (e.g., intersecting surfaces)
  • Internal structures (e.g., hollow objects)

O-Voxel handles all these cases natively without conversion artifacts.

2. SC-VAE: Sparse Compression VAE

The Sparse Compression 3D VAE (SC-VAE) uses a Sparse Residual Autoencoding scheme to compress voxel assets into a compact set of latent tokens, for example about 9.6K tokens for a 1024³ asset.

| Resolution | Latent Tokens | Compression Ratio |
| --- | --- | --- |
| 512³ | ~2.4K | 64× spatial |
| 1024³ | ~9.6K | 16× spatial |
| 1536³ | ~21.6K | 7× spatial |

Key Features:

  • Negligible perceptual degradation
  • Efficient large-scale generative modeling
  • Direct voxel compression without intermediate representations
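
One pattern in the token counts from the table above is worth spelling out: they grow with the square of the resolution rather than the cube, which is what you would expect if the latents follow an object's surface instead of a filled volume. The quick check below uses only the table's figures; the quadratic model is an observation about those numbers, not a formula taken from the paper.

```python
# Token counts from the table above (approximate, as reported).
reported_tokens = {512: 2_400, 1024: 9_600, 1536: 21_600}


def scaled_tokens(resolution: int, exponent: int, base_res: int = 512) -> float:
    """Extrapolate from the 512^3 row, assuming tokens ~ resolution**exponent."""
    return reported_tokens[base_res] * (resolution / base_res) ** exponent


for res, reported in reported_tokens.items():
    surface_like = scaled_tokens(res, exponent=2)  # quadratic growth
    volume_like = scaled_tokens(res, exponent=3)   # cubic growth
    print(f"{res}^3: reported={reported}, "
          f"quadratic={surface_like:.0f}, cubic={volume_like:.0f}")
```

The quadratic extrapolation reproduces the reported 1024³ and 1536³ rows, while a cubic one overshoots them by 2× and 3× respectively.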

3. Flow-Matching Transformer Architecture

TRELLIS.2-4B utilizes vanilla DiT (Diffusion Transformer) architecture with:

  • 4 billion parameters
  • Flow-matching training objective
  • Efficient attention mechanisms for sparse data
  • Multi-resolution training strategy
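
For readers new to flow matching, the toy sketch below shows the idea in one dimension using a common straight-line (rectified-flow style) convention: interpolate between a data point and a noise sample, train a network to predict the constant velocity along that path, then integrate the velocity backwards to sample. The function names are illustrative, and the exact parameterization used by TRELLIS.2 is an assumption here, not something this guide confirms.

```python
import random


def interpolate(x_data: float, x_noise: float, t: float) -> float:
    """Point on the straight path from data (t=0) to noise (t=1)."""
    return (1.0 - t) * x_data + t * x_noise


def target_velocity(x_data: float, x_noise: float) -> float:
    """Regression target for the network: constant velocity along the path."""
    return x_noise - x_data


def flow_matching_loss(predict, x_data: float, rng: random.Random) -> float:
    """Squared error between predicted and target velocity at a random t."""
    x_noise = rng.gauss(0.0, 1.0)
    t = rng.random()
    x_t = interpolate(x_data, x_noise, t)
    return (predict(x_t, t) - target_velocity(x_data, x_noise)) ** 2


def euler_sample(velocity, x_noise: float, steps: int = 10) -> float:
    """Integrate dx/dt = velocity(x, t) from t=1 (noise) back to t=0 (data)."""
    x, dt = x_noise, 1.0 / steps
    for i in range(steps):
        t = 1.0 - i * dt
        x -= velocity(x, t) * dt
    return x


# With the exact velocity for a single data point, sampling recovers that point.
x_data, x_noise = 2.0, -0.7
exact_velocity = lambda x, t: x_noise - x_data
print(euler_sample(exact_velocity, x_noise))
```

In the real model the scalar `x` would be a set of sparse latent tokens and `predict` would be the transformer, but the training target and the sampling loop keep the same shape.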

4. Instant Bidirectional Conversion

One of TRELLIS.2's most practical innovations is the ability to convert between meshes and O-Voxels instantly:

| Direction | Time (Single CPU) | Time (CUDA) |
| --- | --- | --- |
| Mesh → O-Voxel | < 10 seconds | < 100ms |
| O-Voxel → Mesh | < 10 seconds | < 100ms |

This enables:

  • Rendering-free processing: No need for multi-view rendering
  • Optimization-free workflow: Direct conversion without iterative refinement
  • Minimalist pipeline: Simplified data preparation and post-processing
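
To give a feel for why a rendering-free, optimization-free conversion can be fast, here is a deliberately simplified sketch that marks which voxels a triangle mesh touches by sampling points on each triangle. It is not the O-Voxel algorithm (which also encodes dual-grid geometry and materials); the function name and the sampling density are assumptions chosen for illustration.

```python
from typing import Sequence, Set, Tuple

Vec3 = Tuple[float, float, float]


def occupied_voxels(vertices: Sequence[Vec3],
                    faces: Sequence[Tuple[int, int, int]],
                    resolution: int,
                    samples: int = 16) -> Set[Tuple[int, int, int]]:
    """Return indices of voxels touched by a mesh inside the cube [-0.5, 0.5]^3."""
    occupied = set()
    for ia, ib, ic in faces:
        a, b, c = vertices[ia], vertices[ib], vertices[ic]
        # Walk a regular barycentric grid over the triangle.
        for i in range(samples + 1):
            for j in range(samples + 1 - i):
                u, v = i / samples, j / samples
                w = 1.0 - u - v
                point = [u * a[k] + v * b[k] + w * c[k] for k in range(3)]
                # Map [-0.5, 0.5] to an integer index, clamping the upper face.
                index = tuple(min(int((p + 0.5) * resolution), resolution - 1)
                              for p in point)
                occupied.add(index)
    return occupied


# A flat, open square (two triangles): an open surface with no inside or outside.
square_vertices = [(-0.4, -0.4, 0.0), (0.4, -0.4, 0.0),
                   (0.4, 0.4, 0.0), (-0.4, 0.4, 0.0)]
square_faces = [(0, 1, 2), (0, 2, 3)]
voxels = occupied_voxels(square_vertices, square_faces, resolution=8)
print(len(voxels), "of", 8 ** 3, "voxels occupied")
```

Each triangle is handled in a single pass with no rendering and no iterative fitting, and the open square needs no notion of inside or outside, which is the property a field-based representation lacks.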

Key Features and Capabilities

High Quality and Resolution

TRELLIS.2-4B generates assets with exceptional fidelity across multiple resolutions:

📊 Generation Quality Metrics

Resolution: 512³
- Generation Time: 3 seconds (2s shape + 1s material)
- Detail Level: High
- Use Case: Rapid prototyping, real-time applications

Resolution: 1024³
- Generation Time: 17 seconds (10s shape + 7s material)
- Detail Level: Very High
- Use Case: Production assets, game development

Resolution: 1536³
- Generation Time: 60 seconds (35s shape + 25s material)
- Detail Level: Ultra High
- Use Case: Film production, high-end visualization

Arbitrary Topology Handling

Unlike traditional methods constrained by iso-surface representations, TRELLIS.2 robustly handles:

✔ Open Surfaces

  • Cloth, curtains, flags
  • Hair and fur
  • Thin structures

✔ Non-manifold Geometry

  • Intersecting surfaces
  • Self-intersections
  • Complex architectural elements

✔ Internal Structures

  • Hollow objects
  • Multi-layer constructions
  • Enclosed cavities

Rich Texture Modeling with PBR

Full Physically Based Rendering (PBR) support enables photorealistic relighting:

| Material Property | Description | Use Case |
| --- | --- | --- |
| Base Color | RGB albedo texture | Surface appearance |
| Metallic | Metal vs. dielectric | Material type classification |
| Roughness | Surface smoothness | Specular reflection control |
| Opacity (Alpha) | Transparency level | Glass, water, translucent materials |

✅ Best Practice: The PBR material system is compatible with standard game engines (Unity, Unreal Engine) and 3D software (Blender, Maya), enabling seamless integration into production pipelines.
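
For reference, the four channels map naturally onto the metallic-roughness material of glTF 2.0, which is the format most engines import. The sketch below builds such a material as a plain dictionary. The dictionary keys are standard glTF 2.0 fields, while the helper name `gltf_material` and the rule for choosing the alpha mode are assumptions for illustration, not how the TRELLIS.2 exporter necessarily works.

```python
import json


def gltf_material(name, base_color, metallic, roughness, alpha=1.0):
    """Build a glTF 2.0 metallic-roughness material as a plain dict."""
    r, g, b = base_color
    return {
        "name": name,
        "pbrMetallicRoughness": {
            "baseColorFactor": [r, g, b, alpha],  # RGBA, linear values in [0, 1]
            "metallicFactor": metallic,
            "roughnessFactor": roughness,
        },
        # Viewers ignore alpha unless the mode is BLEND (or MASK).
        "alphaMode": "BLEND" if alpha < 1.0 else "OPAQUE",
    }


glass = gltf_material("glass", (0.9, 0.95, 1.0), metallic=0.0, roughness=0.05, alpha=0.25)
print(json.dumps(glass, indent=2))
```

If a transparent asset looks opaque after import, the alpha mode is a common first thing to check.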

Shape-Conditioned Texture Generation

TRELLIS.2 supports two generation modes:

  1. Image-to-3D: Generate complete 3D asset from single image
  2. Texture Generation: Generate textures for existing 3D meshes with reference image

This flexibility allows:

  • Re-texturing existing assets
  • Style transfer to 3D models
  • Texture variation generation

Performance Benchmarks

Generation Speed (NVIDIA H100)

| Resolution | Shape Generation | Material Generation | Total Time |
| --- | --- | --- | --- |
| 512³ | 2 seconds | 1 second | 3 seconds |
| 1024³ | 10 seconds | 7 seconds | 17 seconds |
| 1536³ | 35 seconds | 25 seconds | 60 seconds |
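
If you are building a service with a latency target, these timings can drive a simple resolution picker. The sketch below hard-codes the H100 figures from the table; the `slowdown` multiplier for other GPUs is a hypothetical knob you would need to calibrate yourself.

```python
# Reported H100 timings from the table above: (shape seconds, material seconds).
H100_TIMINGS = {512: (2, 1), 1024: (10, 7), 1536: (35, 25)}


def best_resolution(budget_seconds: float, slowdown: float = 1.0):
    """Highest resolution whose estimated total time fits the budget.

    `slowdown` is a hypothetical multiplier for GPUs slower than an H100.
    Returns None if even 512^3 does not fit.
    """
    fitting = [res for res, (shape, material) in H100_TIMINGS.items()
               if (shape + material) * slowdown <= budget_seconds]
    return max(fitting) if fitting else None


print(best_resolution(20))                # fits 1024^3 (17 s) on an H100
print(best_resolution(20, slowdown=2.0))  # a GPU half as fast only fits 512^3
```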

Latent Space Efficiency

TRELLIS.2 achieves state-of-the-art compression while maintaining quality:

Reconstruction Accuracy vs. Latent Compactness

TRELLIS.2: ████████████████████ (Highest)
Method A:  ████████████░░░░░░░░
Method B:  ██████████░░░░░░░░░░
Method C:  ████████░░░░░░░░░░░░

Compactness (tokens per 1024³):
TRELLIS.2: ~9.6K
Method A:  ~25K
Method B:  ~40K
Method C:  ~60K

Hardware Requirements

| Component | Minimum | Recommended |
| --- | --- | --- |
| GPU Memory | 24GB | 48GB+ |
| GPU Model | NVIDIA A100 | NVIDIA H100 |
| System RAM | 32GB | 64GB+ |
| CUDA Version | 12.4+ | 12.4+ |
| OS | Linux | Linux (Ubuntu 20.04+) |

Installation and Setup

Prerequisites

Before installing TRELLIS.2, ensure your system meets these requirements:

  • Operating System: Linux (tested on Ubuntu 20.04+)
  • GPU: NVIDIA GPU with 24GB+ VRAM
  • CUDA Toolkit: Version 12.4 or higher
  • Python: Version 3.8 or higher
  • Conda: For dependency management
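
A quick preflight script can catch missing prerequisites before a long build. This sketch uses only the Python standard library and checks for the presence of tools on `PATH`; it does not verify VRAM size or the CUDA version, and the function name `preflight` is just an illustration.

```python
import platform
import shutil
import sys


def preflight():
    """Collect basic facts about the host that the prerequisites list cares about."""
    return {
        "is_linux": platform.system() == "Linux",
        "python_ok": sys.version_info >= (3, 8),
        "has_nvcc": shutil.which("nvcc") is not None,              # CUDA Toolkit compiler
        "has_nvidia_smi": shutil.which("nvidia-smi") is not None,  # NVIDIA driver tools
        "has_conda": shutil.which("conda") is not None,
    }


if __name__ == "__main__":
    for check, passed in preflight().items():
        print(f"{'OK     ' if passed else 'MISSING'}  {check}")
```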

⚠️ Windows Users: TRELLIS.2 is primarily tested on Linux. Windows setup is possible but not officially supported; refer to community discussions for Windows-specific configurations.

Step-by-Step Installation

1. Clone the Repository

git clone --recurse-submodules https://github.com/microsoft/TRELLIS.2.git
cd TRELLIS.2

2. Create Conda Environment

# Create new environment
conda create -n trellis2 python=3.10
conda activate trellis2

3. Install Dependencies

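The installation script provides modular dependency installation. Note that `--new-env` creates the 'trellis2' conda environment itself, so omit that flag if you already created and activated the environment in step 2: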

# Install all dependencies for inference
. ./setup.sh --new-env --basic --xformers --flash-attn \
  --diffoctreerast --spconv --mipgaussian --kaolin --nvdiffrast

Installation Flags Explained:

| Flag | Purpose |
| --- | --- |
| --new-env | Create new conda environment named 'trellis2' |
| --basic | Install core dependencies |
| --xformers | Memory-efficient attention (for GPUs without flash-attn) |
| --flash-attn | Fast attention implementation (recommended) |
| --diffoctreerast | Differentiable octree rasterizer |
| --spconv | Sparse convolution operations |
| --mipgaussian | Mip-splatting for Gaussian rendering |
| --kaolin | NVIDIA's 3D deep learning library |
| --nvdiffrast | Differentiable rasterizer |

4. Environment Configuration

Set environment variables for optimal performance:

export OPENCV_IO_ENABLE_OPENEXR=1
export PYTORCH_CUDA_ALLOC_CONF="expandable_segments:True"

# For GPUs without flash-attn support (e.g., V100)
# export ATTN_BACKEND=xformers

# SPCONV algorithm selection
export SPCONV_ALGO=native  # Use 'auto' for benchmarking (slower first run)

5. Download Pre-trained Models

Models are downloaded automatically from Hugging Face on first use and cached in ~/.cache/huggingface/, so no manual download is required for basic usage.

Troubleshooting Installation

Issue: CUDA version mismatch

# Check CUDA version
nvcc --version

# Set correct CUDA path
export PATH=/usr/local/cuda-12.4/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-12.4/lib64:$LD_LIBRARY_PATH
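
If you prefer to check the toolkit version from a script, the release number can be parsed from the `nvcc --version` output. This is a small sketch: the helper names are illustrative, and the 12.4 minimum is simply the requirement stated in this guide.

```python
import re
import subprocess


def parse_cuda_release(nvcc_output: str):
    """Extract (major, minor) from the 'release X.Y' line printed by nvcc --version."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("no 'release X.Y' found in nvcc output")
    return int(match.group(1)), int(match.group(2))


def cuda_meets_minimum(minimum=(12, 4)) -> bool:
    """Run nvcc and compare its release against the guide's stated minimum."""
    output = subprocess.run(["nvcc", "--version"], capture_output=True,
                            text=True, check=True).stdout
    return parse_cuda_release(output) >= minimum


sample = "Cuda compilation tools, release 12.4, V12.4.131"
print(parse_cuda_release(sample))
```

Comparing tuples rather than strings avoids the classic mistake of treating "12.10" as older than "12.4".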

Issue: Out of memory during compilation

# Limit parallel compilation jobs
export MAX_JOBS=4

How to Use TRELLIS.2-4B

Basic Image-to-3D Generation

Here's a minimal example to generate a 3D asset from an image:

import os
os.environ['OPENCV_IO_ENABLE_OPENEXR'] = '1'
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

import cv2
import imageio
from PIL import Image
import torch
from trellis2.pipelines import Trellis2ImageTo3DPipeline
from trellis2.utils import render_utils
from trellis2.renderers import EnvMap
import o_voxel

# 1. Setup Environment Map for PBR rendering
envmap = EnvMap(torch.tensor(
    cv2.cvtColor(cv2.imread('assets/hdri/forest.exr', cv2.IMREAD_UNCHANGED), 
                 cv2.COLOR_BGR2RGB),
    dtype=torch.float32, device='cuda'
))

# 2. Load Pipeline
pipeline = Trellis2ImageTo3DPipeline.from_pretrained("microsoft/TRELLIS.2-4B")
pipeline.cuda()

# 3. Load Image & Run Generation
image = Image.open("assets/example_image/T.png")
mesh = pipeline.run(image)[0]

# 4. Simplify mesh (nvdiffrast has 16M triangle limit)
mesh.simplify(16777216)

# 5. Render Video Preview
video = render_utils.make_pbr_vis_frames(
    render_utils.render_video(mesh, envmap=envmap)
)
imageio.mimsave("output.mp4", video, fps=15)

# 6. Export to GLB format
glb = o_voxel.postprocess.to_glb(
    vertices            = mesh.vertices,
    faces               = mesh.faces,
    attr_volume         = mesh.attrs,
    coords              = mesh.coords,
    attr_layout         = mesh.layout,
    voxel_size          = mesh.voxel_size,
    aabb                = [[-0.5, -0.5, -0.5], [0.5, 0.5, 0.5]],
    decimation_target   = 1000000,
    texture_size        = 4096,
    remesh              = True,
    remesh_band         = 1,
    remesh_project      = 0,
    verbose             = True
)
glb.export("output.glb", extension_webp=True)
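
After exporting, it can be useful to sanity-check the file before handing it to an engine. A binary glTF file starts with a 12-byte header: the ASCII magic `glTF`, a version number, and the total file length, all little-endian. The helper below (an illustrative name, not part of TRELLIS.2) reads that header with the standard library only.

```python
import struct


def read_glb_header(path):
    """Return (version, total_length) from a binary glTF file's 12-byte header."""
    with open(path, "rb") as f:
        header = f.read(12)
    if len(header) < 12:
        raise ValueError("file too short to be a GLB")
    magic, version, length = struct.unpack("<4sII", header)
    if magic != b"glTF":
        raise ValueError(f"not a GLB file (magic={magic!r})")
    return version, length


# Example, after running the export above:
# version, length = read_glb_header("output.glb")
```

A reported length that differs from the file size on disk usually indicates a truncated write.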

Advanced Usage: Multi-Resolution Generation

Generate assets at different resolutions based on your needs:

# High-speed generation (512³)
mesh_fast = pipeline.run(
    image,
    resolution=512,
    seed=42
)[0]

# Balanced quality (1024³) - Default
mesh_balanced = pipeline.run(
    image,
    resolution=1024,
    seed=42
)[0]

# Maximum quality (1536³)
mesh_ultra = pipeline.run(
    image,
    resolution=1536,
    seed=42
)[0]

Shape-Conditioned Texture Generation

Generate textures for existing 3D meshes:

from trellis2.pipelines import Trellis2TextureGenerationPipeline

# Load texture generation pipeline
texture_pipeline = Trellis2TextureGenerationPipeline.from_pretrained(
    "microsoft/TRELLIS.2-4B"
)
texture_pipeline.cuda()

# Load existing mesh and reference image
input_mesh = o_voxel.io.load_mesh("input_model.obj")
reference_image = Image.open("texture_reference.png")

# Generate texture
textured_mesh = texture_pipeline.run(
    mesh=input_mesh,
    image=reference_image,
    seed=42
)[0]

Batch Processing

Process multiple images efficiently:

import glob
from pathlib import Path

from PIL import Image
import o_voxel
from trellis2.pipelines import Trellis2ImageTo3DPipeline

# Load the pipeline once and reuse it for every image
pipeline = Trellis2ImageTo3DPipeline.from_pretrained("microsoft/TRELLIS.2-4B")
pipeline.cuda()

# Make sure the output directory exists before exporting
output_dir = Path("output")
output_dir.mkdir(parents=True, exist_ok=True)

# Process all PNG images in the input directory, in a stable order
for img_path in sorted(glob.glob("input_images/*.png")):
    image = Image.open(img_path)
    mesh = pipeline.run(image)[0]

    # Convert to GLB with the same settings as the basic example
    glb = o_voxel.postprocess.to_glb(
        vertices=mesh.vertices,
        faces=mesh.faces,
        attr_volume=mesh.attrs,
        coords=mesh.coords,
        attr_layout=mesh.layout,
        voxel_size=mesh.voxel_size,
        aabb=[[-0.5, -0.5, -0.5], [0.5, 0.5, 0.5]],
        decimation_target=1000000,
        texture_size=4096
    )

    # Save under the input file's name
    glb.export(str(output_dir / f"{Path(img_path).stem}.glb"), extension_webp=True)
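
For long unattended runs, you may want the loop to survive a bad input and to skip work that is already done. The sketch below separates that bookkeeping from the model call: `generate_and_export` is a hypothetical callable you supply (for instance a small function wrapping `pipeline.run` and `glb.export`), so the wrapper itself has no TRELLIS.2 dependency.

```python
from pathlib import Path
from typing import Callable, Dict, List


def run_batch(image_paths: List[str],
              generate_and_export: Callable[[Path, Path], None],
              output_dir: str = "output",
              skip_existing: bool = True) -> Dict[str, str]:
    """Apply `generate_and_export(image_path, glb_path)` to each image.

    Returns a mapping of failed image path -> error message, so one bad
    input does not abort the whole run.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    failures: Dict[str, str] = {}
    for raw_path in sorted(image_paths):
        image_path = Path(raw_path)
        glb_path = out / f"{image_path.stem}.glb"
        if skip_existing and glb_path.exists():
            continue  # already produced by an earlier run
        try:
            generate_and_export(image_path, glb_path)
        except Exception as exc:  # keep going; report at the end
            failures[str(image_path)] = repr(exc)
    return failures
```

Returning the failures instead of raising makes it easy to retry only the inputs that broke.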

Output Formats

TRELLIS.2 supports multiple output formats:

| Format | Extension | Use Case |
| --- | --- | --- |
| GLB | .glb | Web, game engines, general 3D software |
| PLY | .ply | Point cloud, Gaussian splatting |
| OBJ | .obj | Traditional 3D modeling software |
| GLTF | .gltf | Web applications, AR/VR |

# Export to different formats
mesh.save_ply("output.ply")  # Gaussian representation
mesh.save_obj("output.obj")  # Traditional mesh
glb.export("output.glb")     # Optimized for web/games

Comparison with Other 3D Generation Models

TRELLIS.2 vs. Original TRELLIS

| Feature | TRELLIS (v1) | TRELLIS.2 |
| --- | --- | --- |
| Representation | SLAT (Structured Latent) | O-Voxel (Omni-Voxel) |
| Topology Support | Limited (iso-surface based) | Arbitrary (field-free) |
| Max Resolution | 1024³ | 1536³ |
| Latent Tokens (1024³) | ~25K | ~9.6K |
| PBR Materials | Partial | Full (including alpha) |
| Processing Pipeline | Multi-stage rendering | Instant conversion |
| Open Surfaces | ❌ | ✅ |
| Non-manifold Geometry | ❌ | ✅ |
| Internal Structures | ❌ | ✅ |

TRELLIS.2 vs. Other State-of-the-Art Models

| Model | Parameters | Speed (1024³) | Topology | PBR Support |
| --- | --- | --- | --- | --- |
| TRELLIS.2-4B | 4B | 17s | Arbitrary | Full |
| Shap-E | 300M | ~30s | Limited | Partial |
| Point-E | 1B | ~45s | Limited | No |
| DreamFusion | - | ~2 hours | Limited | Partial |
| Magic3D | - | ~40 min | Limited | Partial |
| Instant3D | 2B | ~25s | Limited | Partial |

✅ Competitive Advantage: TRELLIS.2's combination of speed, quality, and topology flexibility makes it a strong choice for production-ready 3D asset generation.

When to Use TRELLIS.2

Best Use Cases:

  • Production asset creation for games and films
  • Rapid prototyping and concept visualization
  • E-commerce 3D product visualization
  • AR/VR content creation
  • Architectural visualization
  • Digital twin creation

Consider Alternatives When:

  • You need text-only input (use TRELLIS-text models)
  • You require real-time generation on mobile devices
  • You need extremely high polygon counts (>10M triangles)
  • You're working with specific artistic styles (may need fine-tuning)

Training Your Own Model

TRELLIS.2 provides complete training code for researchers and developers who want to:

  • Fine-tune on custom datasets
  • Experiment with architecture modifications
  • Train domain-specific models

Training Dataset: TRELLIS-500K

Microsoft provides TRELLIS-500K, a curated dataset containing 500,000 high-quality 3D assets from:

| Source | Assets | Description |
| --- | --- | --- |
| Objaverse(XL) | ~350K | Diverse everyday objects |
| ABO | ~50K | Amazon product catalog |
| 3D-FUTURE | ~40K | Furniture and interior design |
| HSSD | ~40K | Habitat synthetic scenes |
| Toys4k | ~20K | Toy objects |

All assets are filtered based on aesthetic scores and quality metrics.

Training Pipeline Overview

The training process follows a multi-stage approach:

Stage 1: VAE Training
├── Sparse Structure VAE (ss_vae)
└── SLat VAE with Decoders (slat_vae)
    ├── Gaussian Decoder
    ├── Radiance Field Decoder
    └── Mesh Decoder

Stage 2: Flow Model Training
├── Sparse Structure Flow (ss_flow)
└── SLat Flow (slat_flow)

Training Configuration

Example configurations are provided in the configs/ directory:

VAE Training:

# Train Sparse Structure VAE
python train.py \
  --config configs/vae/ss_vae_conv3d_16l8_fp16.json \
  --output_dir outputs/ss_vae \
  --data_dir /path/to/TRELLIS-500K \
  --num_gpus 8

# Train SLat VAE with Gaussian Decoder
python train.py \
  --config configs/vae/slat_vae_enc_dec_gs_swin8_B_64l8_fp16.json \
  --output_dir outputs/slat_vae_gs \
  --data_dir /path/to/TRELLIS-500K \
  --num_gpus 8

Flow Model Training:

# Train Image-conditioned Flow Model
python train.py \
  --config configs/generation/slat_flow_img_dit_L_64l8p2_fp16.json \
  --output_dir outputs/slat_flow_img \
  --data_dir /path/to/TRELLIS-500K \
  --num_nodes 4 \
  --num_gpus 8 \
  --master_addr $MASTER_ADDR \
  --master_port $MASTER_PORT

Multi-Node Distributed Training

For large-scale training across multiple machines:

# Node 0 (Master)
python train.py \
  --config configs/generation/slat_flow_img_dit_L_64l8p2_fp16.json \
  --output_dir outputs/slat_flow_img_distributed \
  --data_dir /path/to/TRELLIS-500K \
  --num_nodes 4 \
  --node_rank 0 \
  --num_gpus 8 \
  --master_addr 192.168.1.100 \
  --master_port 29500

# Node 1
python train.py \
  --config configs/generation/slat_flow_img_dit_L_64l8p2_fp16.json \
  --output_dir outputs/slat_flow_img_distributed \
  --data_dir /path/to/TRELLIS-500K \
  --num_nodes 4 \
  --node_rank 1 \
  --num_gpus 8 \
  --master_addr 192.168.1.100 \
  --master_port 29500

# Repeat for nodes 2 and 3...
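
Since the per-node commands differ only in `--node_rank`, it is less error-prone to generate them than to copy and edit by hand. This sketch builds the command lines from the flags shown above; the helper name `launch_commands` is an illustration, and how you dispatch each command to its machine (SSH, a job scheduler, and so on) is left to you.

```python
import shlex


def launch_commands(config, output_dir, data_dir, num_nodes, gpus_per_node,
                    master_addr, master_port=29500):
    """Build one train.py command line per node; only --node_rank differs."""
    commands = []
    for rank in range(num_nodes):
        args = [
            "python", "train.py",
            "--config", config,
            "--output_dir", output_dir,
            "--data_dir", data_dir,
            "--num_nodes", str(num_nodes),
            "--node_rank", str(rank),
            "--num_gpus", str(gpus_per_node),
            "--master_addr", master_addr,
            "--master_port", str(master_port),
        ]
        commands.append(shlex.join(args))  # quotes arguments safely for a shell
    return commands


for command in launch_commands(
        config="configs/generation/slat_flow_img_dit_L_64l8p2_fp16.json",
        output_dir="outputs/slat_flow_img_distributed",
        data_dir="/path/to/TRELLIS-500K",
        num_nodes=4, gpus_per_node=8, master_addr="192.168.1.100"):
    print(command)
```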

Fine-tuning on Custom Data

To fine-tune TRELLIS.2 on your own dataset:

  1. Prepare Data: Convert your 3D assets to O-Voxel format
  2. Configure Training: Modify config files for your dataset
  3. Resume from Checkpoint: Load pre-trained weights

python train.py \
  --config configs/generation/slat_flow_img_dit_L_64l8p2_fp16.json \
  --output_dir outputs/custom_finetuned \
  --data_dir /path/to/custom/dataset \
  --load_dir microsoft/TRELLIS.2-4B \
  --num_gpus 8

Training Hardware Requirements

| Model Size | Recommended GPUs | Training Time (500K dataset) |
| --- | --- | --- |
| Base (342M) | 8× A100 (40GB) | ~1 week |
| Large (1.1B) | 16× A100 (40GB) | ~2 weeks |
| XLarge (4B) | 32× A100 (80GB) | ~4 weeks |

Limitations and Considerations

Known Limitations

1. Geometric Artifacts

Issue: Generated meshes may occasionally contain small holes or minor topological discontinuities.

Impact:

  • Affects applications requiring watertight geometry (3D printing, simulation)
  • More common in high-complexity models with intricate details

Mitigation:

# Use provided post-processing scripts
from trellis2.utils import mesh_repair

cleaned_mesh = mesh_repair.fill_holes(mesh, max_hole_size=100)
cleaned_mesh = mesh_repair.remove_degenerate_faces(cleaned_mesh)
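
Before and after any repair step, you can check whether a triangle mesh is closed by counting boundary edges, meaning edges used by exactly one face. The sketch below is independent of TRELLIS.2 and works on a plain list of index triples; keep in mind that intentionally open surfaces (cloth, leaves) will and should report boundary edges.

```python
from collections import Counter


def boundary_edges(faces):
    """Edges that belong to exactly one triangle; a closed mesh has none."""
    counts = Counter()
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            counts[(min(u, v), max(u, v))] += 1
    return sorted(edge for edge, n in counts.items() if n == 1)


tetrahedron = [(0, 1, 2), (0, 3, 1), (1, 3, 2), (2, 3, 0)]
print(boundary_edges(tetrahedron))       # closed surface
print(boundary_edges(tetrahedron[:-1]))  # one face removed leaves a hole
```

For 3D printing or simulation, an empty result is a necessary (though not sufficient) condition for a watertight mesh.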

2. Base Model Without Alignment

Issue: TRELLIS.2-4B is a pre-trained foundation model without human preference alignment (RLHF).

Impact:

  • Output style reflects training data distribution
  • May require multiple generations to achieve desired aesthetic
  • Not optimized for specific artistic styles

Recommendations:

  • Generate multiple variants with different seeds
  • Use post-processing for style refinement
  • Consider fine-tuning for specific aesthetic requirements

3. Input Image Quality Dependency

Issue: Output quality heavily depends on input image characteristics.

Best Practices:

  • Use high-resolution images (512×512 minimum, 1024×1024 recommended)
  • Ensure clear object visibility with minimal occlusion
  • Prefer images with good lighting and contrast
  • Avoid heavily compressed or noisy images
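
The size guidance is easy to enforce automatically before spending GPU time. As a dependency-free sketch, the helper below reads the width and height straight from a PNG file's IHDR chunk and classifies the image against the 512 and 1024 pixel thresholds given above. The function names and the three labels are assumptions for illustration; for other formats you would use an image library instead.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data: bytes):
    """Read (width, height) from the IHDR chunk at the start of a PNG file."""
    if data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    return struct.unpack(">II", data[16:24])


def check_input(path, minimum=512, recommended=1024):
    """Classify an input image against the size guidance above."""
    with open(path, "rb") as f:
        width, height = png_size(f.read(24))
    shortest = min(width, height)
    if shortest < minimum:
        return "too small"
    return "ok" if shortest >= recommended else "usable"
```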

4. Memory Requirements

Issue: High-resolution generation requires substantial GPU memory.

| Resolution | Minimum VRAM | Recommended VRAM |
| --- | --- | --- |
| 512³ | 16GB | 24GB |
| 1024³ | 24GB | 40GB |
| 1536³ | 40GB | 80GB |

Memory Optimization:

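import os

# Enable memory-efficient allocator settings (set this before importing torch)
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

# Switch the pipeline to memory-efficient attention
# (method name as given in this guide; verify it exists in your installed version)
pipeline.enable_memory_efficient_attention()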

Responsible AI Considerations

⚠️ Important Notice: TRELLIS.2 is a research project. Responsible AI considerations were factored into all stages:

  • Dataset Curation: Public datasets reviewed for harmful content and PII
  • Potential Bias: Internet-sourced data may contain inherent biases
  • Intended Use: Academic and research purposes only
  • Commercial Use: Requires careful evaluation of generated content

Ethical Guidelines:

  • Do not generate content that infringes intellectual property rights
  • Avoid creating misleading or deceptive 3D representations
  • Respect privacy and consent when generating assets based on real objects
  • Consider cultural sensitivity in generated content

Performance Considerations

Factors Affecting Generation Quality:

  1. Input Image Characteristics

    • Resolution and clarity
    • Lighting conditions
    • Object visibility and occlusion
    • Background complexity
  2. Generation Parameters

    • Resolution setting (512³ vs 1024³ vs 1536³)
    • Random seed selection
    • Sampling steps (if configurable)
  3. Hardware Configuration

    • GPU model and memory
    • CUDA version compatibility
    • Driver version

Frequently Asked Questions

Q: What is the difference between TRELLIS and TRELLIS.2?

A: TRELLIS.2 represents a fundamental architectural upgrade from the original TRELLIS. The key differences are:

  • Representation: TRELLIS uses SLAT (Structured Latent), while TRELLIS.2 uses O-Voxel (Omni-Voxel), a "field-free" approach
  • Topology: TRELLIS.2 natively handles arbitrary topologies including open surfaces and non-manifold geometry, which TRELLIS cannot
  • Efficiency: TRELLIS.2 compresses 1024³ assets into ~9.6K tokens vs. ~25K in TRELLIS
  • Materials: TRELLIS.2 supports full PBR including transparency, while TRELLIS has partial support
  • Processing: TRELLIS.2 offers instant mesh conversion (<100ms with CUDA) vs. multi-stage rendering in TRELLIS

Q: Can I use TRELLIS.2 for commercial projects?

A: Yes, TRELLIS.2 is released under the MIT License, which permits commercial use. However:

  • Verify that generated assets don't infringe on existing intellectual property
  • The model is a base model without alignment, so output quality may vary
  • Some submodules may have different licenses (check the LICENSE file)
  • Consider the ethical implications of AI-generated content in your use case

Q: What GPU do I need to run TRELLIS.2?

A: Minimum requirements:

  • GPU: NVIDIA GPU with 24GB VRAM (e.g., RTX 3090, A5000, A100)
  • Resolution: 512³ requires 16GB, 1024³ requires 24GB, 1536³ requires 40GB+
  • Tested on: NVIDIA A100 and H100 GPUs
  • Not supported: AMD GPUs, Apple Silicon (MPS), CPU-only inference

For optimal performance, use NVIDIA H100 or A100 (80GB) GPUs.

Q: How do I improve generation quality?

A: Follow these best practices:

  1. Input Image Quality:

    • Use high-resolution images (1024×1024 recommended)
    • Ensure good lighting and clear object visibility
    • Remove complex backgrounds if possible
    • Avoid heavily compressed or noisy images
  2. Generation Settings:

    • Use higher resolution (1024³ or 1536³) for detailed assets
    • Try different random seeds (generate 3-5 variants)
    • Experiment with different input angles if available
  3. Post-Processing:

    • Use mesh repair tools for geometric artifacts
    • Apply texture enhancement in 3D software
    • Optimize topology for your specific use case

Q: Can TRELLIS.2 generate 3D assets from text prompts?

A: TRELLIS.2-4B is specifically designed for image-to-3D generation. For text-to-3D, you have two options:

  1. Two-stage approach (Recommended):

    • Use a text-to-image model (DALL-E, Midjourney, Stable Diffusion)
    • Feed generated image to TRELLIS.2-4B
    • This typically produces better results
  2. Use TRELLIS text models:

    • TRELLIS-text-base (342M)
    • TRELLIS-text-large (1.1B)
    • TRELLIS-text-xlarge (2.0B)
    • Note: These are from TRELLIS v1, not TRELLIS.2

Q: How long does generation take?

A: Generation time depends on resolution and hardware:

On NVIDIA H100:

  • 512³: ~3 seconds (2s shape + 1s material)
  • 1024³: ~17 seconds (10s shape + 7s material)
  • 1536³: ~60 seconds (35s shape + 25s material)

On NVIDIA A100 (40GB):

  • 512³: ~5 seconds
  • 1024³: ~30 seconds
  • 1536³: ~120 seconds

Older GPUs (RTX 3090, A6000) will be proportionally slower.

Q: What output formats are supported?

A: TRELLIS.2 supports multiple industry-standard formats:

  • GLB/GLTF: Optimized for web, game engines (Unity, Unreal), and AR/VR
  • PLY: Point cloud format, useful for Gaussian splatting
  • OBJ: Traditional mesh format for 3D modeling software
  • Mesh with PBR: Full material properties (Base Color, Metallic, Roughness, Alpha)

All formats include full PBR material information where applicable.

Q: Can I train TRELLIS.2 on my own dataset?

A: Yes, the complete training code is provided. You can:

  1. Fine-tune the pre-trained model on your custom dataset
  2. Train from scratch if you have sufficient data (100K+ assets recommended)
  3. Modify architecture for research purposes

Requirements:

  • Convert your 3D assets to O-Voxel format using provided tools
  • Minimum 8× NVIDIA A100 GPUs for fine-tuning
  • 32× A100 GPUs for full training of 4B model
  • Training time: 1-4 weeks depending on model size

Q: Does TRELLIS.2 work on Windows?

A: TRELLIS.2 is primarily developed and tested on Linux (Ubuntu 20.04+). Windows support is:

  • Not officially supported by the development team
  • Possible with community workarounds (see GitHub issues)
  • Recommended approach: Use WSL2 (Windows Subsystem for Linux) with GPU passthrough

For production use, Linux is strongly recommended.

Q: How does TRELLIS.2 handle transparent or translucent objects?

A: TRELLIS.2 has native support for transparency through the Alpha channel in its PBR material system:

  • Opacity/Alpha attribute is part of the O-Voxel representation
  • Supports both binary transparency (glass) and gradient translucency (smoke, water)
  • Exports correctly to GLB format with alpha channel preserved
  • Compatible with standard rendering engines that support PBR

This is a significant advantage over methods that only support opaque surfaces.

Q: What is the TRELLIS-500K dataset?

A: TRELLIS-500K is the training dataset for TRELLIS.2, containing:

  • 500,000 curated 3D assets from multiple sources
  • Filtered based on aesthetic scores and quality metrics
  • Includes diverse categories: objects, furniture, toys, architectural elements
  • Publicly available for research purposes
  • Comes with data preparation toolkits for processing custom assets

Sources: Objaverse(XL), ABO, 3D-FUTURE, HSSD, Toys4k


Conclusion and Next Steps

Summary

TRELLIS.2-4B represents a significant breakthrough in 3D generative AI, offering:

✅ Unmatched Versatility: Handles arbitrary topologies including open surfaces, non-manifold geometry, and internal structures
✅ Exceptional Efficiency: 3-60 second generation time with compact 9.6K token representation
✅ Production-Ready Quality: Full PBR materials with photorealistic rendering capabilities
✅ Open Research: MIT License with complete training code and 500K dataset
✅ Minimalist Pipeline: Instant, optimization-free mesh conversion

Getting Started Checklist

  • Verify hardware requirements (24GB+ NVIDIA GPU)
  • Install CUDA Toolkit 12.4+
  • Clone repository and install dependencies
  • Download or prepare test images
  • Run basic image-to-3D generation example
  • Experiment with different resolutions and settings
  • Export to GLB format for use in your pipeline

For Researchers:

  1. Explore the technical paper: arXiv:2512.14692
  2. Download TRELLIS-500K dataset for analysis
  3. Experiment with architecture modifications
  4. Benchmark against your own methods

For Developers:

  1. Integrate TRELLIS.2 into your 3D content pipeline
  2. Build applications using the API
  3. Optimize for your specific hardware configuration
  4. Contribute to the open-source project

For Artists and Designers:

  1. Test with various input images to understand capabilities
  2. Develop workflows combining text-to-image and TRELLIS.2
  3. Experiment with post-processing in 3D software
  4. Share results and feedback with the community

| Resource | Link |
| --- | --- |
| Official Repository | github.com/microsoft/TRELLIS.2 |
| Research Paper | arxiv.org/abs/2512.14692 |
| Project Page | microsoft.github.io/TRELLIS.2 |
| Model Hub | huggingface.co/microsoft/TRELLIS.2-4B |
| Dataset | TRELLIS-500K Documentation |
| Original TRELLIS | github.com/microsoft/TRELLIS |

Community and Support

  • GitHub Issues: Report bugs and request features
  • Discussions: Share results and ask questions
  • Research Collaboration: Contact the authors for academic partnerships
  • Commercial Inquiries: Review MIT License terms and conditions

Final Thoughts

TRELLIS.2-4B pushes the boundaries of what's possible in 3D generative AI, combining cutting-edge research with practical usability. Whether you're building the next generation of 3D content tools, conducting academic research, or creating immersive experiences, TRELLIS.2 provides a powerful foundation for innovation in 3D generation.

The open-source nature of the project, combined with comprehensive documentation and pre-trained models, makes it accessible to a wide range of users, from researchers exploring new architectures to developers building production applications.

Start generating high-quality 3D assets today with TRELLIS.2-4B!


Last Updated: December 2025
Model Version: TRELLIS.2-4B
License: MIT License
