AI Model
Llama 3 70B
70B parameters · text-generation
VRAM (FP16)
140 GB
VRAM (INT4)
40 GB
Family
llama
Compatible GPUs
NVIDIA H200
Min GPUs: 1 · fp16
NVIDIA H200 NVL
Min GPUs: 1 · fp16
NVIDIA B200
Min GPUs: 1 · fp16
NVIDIA GB200
Min GPUs: 1 · fp16
AMD Instinct MI300X
Min GPUs: 1 · fp16
NVIDIA B300
Min GPUs: 1 · fp16
NVIDIA H100
Min GPUs: 2 · fp16
NVIDIA A100
Min GPUs: 2 · fp16
NVIDIA GH200
Min GPUs: 2 · fp16
AMD Instinct MI250X
Min GPUs: 2 · fp16
NVIDIA L40S
Min GPUs: 3 · fp16
NVIDIA A40
Min GPUs: 3 · fp16
NVIDIA RTX 6000 Ada
Min GPUs: 3 · fp16
NVIDIA RTX A6000
Min GPUs: 3 · fp16
NVIDIA A16
Min GPUs: 3 · fp16
NVIDIA V100
Min GPUs: 5 · fp16
NVIDIA RTX 5090
Min GPUs: 5 · fp16
NVIDIA L4
Min GPUs: 6 · fp16
NVIDIA A10
Min GPUs: 6 · fp16
NVIDIA A10G
Min GPUs: 6 · fp16
NVIDIA RTX 4090
Min GPUs: 6 · fp16
NVIDIA RTX A5000
Min GPUs: 6 · fp16
NVIDIA RTX 3090
Min GPUs: 6 · fp16
NVIDIA T4
Min GPUs: 9 · fp16
NVIDIA P100
Min GPUs: 9 · fp16
NVIDIA RTX 5080
Min GPUs: 9 · fp16
NVIDIA RTX A4000
Min GPUs: 9 · fp16
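The minimum GPU counts listed above look consistent with dividing the 140 GB FP16 footprint by each card's memory and rounding up. The sketch below reproduces that arithmetic for a few of the listed GPUs. The per-GPU memory sizes are assumptions taken from public spec sheets (for example the 80 GB H100 and the 32 GB V100), not values from this page, and `min_gpus` is a hypothetical helper, not the site's actual method.

```python
import math

# Assumed per-GPU memory in GB (public spec-sheet values, not stated on this page).
GPU_MEMORY_GB = {
    "NVIDIA H200": 141,
    "NVIDIA H100": 80,   # assumes the 80 GB variant
    "NVIDIA L40S": 48,
    "NVIDIA V100": 32,   # assumes the 32 GB variant
    "NVIDIA L4": 24,
    "NVIDIA T4": 16,
}

FP16_VRAM_GB = 140  # FP16 figure stated on this page


def min_gpus(model_vram_gb: float, gpu_memory_gb: float) -> int:
    """Smallest GPU count whose combined memory covers the model weights.

    Hypothetical helper: ignores KV cache, activations, and framework overhead.
    """
    return math.ceil(model_vram_gb / gpu_memory_gb)


counts = {name: min_gpus(FP16_VRAM_GB, mem) for name, mem in GPU_MEMORY_GB.items()}

for name, n in counts.items():
    print(f"{name}: Min GPUs: {n}")
```

Because this counts weights only, a card such as the H200 fits the model with almost no memory to spare, so real deployments may need more GPUs than the listed minimum.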
Supported Frameworks
vLLM
Text Generation Inference
PyTorch
TensorRT-LLM
Ollama
SGLang
llama.cpp
Deploy Llama 3 70B
Get a full deployment stack recommendation: GPU, count, framework, quantization, and projected cost.
VRAM Usage
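FP16 serving needs about 140 GB for the model weights alone (70B parameters at 2 bytes each), before workload-specific headroom such as the KV cache and activations. INT4 quantization reduces the model weights to about 40 GB, which is the practical path for running large models on smaller GPU clusters.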
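The weight figures follow from parameter count times bytes per parameter. The sketch below shows that arithmetic, assuming 1 GB = 10^9 bytes, 2 bytes per parameter at FP16, and 0.5 bytes per parameter at INT4. `weight_memory_gb` is a hypothetical helper. The raw INT4 result is 35 GB, slightly under the roughly 40 GB quoted here. The page does not say what makes up the difference; quantization scales and layers kept at higher precision are one plausible explanation, stated here only as an assumption.

```python
# Assumed bytes per parameter for each precision (standard sizes, not from this page).
BYTES_PER_PARAM = {"fp16": 2.0, "int4": 0.5}

NUM_PARAMS = 70e9  # 70B parameters, as stated on this page


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Memory for the model weights alone, in GB (1 GB = 1e9 bytes).

    Hypothetical helper: excludes KV cache, activations, and runtime overhead.
    """
    return num_params * BYTES_PER_PARAM[precision] / 1e9


fp16_gb = weight_memory_gb(NUM_PARAMS, "fp16")
int4_gb = weight_memory_gb(NUM_PARAMS, "int4")

print(f"FP16 weights: {fp16_gb:.0f} GB")
print(f"INT4 weights: {int4_gb:.0f} GB (raw, before any quantization metadata)")
```

Treat both numbers as lower bounds: context length, batch size, and the serving framework all add memory on top of the weights.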
Related Llama 3 70B resources
Move from model requirements into compatible GPU prices, deployment, and the wider model catalog.
NVIDIA H200 prices for Llama 3 70B
Recommended GPU path for this model at fp16 precision.
NVIDIA H200 NVL cloud prices
Compatible option for Llama 3 70B; minimum 1 GPU.
NVIDIA B200 cloud prices
Compatible option for Llama 3 70B; minimum 1 GPU.
NVIDIA GB200 cloud prices
Compatible option for Llama 3 70B; minimum 1 GPU.
Deploy Llama 3 70B
Generate a deployment recommendation with GPU count, framework, and estimated cost.
Model VRAM leaderboard
Compare FP16 and INT4 memory requirements across other deployable models.