Which GPU Should You Rent for Fine-Tuning?
Fine-Tuning | 12 min read | 2026-03-22
If you are fine-tuning and you jump straight to the biggest GPU, you are probably wasting money. The right GPU depends on three things: model size, fine-tuning method, and dataset size. Get these right and you can fine-tune a 70B model for ₹1,000 instead of ₹5,000. Here is the complete guide.
The Fast Answer
- RTX 4090 (₹73/hr) is enough for LoRA and QLoRA on 7B-13B models. Handles 70% of fine-tuning jobs.
- A100 80GB (₹173/hr) is the move when VRAM becomes the limiting factor — 30B-70B models, full fine-tuning, large datasets.
- H100 (₹583/hr) only makes sense when you already know your fine-tune is large enough to need it — 100B+ models, multi-GPU distributed training.
Fine-Tuning Methods and Their VRAM Requirements
The fine-tuning method you choose determines how much VRAM you need. This is the single most important factor in GPU selection.
| Method | 7B Model | 13B Model | 30B Model | 70B Model |
|---|---|---|---|---|
| Full fine-tuning (FP16) | ~28GB | ~52GB | ~120GB | ~280GB |
| LoRA (FP16) | ~16GB | ~22GB | ~40GB | ~80GB |
| QLoRA (4-bit) | ~8GB | ~12GB | ~20GB | ~35GB |
These numbers include model weights, optimizer states, gradients, and activations for a batch size of 4 with 512-token context. Your actual VRAM usage will vary based on batch size, context length, and gradient accumulation steps.
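For quick what-if checks, the table above can be encoded as a lookup. A minimal sketch; the values are the rule-of-thumb estimates from the table (batch size 4, 512-token context), not measurements of your specific job:

```python
# Rule-of-thumb VRAM estimates (GB) from the table above.
# Keys are (method, model size); actual usage varies with batch
# size, context length, and gradient accumulation.
VRAM_GB = {
    ("full", "7B"): 28,  ("full", "13B"): 52,  ("full", "30B"): 120, ("full", "70B"): 280,
    ("lora", "7B"): 16,  ("lora", "13B"): 22,  ("lora", "30B"): 40,  ("lora", "70B"): 80,
    ("qlora", "7B"): 8,  ("qlora", "13B"): 12, ("qlora", "30B"): 20, ("qlora", "70B"): 35,
}

def vram_needed(method: str, model: str) -> int:
    """Look up the estimated VRAM (GB) for a fine-tuning method and model size."""
    return VRAM_GB[(method.lower(), model.upper())]
```

Treat the result as a floor, then add headroom before picking a GPU.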
For Most Teams: Start with RTX 4090
RTX 4090 (24GB) at ₹73/hr handles the majority of fine-tuning workloads:
- LoRA on 7B models: Uses ~16GB VRAM. Fits comfortably with room for batch size tuning. Typical job: 2 hours (₹146).
- LoRA on 13B models: Uses ~22GB VRAM. Fits with minimal headroom. Reduce batch size if needed. Typical job: 3 hours (₹219).
- QLoRA on 30B models: Uses ~20GB VRAM. Fits with good headroom. Typical job: 5 hours (₹365).
- QLoRA on 70B models: Uses ~35GB VRAM. Does NOT fit on 4090. Need A100 80GB minimum.
Bottom line: If you are fine-tuning 7B-13B models with LoRA, or 30B models with QLoRA, the RTX 4090 is the most cost-effective choice. It is usually the best starting point for smaller model fine-tunes, quick experiments, LoRA workflows, and budget-conscious iteration.
When You Should Move to A100
A100 80GB at ₹173/hr is needed when:
- Full fine-tuning 7B-13B models: Needs 28-52GB VRAM. The 4090's 24GB cannot fit full fine-tuning even for 7B models. The A100 handles both comfortably.
- LoRA on 70B models: Needs ~80GB VRAM. This is the A100's sweet spot, though it is a tight fit: keep the batch size small and lean on gradient accumulation.
- QLoRA on 70B models: Needs ~35GB VRAM. Fits on A100 with room for larger batch sizes and faster training. The 4090 cannot fit this.
- Large datasets (100K+ examples): The A100's higher memory bandwidth (2.0 TB/s vs 1.0 TB/s on the 4090) translates into noticeably higher training throughput, which compounds over long runs.
- Production fine-tuning pipelines: If you run fine-tuning jobs daily or weekly, the A100's ECC memory and data center reliability reduce the risk of mid-job failures.
Real example: Fine-tuning Llama 3.1 70B with LoRA on 50K examples takes about 6 hours on A100 80GB (₹1,038). The RTX 4090 cannot run this workload — it hits OOM at step 1. The H100 would do it in 3 hours (₹1,749), but that is 68% more money for half the wall-clock time.
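As a rough sketch, the throughput implied by this example (50K examples in 6 hours) can be extrapolated to other dataset sizes. Linear scaling is an assumption; real jobs vary with sequence length and batch configuration:

```python
# Implied throughput from the example above: 50K examples in 6 hours on an A100.
examples, hours = 50_000, 6
throughput = examples / (hours * 3600)  # ~2.3 examples/sec

def est_hours(n_examples: int, examples_per_sec: float) -> float:
    """Estimate training hours, assuming time scales linearly with dataset size."""
    return n_examples / examples_per_sec / 3600
```

At that rate, doubling the dataset to 100K examples roughly doubles the run to 12 hours.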
When You Should Move to H100
H100 80GB at ₹583/hr is only needed when:
- Fine-tuning 100B+ models: The H100 is the only single GPU that can handle 100B+ model fine-tuning with reasonable batch sizes. Even then, you likely need multi-GPU setups.
- Multi-GPU distributed fine-tuning: H100's NVLink (900 GB/s) and transformer engine make it the best choice for distributed training across 4-8 GPUs.
- Time-critical fine-tuning: If you need a 70B fine-tune back in 3 hours instead of 6 on the A100, the H100's 2x speedup comes at a 3.4x hourly cost increase. That trade is worth it only when time, not money, is the bottleneck.
Honest assessment: 95% of fine-tuning jobs do not need an H100. The H100 is not your default fine-tuning GPU. It is what you rent after the smaller and cheaper options stop making sense.
Fine-Tuning Cost Calculator
Here is what different fine-tuning jobs cost on each GPU. These are real measurements from production runs.
| Fine-tuning situation | GPU | Time | Total cost |
|---|---|---|---|
| 7B LoRA (10K examples) | RTX 4090 | 2 hours | ₹146 |
| 7B LoRA (10K examples) | A100 80GB | 1.5 hours | ₹260 |
| 13B LoRA (10K examples) | RTX 4090 | 3 hours | ₹219 |
| 13B LoRA (10K examples) | A100 80GB | 2 hours | ₹346 |
| 30B QLoRA (10K examples) | RTX 4090 | 5 hours | ₹365 |
| 30B QLoRA (10K examples) | A100 80GB | 3 hours | ₹519 |
| 70B LoRA (50K examples) | RTX 4090 | Cannot run | N/A |
| 70B LoRA (50K examples) | A100 80GB | 6 hours | ₹1,038 |
| 70B LoRA (50K examples) | H100 80GB | 3 hours | ₹1,749 |
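The rows above reduce to simple arithmetic: hours times the hourly rate. A minimal sketch using this guide's rates; check live prices before renting:

```python
# Hourly rental rates (₹) used throughout this guide.
RATES = {"RTX 4090": 73, "A100 80GB": 173, "H100 80GB": 583}

def job_cost(gpu: str, hours: float) -> int:
    """Total rental cost in ₹ for a fine-tuning job of the given duration."""
    return round(RATES[gpu] * hours)
```

For example, the 70B LoRA row works out to `job_cost("A100 80GB", 6)` = ₹1,038.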
The Mistake That Burns Budget
A lot of people optimize for the GPU they have heard of, not the GPU the workload actually needs. Fine-tuning is especially easy to overspend on because LoRA and QLoRA let smaller GPUs do far more than people expect.
The most common pattern we see: a team rents an H100 for a 7B LoRA fine-tune that would have run perfectly on an RTX 4090. They pay ₹583/hr instead of ₹73/hr — 8x more — for the same result. The H100 finishes in 45 minutes instead of 2 hours, but the total cost is ₹437 vs ₹146. The 4090 is 67% cheaper for the same fine-tuned model.
Fine-Tuning Checklist Before You Rent
- Calculate VRAM needed: Model weights + optimizer states + gradients + activations + batch size + context length. Use the table above as a starting point.
- Choose fine-tuning method: LoRA for 7B-30B, QLoRA for 30B-70B, full fine-tuning only if you have the VRAM for it.
- Pick the smallest GPU that fits: If your VRAM need is under 24GB, start with RTX 4090. If it is 24-80GB, start with A100. If it is over 80GB, you need multi-GPU.
- Test with a small dataset first: Run 100 examples before committing to 10K. Verify your VRAM estimate is correct.
- Measure utilization: If GPU utilization is below 50%, you are overprovisioned. Downsize next time.
- Set auto-terminate: Add `sudo shutdown -h now` at the end of your training script. Prevents idle billing.
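The "pick the smallest GPU that fits" step can be sketched as a helper. GPU names, VRAM sizes, and rates are the ones used in this guide:

```python
# GPUs ordered cheapest first: (name, VRAM in GB, ₹/hr) per this guide's pricing.
GPUS = [("RTX 4090", 24, 73), ("A100 80GB", 80, 173), ("H100 80GB", 80, 583)]

def smallest_gpu(vram_gb_needed: float) -> str:
    """Return the cheapest single GPU that can hold the job."""
    for name, vram, _rate in GPUS:
        if vram_gb_needed <= vram:
            return name
    return "multi-GPU"  # over 80GB: no single GPU in this lineup fits
```

Note the H100 is never selected on VRAM alone, which matches the guide's advice: you rent it for speed or multi-GPU scaling, not capacity.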
The Simple Rule
Start with the cheapest GPU that can hold the job. Upgrade only when the workload forces you to, not because the bigger card feels safer. For fine-tuning, that means:
- 7B-13B models: RTX 4090 (₹73/hr) — always start here
- 30B models: RTX 4090 with QLoRA, or A100 with LoRA
- 70B models: A100 80GB (₹173/hr) — this is the sweet spot
- 100B+ models: H100 or multi-GPU A100 setup
Pick the right GPU before you pay for it
Compare live 4090, A100, and H100 options based on what your fine-tune actually needs. Start small, measure, and scale up only when forced to.
Browse GPUs