Run 35B LLMs on Dual Pascal GPUs with QLoRA

Hi HN,

  I built a system to run 35B parameter language models on older Pascal GPUs (P100 +
  GTX 1080 Ti) using multi-GPU memory spillover.

  Problem: Most LLM inference tools (Ollama, LM Studio) are limited to a single GPU's
  VRAM (roughly 13B models max on a 16GB card). If you have multiple older GPUs, the
  second one sits idle.

  Solution: Multi-GPU + CPU memory spillover with QLoRA 4-bit quantization. The system
  automatically distributes layers across GPU0 → GPU1 → CPU RAM, enabling 35B models on
  hardware that normally tops out around 13B.
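
  Under the hood this is standard HuggingFace tooling: bitsandbytes handles the 4-bit
  NF4 quantization and Accelerate builds the layer-to-device map. A minimal sketch of
  the idea (not the repo's exact code; the model name and per-device memory caps below
  are illustrative):

    # Sketch: 4-bit NF4 load with spillover across GPU0 -> GPU1 -> CPU RAM.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    model_name = "Qwen/Qwen2.5-14B-Instruct"

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",              # NF4 data type, as in QLoRA
        bnb_4bit_compute_dtype=torch.float16,
        bnb_4bit_use_double_quant=True,         # quantize the quantization constants too
    )

    # Cap each device; whatever doesn't fit spills to the next one.
    # Values are rough guesses for a P100 (16GB) + GTX 1080 Ti (11GB) box.
    max_memory = {0: "15GiB", 1: "10GiB", "cpu": "48GiB"}

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        quantization_config=bnb_config,
        device_map="auto",                      # Accelerate assigns layers to devices
        max_memory=max_memory,
    )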

  Benchmarks (P100 16GB + GTX 1080 Ti 11GB):
  - Qwen-14B: 13.7 tokens/sec (9.4GB VRAM)
  - OPT-30B: 5.4 tokens/sec (15.2GB VRAM)
  - CodeLlama-34B: 0.8 tokens/sec (16.7GB VRAM)
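
  As a rough sanity check on these numbers: 4-bit weights take about 0.5 bytes per
  parameter, so a 34B model needs roughly 17 GB for weights alone, more than fits on
  either card by itself; that gap is what the spillover covers.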

  Quick start:
    docker pull rickeshtn/large-model-international_release:latest
    docker run -it --rm --runtime=nvidia --gpus all --ipc=host \
      --ulimit memlock=-1 --ulimit stack=268435456 \
      -v $(pwd):/workspace -e HF_HOME=/workspace/model_cache \
      rickeshtn/large-model-international_release:latest \
      python /app/interactive_chat.py --model-name Qwen/Qwen2.5-14B-Instruct

  Technical details:
  - 4-bit NF4 quantization (as used in QLoRA; ~75% memory reduction vs FP16)
  - HuggingFace Transformers + Accelerate + bitsandbytes
  - Automatic device mapping with CPU offload
  - Interactive chat with conversation persistence (see the sketch below)
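
  A rough illustration of that last bullet, conversation persistence (this is not the
  repo's interactive_chat.py, just a sketch of the idea; the history-file path is made
  up):

    # Illustrative chat loop that persists the running conversation to JSON.
    import json
    from pathlib import Path

    HISTORY_FILE = Path("conversation.json")    # hypothetical location

    def chat(model, tokenizer, max_new_tokens=256):
        history = json.loads(HISTORY_FILE.read_text()) if HISTORY_FILE.exists() else []
        while True:
            user = input("you> ")
            if user.strip().lower() in {"exit", "quit"}:
                break
            history.append({"role": "user", "content": user})
            # Apply the model's chat template to the full history and generate a reply.
            inputs = tokenizer.apply_chat_template(
                history, add_generation_prompt=True, return_tensors="pt"
            ).to(model.device)
            output = model.generate(inputs, max_new_tokens=max_new_tokens)
            reply = tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)
            print(reply)
            history.append({"role": "assistant", "content": reply})
            HISTORY_FILE.write_text(json.dumps(history, indent=2))   # persist every turn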

  GitHub: https://github.com/rickeshtn/locallm-pascal
  Docker Hub: https://hub.docker.com/r/rickeshtn/large-model-international_release

  34 users are already running it. Happy to answer technical questions!


Comments URL: https://news.ycombinator.com/item?id=45498552
