Hi HN,
I built *LLM-Use*, an open-source intelligent router that helps reduce LLM API costs by automatically selecting the most appropriate model for each prompt.
I created it after realizing I was using GPT-4 for everything, including trivial prompts like "translate hello to Spanish" at $0.03 per call, when a model like Mixtral can handle the same request for $0.0003.
### How it works:
– Uses NLP (spaCy + transformers) to analyze prompt complexity
– Routes to the optimal model (GPT-4, Claude, LLaMA, Mixtral, etc.)
– Uses semantic similarity scoring to preserve output quality
– Falls back gracefully if a model fails or gives poor results
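To make the routing idea concrete, here is a toy sketch (not the actual llm-use code; the real scorer uses spaCy + transformers, and all names and thresholds below are hypothetical). It approximates complexity with just lexical diversity and prompt length, then picks the cheapest model tier that matches:

```python
# Toy illustration of complexity-based routing (hypothetical; the real
# llm-use scorer uses spaCy + transformers, not this heuristic).

def complexity_score(prompt: str) -> float:
    """Score in [0, 1] from lexical diversity and normalized length."""
    tokens = prompt.lower().split()
    if not tokens:
        return 0.0
    diversity = len(set(tokens)) / len(tokens)  # unique tokens / total tokens
    length = min(len(tokens) / 200, 1.0)        # cap length contribution at 1
    return 0.3 * diversity + 0.7 * length

def route(prompt: str) -> str:
    """Map complexity to a model tier (example thresholds, not llm-use's)."""
    score = complexity_score(prompt)
    if score < 0.4:
        return "mixtral"   # cheap tier for trivial prompts
    elif score < 0.7:
        return "claude"    # mid tier
    return "gpt-4"         # expensive tier, reserved for hard prompts

print(route("translate hello to Spanish"))  # → mixtral
```

In the real router, a graceful-fallback layer would wrap the chosen model's call and retry on the next tier up if the response fails the quality check.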
### Key features:
– Real-time streaming support for all providers
– A/B testing with statistical significance
– Response caching (LRU + TTL)
– Circuit breakers for production stability
– FastAPI backend with Prometheus metrics
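As a sketch of the caching feature, here is a minimal LRU + TTL cache (hypothetical implementation; the class name and defaults are mine, and llm-use's actual cache may differ). Entries expire after `ttl` seconds, and the least recently used entry is evicted when capacity is exceeded:

```python
# Minimal LRU + TTL response cache sketch (illustrative, not llm-use's code).
import time
from collections import OrderedDict

class LRUTTLCache:
    def __init__(self, maxsize: int = 128, ttl: float = 300.0):
        self.maxsize = maxsize
        self.ttl = ttl
        self._store = OrderedDict()  # prompt -> (timestamp, response)

    def get(self, prompt: str):
        entry = self._store.get(prompt)
        if entry is None:
            return None
        ts, response = entry
        if time.monotonic() - ts > self.ttl:  # expired entry
            del self._store[prompt]
            return None
        self._store.move_to_end(prompt)       # mark as recently used
        return response

    def put(self, prompt: str, response: str):
        self._store[prompt] = (time.monotonic(), response)
        self._store.move_to_end(prompt)
        if len(self._store) > self.maxsize:
            self._store.popitem(last=False)   # evict least recently used
```

Keying the cache on the raw prompt keeps this sketch simple; a production cache would likely also key on the model and generation parameters.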
### Early results:
– Personal tests show up to 80% cost reduction
– Output quality preserved (verified via internal A/B testing)
### Technical notes:
– 2000+ lines of Python
– Supports OpenAI, Anthropic, Google, Groq, Ollama
– Complexity scoring: lexical diversity, prompt length, semantic analysis
– Quality checks: relevance, coherence, grammar
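To illustrate the quality-check step, here is a toy semantic-similarity gate (hypothetical; the real project uses semantic similarity scoring, while this sketch substitutes simple bag-of-words cosine similarity, and the function names and threshold are mine). A cheap model's answer is compared against a reference answer and rejected if it drifts too far:

```python
# Toy quality gate via bag-of-words cosine similarity (illustrative only;
# real semantic scoring would use embeddings, not token counts).
import math
from collections import Counter

def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between two texts' token-count vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0

def passes_quality_gate(candidate: str, reference: str,
                        threshold: float = 0.6) -> bool:
    """Accept the candidate answer only if it stays close to the reference."""
    return cosine_similarity(candidate, reference) >= threshold
```

In a fallback loop, a failing gate would trigger a retry on a stronger model, which is how quality can be preserved while still routing most traffic to cheap tiers.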
Repo: [https://github.com/JustVugg/llm-use](https://github.com/JustVugg/llm-use)
Thanks! Happy to answer questions.