Hey HN,
I’ve been playing around with ways to make retrieval pipelines faster, and ended up building something I’m calling PyNIFE (Nearly Inference-Free Embeddings).
The idea is simple: distill a static embedding model whose vectors land in the same space as a larger "teacher" model, so you can skip the expensive transformer forward pass almost entirely. In practice, that means 400-900× faster embedding generation on CPU, while the outputs stay interchangeable with the teacher's: same vector index, same existing setup.
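To make "nearly inference-free" concrete: a static model is basically a precomputed token-to-vector table, so embedding a text is a few lookups and an average rather than a transformer forward pass. Here's a minimal sketch of that idea (the toy vocabulary, random table, and whitespace tokenizer are stand-ins, not PyNIFE's actual internals):

```python
import numpy as np

# Hypothetical precomputed table: one vector per vocabulary token,
# trained so the averaged vectors land in the teacher's space.
vocab = {"fast": 0, "retrieval": 1, "pipeline": 2}
dim = 768
rng = np.random.default_rng(0)
token_vectors = rng.standard_normal((len(vocab), dim), dtype=np.float32)

def embed_static(text: str) -> np.ndarray:
    # Whitespace tokenization stands in for the real tokenizer.
    ids = [vocab[tok] for tok in text.lower().split() if tok in vocab]
    # Embedding is a few table lookups plus a mean: no matrix
    # multiplications, no attention, hence the large CPU speedup.
    return token_vectors[ids].mean(axis=0)

print(embed_static("fast retrieval pipeline").shape)  # (768,)
```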
You can even mix and match: use the original model for accuracy when you need it, and PyNIFE for ultra-fast lookups or agent loops.
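If the two models really share one space, mixing them could look roughly like this: embed and index your documents once with the teacher, then serve queries with the cheap encoder against that same index. This sketch uses sentence-transformers and a placeholder query encoder, not PyNIFE's actual API:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Teacher model: accurate but slow. Used once, offline, to build the index.
teacher = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
docs = [
    "PyNIFE distills a static student from a transformer teacher.",
    "Static embeddings skip the transformer forward pass entirely.",
]
index = teacher.encode(docs, normalize_embeddings=True)  # (n_docs, dim)

def embed_query_fast(text: str) -> np.ndarray:
    # Placeholder: in a real pipeline this would be the aligned static
    # model, whose outputs land in the teacher's vector space. Here we
    # reuse the teacher so the example runs end to end.
    vec = teacher.encode([text])[0]
    return vec / np.linalg.norm(vec)

# Because the spaces are aligned, the same index serves both encoders:
query = embed_query_fast("fast student model distilled from a teacher")
scores = index @ query  # cosine similarity on unit-normalized vectors
print(docs[int(np.argmax(scores))])
```

The point of the pattern is that the fast encoder only has to match the teacher's space well enough for retrieval; you can always re-rank the top hits with the full model when accuracy matters.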
It’s still early, and I’d love feedback, especially on where this might break, what kinds of workloads you’d test it on, and any ideas for better evaluation or visualization.
Repo: https://github.com/stephantul/pynife
