I’ve been interested in real-time human-AI interaction for a while. This project is a prototype of a closed-loop drawing system, a kind of “visual autocomplete” for drawings: the user just draws along with the AI, without breaking flow to type text prompts.
It works by having the AI continually observe and respond to live drawing on a canvas: a vision model (running via Ollama) interprets what it sees, and that description drives real-time image generation (StreamDiffusion).
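As a rough sketch of the observe-and-describe step, assuming the official `ollama` Python package and a vision model such as llava (the project itself talks to Ollama from C++):

```python
import ollama  # pip install ollama; assumes a local Ollama server is running


def describe_canvas(image_path: str) -> str:
    """Ask a vision model for a short description of the current canvas frame."""
    response = ollama.chat(
        model="llava",  # hypothetical model choice; any Ollama vision model works
        messages=[{
            "role": "user",
            "content": "Briefly describe what is being drawn on this canvas.",
            "images": [image_path],  # a path (or raw bytes) of the captured frame
        }],
    )
    return response["message"]["content"]


# The returned description then becomes the prompt driving
# StreamDiffusion's real-time image generation.
```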
For real-time performance, the project is built in C++ and Python and uses Spout to share textures directly on the GPU between processes, with minimal overhead.
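As a minimal illustration of publishing a frame from the Python side, assuming the third-party SpoutGL bindings on Windows (an assumption; not necessarily the bindings this repo uses):

```python
import SpoutGL          # pip install SpoutGL (third-party Spout bindings)
from OpenGL import GL   # Spout uses OpenGL pixel format constants

WIDTH, HEIGHT = 512, 512

# A solid magenta RGBA frame as a stand-in for real canvas pixels.
pixels = bytes([255, 0, 255, 255]) * (WIDTH * HEIGHT)

with SpoutGL.SpoutSender() as sender:
    sender.setSenderName("CanvasShare")  # hypothetical sender name
    # Publish the frame; any Spout-compatible app can now receive it.
    sender.sendImage(pixels, WIDTH, HEIGHT, GL.GL_RGBA, False, 0)
```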
Reusable components include:
– StreamDiffusionSpoutServer: a lightweight Python server for real-time image generation with StreamDiffusion. It interfaces with any Spout-compatible software and takes instructions over OSC (see the OSC sketch after this list).
– OllamaClient: a minimal C++ library for interfacing with Ollama vision-language models. Includes implementations for openFrameworks and Cinder.
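For the OSC instructions mentioned above, a minimal client sketch using python-osc might look like this; the port and OSC address here are illustrative assumptions, not the server’s documented interface:

```python
from pythonosc.udp_client import SimpleUDPClient  # pip install python-osc

# Host, port, and the "/prompt" address are assumptions for illustration;
# check StreamDiffusionSpoutServer's docs for the actual OSC schema.
client = SimpleUDPClient("127.0.0.1", 9000)

# Push the vision model's latest description as the generation prompt.
client.send_message("/prompt", "loose ink sketch of a sailboat at sunset")
```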
The “visual autocomplete” concept has been explored in recent papers (e.g., arxiv.org/abs/2508.19254, arxiv.org/abs/2411.17673).
Hopefully these open-source components help others experiment with and advance this direction!
Comments URL: https://news.ycombinator.com/item?id=45645528