# VLM Server
A multimodal visual analysis queue: submit an image + prompt and get a response from a cloud vision model. Switch between Ollama Cloud and OpenRouter by editing a few environment variables; no code changes are needed.
## Architecture

```
Browser
  │
  ▼
Fastify server (port 3000)
  │  openai npm package (OpenAI-compatible client)
  ▼
LLM_BASE_URL (configured in .env)
  ├─ https://ollama.com/v1        → Ollama Cloud (qwen3.5:397b-cloud, etc.)
  └─ https://openrouter.ai/api/v1 → OpenRouter (300+ providers)
```

Both Ollama Cloud and OpenRouter expose an OpenAI-compatible `/v1/chat/completions` endpoint, so the same `openai` npm package talks to both. No proxy or sidecar is required.
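In practice the client is constructed once from the environment and never needs to know which provider sits behind it. A minimal sketch, assuming the variables from `server/.env`:

```js
// Provider-agnostic client: point the OpenAI SDK at whichever base URL .env names.
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: process.env.LLM_BASE_URL, // https://ollama.com/v1 or https://openrouter.ai/api/v1
  apiKey: process.env.LLM_API_KEY,
});

// Quick text-only smoke test that works against either provider.
const res = await client.chat.completions.create({
  model: process.env.LLM_MODEL,
  messages: [{ role: 'user', content: 'Say hello.' }],
});
console.log(res.choices[0].message.content);
```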
## Stack
| Layer | Technology |
|---|---|
| Backend | Node.js · Fastify 5 · @fastify/websocket · @fastify/multipart |
| LLM client | openai npm package (pointed at Ollama Cloud or OpenRouter) |
| Queue | p-queue (in-process, no external server) |
| Database | SQLite via Sequelize ORM |
| Frontend | React 18 · Vite · plain CSS |
## Quick start

1. Install dependencies

   ```bash
   npm install
   ```

2. Configure your provider

   ```bash
   cp server/.env.example server/.env
   # Edit server/.env — choose Ollama Cloud or OpenRouter (see comments inside)
   ```

3. Run in development

   ```bash
   npm run dev
   ```

   - Frontend → http://localhost:5173
   - Backend → http://localhost:3000

4. Production build

   ```bash
   npm run build   # builds React into client/dist
   npm start       # serves everything from Fastify on port 3000
   ```
## Provider configuration

Edit `server/.env` and uncomment the block for the provider you want.

### Ollama Cloud

Get an API key at https://ollama.com (account → API keys). Model IDs are listed at https://ollama.com/search?c=cloud.

```env
LLM_BASE_URL=https://ollama.com/v1
LLM_API_KEY=your-ollama-api-key
LLM_MODEL=qwen3.5:397b-cloud
```

### OpenRouter

Get an API key at https://openrouter.ai/keys. Model IDs are listed at https://openrouter.ai/models (format: `provider/model`).

```env
LLM_BASE_URL=https://openrouter.ai/api/v1
LLM_API_KEY=sk-or-v1-...
LLM_MODEL=qwen/qwen3.5-397b-a17b
```
## Project structure

```
vision-jobs/
├── server/
│   ├── index.js            # Fastify entry point
│   ├── routes/jobs.js      # REST + WebSocket routes
│   ├── jobs/queue.js       # p-queue → openai client → provider
│   ├── db/models.js        # Sequelize Job model (SQLite)
│   ├── ws/broadcast.js     # WebSocket fan-out
│   └── .env.example
└── client/
    ├── index.html
    ├── vite.config.js
    └── src/
        ├── App.jsx
        ├── styles.css
        ├── components/
        │   ├── ImageDrop.jsx    # Drag-drop / file picker / camera
        │   └── JobCard.jsx      # Live status + result display
        ├── hooks/
        │   └── useJobSocket.js
        └── lib/
            └── api.js
```
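The `Job` model in `server/db/models.js` backs the status flow described in the next section. A minimal Sequelize sketch of what it could look like; the field names and storage path here are assumptions, not the repo's actual schema:

```js
// server/db/models.js (illustrative sketch, not the actual file)
import { Sequelize, DataTypes } from 'sequelize';

// SQLite file on disk; the real storage path may differ.
export const sequelize = new Sequelize({ dialect: 'sqlite', storage: 'server/db/jobs.sqlite' });

// Field names below are guesses based on the job lifecycle described in the README.
export const Job = sequelize.define('Job', {
  prompt:      { type: DataTypes.TEXT,   allowNull: false },       // user's question about the image
  imageBase64: { type: DataTypes.TEXT,   allowNull: false },       // uploaded image, base64-encoded
  mimeType:    { type: DataTypes.STRING, allowNull: false },       // e.g. image/png
  status:      { type: DataTypes.STRING, defaultValue: 'queued' }, // queued | running | done | error
  result:      { type: DataTypes.TEXT },                           // model response (or error message)
});
```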
## How it works

- The user drops an image + types a prompt → clicks **Analyze**.
- `POST /api/jobs` receives the multipart upload, base64-encodes the image, saves a `queued` job to SQLite, and enqueues it via `p-queue`.
- The queue runner calls the OpenAI-compatible `/v1/chat/completions` endpoint with the image embedded as a `data:` URI in an `image_url` content block (see the sketch below).
- As the status changes (`queued → running → done`/`error`), the server broadcasts `job_update` WebSocket messages to every connected client.
- React merges updates into the live job list; no polling.
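A condensed sketch of the queue-runner step (`server/jobs/queue.js`), assuming the `Job` fields from the model sketch above and a `broadcast` helper from `server/ws/broadcast.js`; the actual file will differ in its details:

```js
// server/jobs/queue.js (illustrative sketch)
import PQueue from 'p-queue';
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: process.env.LLM_BASE_URL, // Ollama Cloud or OpenRouter
  apiKey: process.env.LLM_API_KEY,
});

const queue = new PQueue({ concurrency: Number(process.env.JOB_CONCURRENCY ?? 3) });

// Enqueue one job: `job` is a Sequelize Job row, `broadcast` fans a message out over WebSocket.
export function enqueue(job, broadcast) {
  return queue.add(async () => {
    await job.update({ status: 'running' });
    broadcast({ type: 'job_update', job });

    try {
      // The image travels inline as a data: URI inside an image_url content block.
      const completion = await client.chat.completions.create({
        model: process.env.LLM_MODEL,
        messages: [{
          role: 'user',
          content: [
            { type: 'text', text: job.prompt },
            { type: 'image_url', image_url: { url: `data:${job.mimeType};base64,${job.imageBase64}` } },
          ],
        }],
      });
      await job.update({ status: 'done', result: completion.choices[0].message.content });
    } catch (err) {
      await job.update({ status: 'error', result: err.message });
    }
    broadcast({ type: 'job_update', job });
  });
}
```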
## Environment variables

| Variable | Default | Description |
|---|---|---|
| `LLM_BASE_URL` | `https://ollama.com/v1` | Provider endpoint. |
| `LLM_API_KEY` | — | Required. API key for your chosen provider. |
| `LLM_MODEL` | `qwen3.5:397b-cloud` | Model identifier (format varies by provider). |
| `PORT` | `3000` | Server HTTP port. |
| `JOB_CONCURRENCY` | `3` | Max simultaneous LLM requests. |