# VLM Server

A multimodal visual analysis queue — submit an image + prompt, get a response from a cloud vision model. Switch between **Ollama Cloud** and **OpenRouter** by changing two environment variables, no code changes needed.

## Architecture

```
Browser
   │
   ▼
Fastify server (port 3000)
   │  openai npm package (OpenAI-compatible client)
   ▼
LLM_BASE_URL (configured in .env)
   ├─ https://ollama.com/v1        → Ollama Cloud (qwen3.5:397b-cloud, etc.)
   └─ https://openrouter.ai/api/v1 → OpenRouter (300+ providers)
```

Both Ollama Cloud and OpenRouter expose an OpenAI-compatible `/v1/chat/completions` endpoint, so the same `openai` npm package talks to both. No proxy or sidecar required.

## Stack

| Layer | Technology |
|---|---|
| Backend | Node.js · Fastify 5 · `@fastify/websocket` · `@fastify/multipart` |
| LLM client | `openai` npm package (pointed at Ollama Cloud or OpenRouter) |
| Queue | `p-queue` (in-process, no external server) |
| Database | SQLite via Sequelize ORM |
| Frontend | React 18 · Vite · plain CSS |

## Quick start

**1. Install dependencies**

```bash
npm install
```

**2. Configure your provider**

```bash
cp server/.env.example server/.env
# Edit server/.env — choose Ollama Cloud or OpenRouter (see comments inside)
```

**3. Run in development**

```bash
npm run dev
```

- Frontend → http://localhost:5173
- Backend → http://localhost:3000

**4. Production build**

```bash
npm run build   # builds React into client/dist
npm start       # serves everything from Fastify on port 3000
```

## Provider configuration

Edit `server/.env` and uncomment the block for the provider you want:

### Ollama Cloud

Get an API key at https://ollama.com → account → API keys. Model IDs listed at https://ollama.com/search?c=cloud

```bash
LLM_BASE_URL=https://ollama.com/v1
LLM_API_KEY=your-ollama-api-key
LLM_MODEL=qwen3.5:397b-cloud
```

### OpenRouter

Get an API key at https://openrouter.ai/keys. Model IDs listed at https://openrouter.ai/models (format: `provider/model`)

```bash
LLM_BASE_URL=https://openrouter.ai/api/v1
LLM_API_KEY=sk-or-v1-...
LLM_MODEL=qwen/qwen3.5-397b-a17b
```

## Project structure

```
vision-jobs/
├── server/
│   ├── index.js          # Fastify entry point
│   ├── routes/jobs.js    # REST + WebSocket routes
│   ├── jobs/queue.js     # p-queue → openai client → provider
│   ├── db/models.js      # Sequelize Job model (SQLite)
│   ├── ws/broadcast.js   # WebSocket fan-out
│   └── .env.example
└── client/
    ├── index.html
    ├── vite.config.js
    └── src/
        ├── App.jsx
        ├── styles.css
        ├── components/
        │   ├── ImageDrop.jsx   # Drag-drop / file picker / camera
        │   └── JobCard.jsx     # Live status + result display
        ├── hooks/
        │   └── useJobSocket.js
        └── lib/
            └── api.js
```

## How it works

1. User drops an image + types a prompt → clicks **Analyze**.
2. `POST /api/jobs` receives the multipart upload, base64-encodes the image, saves a `queued` job to SQLite, and enqueues it via `p-queue`.
3. The queue runner calls the OpenAI-compatible `/v1/chat/completions` endpoint with the image embedded as a `data:` URI in an `image_url` content block.
4. As status changes (`queued → running → done/error`), the server broadcasts `job_update` WebSocket messages to every connected client.
5. React merges updates into the live job list — no polling.

Illustrative sketches of the client setup and each step follow.
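Because both providers speak the same protocol, the client setup is provider-agnostic. A minimal sketch, assuming only the `LLM_*` variables from `.env` (the real wiring lives in `server/jobs/queue.js`):

```js
// One OpenAI-compatible client for either provider.
// Switching providers means editing .env, not this code.
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: process.env.LLM_BASE_URL, // https://ollama.com/v1 or https://openrouter.ai/api/v1
  apiKey: process.env.LLM_API_KEY,
});
```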
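Step 1, seen from the browser, is a plain multipart `fetch`. A sketch of what `client/src/lib/api.js` might look like; the `image` and `prompt` field names are assumptions, not the actual contract:

```js
// Hypothetical client helper — field names are illustrative.
export async function submitJob(file, prompt) {
  const form = new FormData();
  form.append("image", file);    // assumed field name
  form.append("prompt", prompt); // assumed field name
  const res = await fetch("/api/jobs", { method: "POST", body: form });
  return res.json(); // the created job, initially status "queued"
}
```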
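Step 2 caps provider load with `p-queue`. A sketch, assuming the queue runner is a function called `runJob` (a hypothetical name):

```js
import PQueue from "p-queue";

// At most JOB_CONCURRENCY requests run against the provider at once.
const queue = new PQueue({
  concurrency: Number(process.env.JOB_CONCURRENCY ?? 3),
});

// After the upload is saved as a `queued` row in SQLite:
queue.add(() => runJob(job.id)); // runJob is hypothetical
```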
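Step 3 uses the standard multimodal message shape of the `openai` package. A sketch, assuming `mimeType` and `base64Image` come from the uploaded file:

```js
// The image travels inline as a data: URI, so no separate file hosting is needed.
const completion = await client.chat.completions.create({
  model: process.env.LLM_MODEL,
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: prompt },
        {
          type: "image_url",
          image_url: { url: `data:${mimeType};base64,${base64Image}` },
        },
      ],
    },
  ],
});

const result = completion.choices[0].message.content;
```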
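Step 4 fans updates out to every open socket. One plausible shape for `server/ws/broadcast.js` (a sketch; the function names are assumptions):

```js
// Track live sockets; push every status change to all of them.
const clients = new Set();

export function register(socket) {
  clients.add(socket);
  socket.on("close", () => clients.delete(socket));
}

export function broadcast(job) {
  const message = JSON.stringify({ type: "job_update", job });
  for (const socket of clients) {
    socket.send(message);
  }
}
```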
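Step 5 is an upsert by job id on the React side. A sketch of the merge inside `useJobSocket.js`, assuming a `setJobs` state setter:

```js
// On each job_update, replace the matching job in place,
// or prepend it if the list has not seen this id yet.
socket.onmessage = (event) => {
  const { type, job } = JSON.parse(event.data);
  if (type !== "job_update") return;
  setJobs((jobs) =>
    jobs.some((j) => j.id === job.id)
      ? jobs.map((j) => (j.id === job.id ? job : j))
      : [job, ...jobs]
  );
};
```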
## Environment variables

| Variable | Default | Description |
|---|---|---|
| `LLM_BASE_URL` | `https://ollama.com/v1` | Provider endpoint. |
| `LLM_API_KEY` | — | **Required.** API key for your chosen provider. |
| `LLM_MODEL` | `qwen3.5:397b-cloud` | Model identifier (format varies by provider). |
| `PORT` | `3000` | Server HTTP port. |
| `JOB_CONCURRENCY` | `3` | Max simultaneous LLM requests. |