VLM Server

A multimodal visual analysis queue — submit an image + prompt, get a response from a cloud vision model. Switch between Ollama Cloud and OpenRouter by changing two environment variables, no code changes needed.

Architecture

Browser
  │
  ▼
Fastify server (port 3000)
  │  openai npm package (OpenAI-compatible client)
  ▼
LLM_BASE_URL (configured in .env)
  ├─ https://ollama.com/v1          → Ollama Cloud (qwen3.5:397b-cloud, etc.)
  └─ https://openrouter.ai/api/v1  → OpenRouter   (300+ providers)

Both Ollama Cloud and OpenRouter expose an OpenAI-compatible /v1/chat/completions endpoint, so the same openai npm package talks to both. No proxy or sidecar required.
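
The provider switch works because the client is constructed from the values in server/.env. A minimal sketch (the real wiring lives in server/jobs/queue.js and may differ):

import OpenAI from "openai";

// One client works for either provider: only the base URL, API key, and model change.
const client = new OpenAI({
  baseURL: process.env.LLM_BASE_URL, // https://ollama.com/v1 or https://openrouter.ai/api/v1
  apiKey: process.env.LLM_API_KEY,
});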

Stack

Layer       Technology
Backend     Node.js · Fastify 5 · @fastify/websocket · @fastify/multipart
LLM client  openai npm package (pointed at Ollama Cloud or OpenRouter)
Queue       p-queue (in-process, no external server)
Database    SQLite via Sequelize ORM
Frontend    React 18 · Vite · plain CSS

Quick start

1. Install dependencies

npm install

2. Configure your provider

cp server/.env.example server/.env
# Edit server/.env — choose Ollama Cloud or OpenRouter (see comments inside)

3. Run in development

npm run dev

4. Production build

npm run build   # builds React into client/dist
npm start       # serves everything from Fastify on port 3000
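
What those commands map to depends on the repository's package.json files, which are not shown here. As a rough, assumption-laden sketch of a root package.json for this layout (the concurrently dependency and --prefix wiring are guesses; only serving from server/index.js is implied by the project structure below):

{
  "scripts": {
    "dev": "concurrently \"npm run dev --prefix server\" \"npm run dev --prefix client\"",
    "build": "npm run build --prefix client",
    "start": "node server/index.js"
  }
}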

Provider configuration

Edit server/.env and uncomment the block for the provider you want:

Ollama Cloud

Get an API key at https://ollama.com → account → API keys. Model IDs listed at https://ollama.com/search?c=cloud

LLM_BASE_URL=https://ollama.com/v1
LLM_API_KEY=your-ollama-api-key
LLM_MODEL=qwen3.5:397b-cloud

OpenRouter

Get an API key at https://openrouter.ai/keys. Model IDs listed at https://openrouter.ai/models (format: provider/model)

LLM_BASE_URL=https://openrouter.ai/api/v1
LLM_API_KEY=sk-or-v1-...
LLM_MODEL=qwen/qwen3.5-397b-a17b
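
Because both providers speak the same OpenAI-compatible API, either block can be smoke-tested from a shell before starting the server (assuming the three values are exported in your environment):

curl "$LLM_BASE_URL/chat/completions" \
  -H "Authorization: Bearer $LLM_API_KEY" \
  -H "Content-Type: application/json" \
  -d "{\"model\": \"$LLM_MODEL\", \"messages\": [{\"role\": \"user\", \"content\": \"Say hello\"}]}"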

Project structure

vision-jobs/
├── server/
│   ├── index.js              # Fastify entry point
│   ├── routes/jobs.js        # REST + WebSocket routes
│   ├── jobs/queue.js         # p-queue → openai client → provider
│   ├── db/models.js          # Sequelize Job model (SQLite)
│   ├── ws/broadcast.js       # WebSocket fan-out
│   └── .env.example
└── client/
    ├── index.html
    ├── vite.config.js
    └── src/
        ├── App.jsx
        ├── styles.css
        ├── components/
        │   ├── ImageDrop.jsx  # Drag-drop / file picker / camera
        │   └── JobCard.jsx    # Live status + result display
        ├── hooks/
        │   └── useJobSocket.js
        └── lib/
            └── api.js

How it works

  1. User drops an image + types a prompt → clicks Analyze.
  2. POST /api/jobs receives the multipart upload, base64-encodes the image, saves a queued job to SQLite, and enqueues it via p-queue.
  3. The queue runner calls the OpenAI-compatible /v1/chat/completions endpoint with the image embedded as a data: URI in an image_url content block (see the sketch after this list).
  4. As status changes (queued → running → done/error), the server broadcasts job_update WebSocket messages to every connected client.
  5. React merges updates into the live job list — no polling.
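
A condensed sketch of step 3, the queue runner's request. Field names such as job.prompt and job.imageBase64 are illustrative, not taken from the source; client is the OpenAI instance sketched in the Architecture section:

// Runs inside a p-queue task for a single job (sketch).
async function runJob(job) {
  // Step 3: embed the uploaded image as a data: URI inside an image_url content block.
  const dataUri = `data:${job.mimeType};base64,${job.imageBase64}`;

  const completion = await client.chat.completions.create({
    model: process.env.LLM_MODEL,
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: job.prompt },
          { type: "image_url", image_url: { url: dataUri } },
        ],
      },
    ],
  });

  // The result is saved to the job row; each status change triggers a job_update broadcast (step 4).
  return completion.choices[0].message.content;
}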

Environment variables

Variable         Default                Description
LLM_BASE_URL     https://ollama.com/v1  Provider endpoint.
LLM_API_KEY      (none)                 Required. API key for your chosen provider.
LLM_MODEL        qwen3.5:397b-cloud     Model identifier (format varies by provider).
PORT             3000                   Server HTTP port.
JOB_CONCURRENCY  3                      Max simultaneous LLM requests.
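
JOB_CONCURRENCY feeds the in-process queue directly. A minimal sketch of the wiring in server/jobs/queue.js (names are illustrative):

import PQueue from "p-queue";

// Cap concurrent LLM requests; defaults to 3 when JOB_CONCURRENCY is unset.
const queue = new PQueue({ concurrency: Number(process.env.JOB_CONCURRENCY) || 3 });

// POST /api/jobs hands each saved job to the queue; runJob is the sketch above.
export function enqueueJob(job) {
  return queue.add(() => runJob(job));
}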