# VLM Server
A multimodal visual analysis queue — submit an image plus a prompt, get a response from a cloud vision model. Switch between **Ollama Cloud** and **OpenRouter** by editing a few environment variables; no code changes needed.
## Architecture
```
Browser
  │
  ▼
Fastify server (port 3000)
  │  openai npm package (OpenAI-compatible client)
  ▼
LLM_BASE_URL (configured in .env)
  ├─ https://ollama.com/v1        → Ollama Cloud (qwen3.5:397b-cloud, etc.)
  └─ https://openrouter.ai/api/v1 → OpenRouter (300+ providers)
```
Both Ollama Cloud and OpenRouter expose an OpenAI-compatible `/v1/chat/completions` endpoint, so the same `openai` npm package talks to both. No proxy or sidecar required.
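
Because the wire format is identical, a request body built once works against either provider. Here is a minimal sketch of that body; the helper name and sample values are illustrative, not the server's actual code:

```javascript
// Build the JSON body for POST {LLM_BASE_URL}/chat/completions.
// Only the model identifier differs between providers.
function buildVisionRequest(model, prompt, imageBase64, mimeType = "image/png") {
  return {
    model,
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: prompt },
          // The image travels inline as a data: URI — no file hosting needed.
          { type: "image_url", image_url: { url: `data:${mimeType};base64,${imageBase64}` } },
        ],
      },
    ],
  };
}

// Same shape, different model ID:
const ollamaBody = buildVisionRequest("qwen3.5:397b-cloud", "Describe this image", "iVBOR...");
const openrouterBody = buildVisionRequest("qwen/qwen3.5-397b-a17b", "Describe this image", "iVBOR...");
```

Everything except `model` (and the base URL the request is sent to) is provider-agnostic.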
## Stack
| Layer | Technology |
|---|---|
| Backend | Node.js · Fastify 5 · `@fastify/websocket` · `@fastify/multipart` |
| LLM client | `openai` npm package (pointed at Ollama Cloud or OpenRouter) |
| Queue | `p-queue` (in-process, no external server) |
| Database | SQLite via Sequelize ORM |
| Frontend | React 18 · Vite · plain CSS |
## Quick start
**1. Install dependencies**

```bash
npm install
```
**2. Configure your provider**

```bash
cp server/.env.example server/.env
# Edit server/.env — choose Ollama Cloud or OpenRouter (see comments inside)
```
**3. Run in development**

```bash
npm run dev
```
- Frontend → http://localhost:5173
- Backend → http://localhost:3000
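
With the frontend and backend on separate dev ports, `client/vite.config.js` would typically proxy API and WebSocket traffic to the backend so the browser avoids CORS issues. A hypothetical sketch (the `/api` and `/ws` paths are assumptions, not confirmed from the repo):

```javascript
// Hypothetical dev-proxy config for client/vite.config.js.
// Requests to /api and /ws on port 5173 are forwarded to the Fastify
// server on port 3000; in the real file this object would be passed
// to Vite's defineConfig().
const config = {
  server: {
    proxy: {
      "/api": { target: "http://localhost:3000", changeOrigin: true },
      "/ws": { target: "ws://localhost:3000", ws: true }, // ws: true upgrades WebSocket connections
    },
  },
};
```

In production this proxy is unnecessary, since Fastify serves the built frontend itself on port 3000.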
**4. Production build**

```bash
npm run build   # builds React into client/dist
npm start       # serves everything from Fastify on port 3000
```
## Provider configuration
Edit `server/.env` and uncomment the block for the provider you want:
### Ollama Cloud
Get an API key at https://ollama.com → account → API keys. Model IDs are listed at https://ollama.com/search?c=cloud.
```bash
LLM_BASE_URL=https://ollama.com/v1
LLM_API_KEY=your-ollama-api-key
LLM_MODEL=qwen3.5:397b-cloud
```
### OpenRouter
Get an API key at https://openrouter.ai/keys. Model IDs are listed at https://openrouter.ai/models (format: `provider/model`).
```bash
LLM_BASE_URL=https://openrouter.ai/api/v1
LLM_API_KEY=sk-or-v1-...
LLM_MODEL=qwen/qwen3.5-397b-a17b
```
## Project structure
```
vision-jobs/
├── server/
│   ├── index.js           # Fastify entry point
│   ├── routes/jobs.js     # REST + WebSocket routes
│   ├── jobs/queue.js      # p-queue → openai client → provider
│   ├── db/models.js       # Sequelize Job model (SQLite)
│   ├── ws/broadcast.js    # WebSocket fan-out
│   └── .env.example
└── client/
    ├── index.html
    ├── vite.config.js
    └── src/
        ├── App.jsx
        ├── styles.css
        ├── components/
        │   ├── ImageDrop.jsx   # Drag-drop / file picker / camera
        │   └── JobCard.jsx     # Live status + result display
        ├── hooks/
        │   └── useJobSocket.js
        └── lib/
            └── api.js
```
## How it works
1. The user drops an image and types a prompt, then clicks **Analyze**.
2. `POST /api/jobs` receives the multipart upload, base64-encodes the image, saves a `queued` job to SQLite, and enqueues it via `p-queue`.
3. The queue runner calls the OpenAI-compatible `/v1/chat/completions` endpoint with the image embedded as a `data:` URI in an `image_url` content block.
4. As the status changes (`queued → running → done/error`), the server broadcasts `job_update` WebSocket messages to every connected client.
5. React merges updates into the live job list — no polling.
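
The status lifecycle in steps 2–5 can be sketched as follows. This is an illustrative stand-in, not the server's actual modules: a plain `Map` replaces the SQLite table, and `runJob`, `setStatus`, and `broadcast` are hypothetical names; the real server uses `p-queue` and Sequelize.

```javascript
const jobs = new Map();     // stands in for the SQLite Job table
const sockets = new Set();  // connected WebSocket clients

// Fan out a job_update message to every connected client.
function broadcast(job) {
  const msg = JSON.stringify({ type: "job_update", job });
  for (const ws of sockets) ws.send(msg);
}

// Persist a status change and notify clients in one step.
function setStatus(id, status, extra = {}) {
  const job = { ...jobs.get(id), status, ...extra };
  jobs.set(id, job);
  broadcast(job);
  return job;
}

// Enqueued by POST /api/jobs; `callModel` is the chat-completions call.
async function runJob(id, callModel) {
  setStatus(id, "running");
  try {
    const result = await callModel();
    return setStatus(id, "done", { result });
  } catch (err) {
    return setStatus(id, "error", { error: String(err) });
  }
}
```

Every transition goes through `setStatus`, so the database row and the WebSocket clients can never disagree about a job's state.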
## Environment variables
| Variable | Default | Description |
|---|---|---|
| `LLM_BASE_URL` | `https://ollama.com/v1` | Provider endpoint. |
| `LLM_API_KEY` | — | **Required.** API key for your chosen provider. |
| `LLM_MODEL` | `qwen3.5:397b-cloud` | Model identifier (format varies by provider). |
| `PORT` | `3000` | Server HTTP port. |
| `JOB_CONCURRENCY` | `3` | Max simultaneous LLM requests. |
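
A sketch of how these defaults might be applied when the server reads its environment. The `loadConfig` helper is hypothetical; only the variable names and default values match the table above:

```javascript
// Read LLM settings from an env object, applying the documented defaults.
// LLM_API_KEY has no default and must be set.
function loadConfig(env = process.env) {
  if (!env.LLM_API_KEY) throw new Error("LLM_API_KEY is required");
  return {
    baseURL: env.LLM_BASE_URL ?? "https://ollama.com/v1",
    apiKey: env.LLM_API_KEY,
    model: env.LLM_MODEL ?? "qwen3.5:397b-cloud",
    port: Number(env.PORT ?? 3000),
    concurrency: Number(env.JOB_CONCURRENCY ?? 3),
  };
}
```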