Serve your swarm as an OpenAI endpoint.

pals serve exposes a standard OpenAI HTTP API on your machine. Anything that speaks OpenAI — Cursor, Aider, Continue, the openai Python SDK, LangChain, OpenWebUI — works unchanged. This guide covers the localhost case, the LAN case, and the gotchas.

Localhost (the default)

The simplest setup. The server binds to 127.0.0.1:8080 and only your machine can reach it.

serve local
$ pals serve meta-llama/Llama-3.1-8B
✓ listening on http://127.0.0.1:8080/v1

Talk to it from Python

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="any-string",
)

resp = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B",
    messages=[{"role": "user", "content": "hello"}],
)
print(resp.choices[0].message.content)

The api_key can be any non-empty string when serving on loopback. The OpenAI SDK requires the field but the server doesn’t enforce it on 127.0.0.1.

Point Cursor / Aider / Continue at it

Each tool exposes a custom OpenAI base URL setting.

  • Cursor: Settings → Models → add a custom model with base URL http://localhost:8080/v1.
  • Aider: aider --openai-api-base http://localhost:8080/v1 --openai-api-key any-string.
  • Continue: in config.json, add a model with provider: "openai" and apiBase: "http://localhost:8080/v1".
  • OpenWebUI / LiteLLM / generic gateways: most read OPENAI_API_BASE and OPENAI_API_KEY.
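
For your own scripts, the same environment-variable convention can be followed with a small helper. This is a minimal sketch, not part of pals: `resolve_endpoint` and the two default constants are hypothetical names, and the defaults simply reuse the loopback URL and placeholder key from this guide. It also checks OPENAI_BASE_URL, the variable name that version 1 and later of the openai Python SDK reads.

```python
import os

# Assumed defaults: the loopback address and placeholder key used in this guide.
DEFAULT_BASE_URL = "http://localhost:8080/v1"
DEFAULT_API_KEY = "any-string"


def resolve_endpoint(env=None):
    """Return (base_url, api_key) from OPENAI_* variables, with loopback defaults."""
    env = os.environ if env is None else env
    # OPENAI_BASE_URL is the name the openai Python SDK (v1+) reads;
    # OPENAI_API_BASE is the older name that many gateways still use.
    base_url = (
        env.get("OPENAI_BASE_URL")
        or env.get("OPENAI_API_BASE")
        or DEFAULT_BASE_URL
    )
    api_key = env.get("OPENAI_API_KEY") or DEFAULT_API_KEY
    # Strip a trailing slash so paths can be appended consistently.
    return base_url.rstrip("/"), api_key
```

The returned pair can be passed straight to `OpenAI(base_url=..., api_key=...)` as in the Python example above.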

LAN (other machines on your network)

To let teammates on your LAN reach the endpoint, bind a non-loopback interface. pals serve refuses to do this without an API key — see the callout below.

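Non-loopback binding requires --api-key. An unauthenticated endpoint with your swarm behind it would let anyone on the network use your cluster, so the CLI refuses to start a non-loopback bind without authentication. The example below supplies the key through the PROGRESSPALS_SERVE_API_KEY environment variable instead of the flag.
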
serve LAN
$ export PROGRESSPALS_SERVE_API_KEY=$(openssl rand -hex 32)
$ pals serve meta-llama/Llama-3.1-8B --host 0.0.0.0
✓ listening on http://0.0.0.0:8080/v1
authorization required
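
The bind rule described above can be sketched in a few lines. This is an illustration of the idea only, not the CLI's actual code: `is_loopback` and `check_bind` are hypothetical names, and how pals serve really classifies hosts is an assumption here.

```python
import ipaddress


def is_loopback(host):
    """True for loopback hosts such as 127.0.0.1, ::1, or the name 'localhost'."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal: treat unknown hostnames as non-loopback to stay safe.
        return False


def check_bind(host, api_key):
    """Raise if a non-loopback bind is requested without an API key."""
    if not is_loopback(host) and not api_key:
        raise ValueError(f"refusing to bind {host} without an API key")
```

Under this sketch, `check_bind("127.0.0.1", None)` passes, while `check_bind("0.0.0.0", None)` raises because 0.0.0.0 listens on every interface.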

Clients must present the key

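from openai import OpenAI

# Replace YOUR_LAN_IP with the address of the machine running pals serve.
client = OpenAI(
    base_url="http://YOUR_LAN_IP:8080/v1",
    api_key="<the same value as PROGRESSPALS_SERVE_API_KEY>",
)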

Share the API key with your teammates the same way you’d share any secret — Signal, an encrypted note, 1Password. Never commit it to git.
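
For clients that do not use the SDK, it helps to see what presenting the key looks like on the wire. The OpenAI SDK sends api_key as an `Authorization: Bearer` header, and the sketch below assumes that is what the server checks; this guide does not spell out the header format. `build_chat_request` is a hypothetical helper that only builds the request and does not send it.

```python
import json
import urllib.request


def build_chat_request(base_url, api_key, model, prompt):
    """Build (but do not send) a POST to /chat/completions with a Bearer key."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=body,
        method="POST",
        headers={
            # Assumed: the server accepts the key as a standard Bearer token.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

To send it, pass the result to `urllib.request.urlopen`. Reading the key from `os.environ["PROGRESSPALS_SERVE_API_KEY"]` on the client, rather than pasting it into the script, keeps it out of source control.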

Streaming

SSE streaming is enabled automatically. Pass stream=True in the OpenAI SDK and the response is an iterable of chunks. Mid-stream errors are surfaced as a final chunk with type: "error" and the stream then closes cleanly with [DONE].
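
A minimal sketch of consuming the stream by hand, assuming standard OpenAI-style SSE lines of the form `data: {...}`. Only the `type: "error"` field of the error chunk is documented here, so the rest of its shape is an assumption, and `StreamError` and `iter_text` are hypothetical names.

```python
import json


class StreamError(RuntimeError):
    """Raised when the stream carries an error chunk (hypothetical name)."""


def iter_text(sse_lines):
    """Yield text deltas from an iterable of decoded SSE lines."""
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # clean end of stream
        chunk = json.loads(payload)
        if chunk.get("type") == "error":
            # Assumed shape: only the type field is described in this guide.
            raise StreamError(json.dumps(chunk))
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text
```

With the OpenAI SDK the parsing is done for you: iterate over the object returned by `client.chat.completions.create(..., stream=True)` and read `chunk.choices[0].delta.content` from each chunk.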

/v1/models

Returns the single model your swarm is serving. Most OpenAI clients call this endpoint to enumerate model IDs; the response follows the OpenAI list-models shape.
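
As a rough illustration of that shape, the sketch below pulls the ids out of a list-models payload. The sample payload is illustrative rather than captured from a real server, and `model_ids` is a hypothetical helper.

```python
def model_ids(payload):
    """Return the model ids from a /v1/models response in the OpenAI list shape."""
    return [entry["id"] for entry in payload.get("data", [])]


# Illustrative payload in the OpenAI list shape; real responses may carry extra fields.
sample = {
    "object": "list",
    "data": [{"id": "meta-llama/Llama-3.1-8B", "object": "model"}],
}
print(model_ids(sample))  # ['meta-llama/Llama-3.1-8B']
```

With the OpenAI SDK, `client.models.list()` returns the same entries as objects with an `id` attribute.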

Next steps