Serve your swarm as an OpenAI endpoint.
pals serve exposes a standard OpenAI HTTP API on your machine. Anything that speaks OpenAI — Cursor, Aider, Continue, the openai Python SDK, LangChain, OpenWebUI — works unchanged. This guide covers the localhost case, the LAN case, and the gotchas.
Localhost (the default)
The simplest setup. The server binds to 127.0.0.1:8080 and only your machine can reach it.
Talk to it from Python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="any-string",
)
resp = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B",
    messages=[{"role": "user", "content": "hello"}],
)
print(resp.choices[0].message.content)

The api_key can be any non-empty string when serving on loopback. The OpenAI SDK requires the field, but the server doesn’t enforce it on 127.0.0.1.
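If the SDK is not available, the same call can be sketched with the standard library alone. This is a minimal illustration, not the project's own client: it assumes the server accepts the standard OpenAI `POST /v1/chat/completions` request with the key sent as a bearer token (which is what the SDK does), and `build_chat_request` and `send` are hypothetical helper names.

```python
import json
import urllib.request


def build_chat_request(base_url, api_key, model, messages):
    """Build (but do not send) a POST to <base_url>/chat/completions."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # The OpenAI SDK transmits api_key as a bearer token.
            "Authorization": f"Bearer {api_key}",
        },
    )


def send(req, timeout=60):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]


# Usage, with the server running on loopback:
#   req = build_chat_request(
#       "http://localhost:8080/v1", "any-string",
#       "meta-llama/Llama-3.1-8B", [{"role": "user", "content": "hello"}],
#   )
#   print(send(req))
```

Keeping request construction separate from sending makes it easy to inspect the URL, headers, and body without a running server.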
Point Cursor / Aider / Continue at it
Each tool exposes a custom OpenAI base URL setting.
- Cursor: Settings → Models → add a custom model with base URL http://localhost:8080/v1.
- Aider: aider --openai-api-base http://localhost:8080/v1 --openai-api-key any-string.
- Continue: in config.json, add a model with provider: "openai" and apiBase: "http://localhost:8080/v1".
- OpenWebUI / LiteLLM / generic gateways: most read OPENAI_API_BASE and OPENAI_API_KEY.
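For tools configured through environment variables, one way to avoid changing your shell profile is to set the two variables only for the child process. The sketch below assumes the tool reads OPENAI_API_BASE and OPENAI_API_KEY at startup; `env_for_tool` is a hypothetical helper and `your-tool` is a placeholder command, not a real program.

```python
import os


def env_for_tool(base_url, api_key, base=None):
    """Return a copy of the environment with the two OpenAI variables set."""
    env = dict(os.environ if base is None else base)
    env["OPENAI_API_BASE"] = base_url
    env["OPENAI_API_KEY"] = api_key
    return env


# Usage ("your-tool" is a placeholder; substitute the tool's real launcher):
#   import subprocess
#   subprocess.run(
#       ["your-tool"],
#       env=env_for_tool("http://localhost:8080/v1", "any-string"),
#   )
```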
LAN (other machines on your network)
To let teammates on your LAN reach the endpoint, bind a non-loopback interface. pals serve refuses to do this without an API key — see the callout below.
Non-loopback binds require --api-key. An unauthenticated endpoint with your swarm behind it would let anyone on the network use your cluster. The CLI refuses to start a non-loopback bind without authentication.
Clients must present the key
client = OpenAI(
    base_url="http://YOUR_LAN_IP:8080/v1",
    api_key="<the same value as PROGRESSPALS_SERVE_API_KEY>",
)

Share the API key with your teammates the same way you’d share any secret: Signal, an encrypted note, or 1Password. Never commit it to git.
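To keep the key out of source files, a client can read it from the environment instead of hard-coding it. The sketch below is one possible client-side convention, not part of pals: it assumes teammates export the key under the same name, PROGRESSPALS_SERVE_API_KEY, and falls back to a placeholder only for loopback URLs, where the server does not check the key. `is_loopback` and `api_key_for` are hypothetical helpers.

```python
import ipaddress
import os
from urllib.parse import urlsplit

ENV_VAR = "PROGRESSPALS_SERVE_API_KEY"


def is_loopback(base_url):
    """True if the URL's host is localhost or a loopback IP literal."""
    host = urlsplit(base_url).hostname or ""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (for example a LAN hostname): treat as non-loopback.
        return False


def api_key_for(base_url, environ=os.environ):
    """Pick the api_key to pass to the client for this base URL."""
    key = environ.get(ENV_VAR)
    if is_loopback(base_url):
        # Any non-empty string works on loopback.
        return key or "any-string"
    if not key:
        raise RuntimeError(f"{ENV_VAR} must be set to reach {base_url}")
    return key
```

The result can be passed straight to the SDK, for example `OpenAI(base_url=url, api_key=api_key_for(url))`.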
Streaming
SSE streaming is enabled automatically. Pass stream=True in the OpenAI SDK and the response is an iterable of chunks. Mid-stream errors are surfaced as a final chunk with type: "error" and the stream then closes cleanly with [DONE].
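For clients that read the stream without the SDK, the handling described above can be sketched as a small parser over SSE lines. This is an illustration under stated assumptions: text arrives in the usual OpenAI `choices[].delta.content` field, and the error chunk carries a top-level `"type": "error"` with an optional `"message"` field. The exact error payload is not specified here, so treat those field names as guesses. `StreamError` and `iter_text` are hypothetical names.

```python
import json


class StreamError(RuntimeError):
    """Raised when the server reports a mid-stream error chunk."""


def iter_text(lines):
    """Yield text deltas from an iterable of SSE lines (str or bytes)."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # blank separators, comments, other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return  # clean end of stream
        chunk = json.loads(data)
        if chunk.get("type") == "error":
            # Assumed shape: {"type": "error", "message": "..."}
            raise StreamError(chunk.get("message", "stream failed"))
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text
```

Because the error arrives as the final chunk, any text yielded before it has already been delivered; callers that need all-or-nothing output should buffer until the generator finishes.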
/v1/models
Returns the single model your swarm is serving. Most OpenAI clients call this endpoint to enumerate model IDs; the response uses the same shape as OpenAI's model list.
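A client can use this endpoint to discover the model id instead of hard-coding it. The sketch below assumes the standard OpenAI list shape, `{"object": "list", "data": [{"id": ...}]}`, and works on an already-fetched response body; `model_ids` and `served_model` are hypothetical helpers.

```python
import json


def model_ids(payload):
    """Extract model ids from a /v1/models response (JSON text or dict)."""
    body = json.loads(payload) if isinstance(payload, (str, bytes)) else payload
    return [entry["id"] for entry in body.get("data", [])]


def served_model(payload):
    """Return the one model id, failing loudly if the list is not a singleton."""
    ids = model_ids(payload)
    if len(ids) != 1:
        raise ValueError(f"expected exactly one model, got {ids!r}")
    return ids[0]
```

With the SDK, the equivalent lookup would be roughly `client.models.list().data[0].id`.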