Private AI clusters pooled over the internet, on your hardware.
No VPS, no API fees.

Run massive open-source models no single machine could host. Send one invite link, everyone joins, everyone shares the compute.


$ pip install progresspals

~ — pals create

$ pals create Qwen/Qwen3-Coder-480B-A35B-Instruct

✓ Starting NEW swarm · this peer holds layers 0–6

multiaddr: /ip4/HOST/tcp/PORT/p2p/PEER_ID

$ pals peers list

PEER ID JOINED STATUS

● 12D3KooWQH…7m4xK 5m ago active

● 12D3KooWXp…q9d2N 2m ago active

● 12D3KooWBs…3hT6r just now active

─ 3 peers · 62/62 layers covered
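The multiaddr printed by pals create has the shape /ip4/HOST/tcp/PORT/p2p/PEER_ID. A minimal sketch of pulling the three values out of such a string follows; it assumes only that exact shape, and the address in the example is a made-up placeholder, not output from a real swarm.

```python
from typing import NamedTuple


class PeerAddress(NamedTuple):
    host: str
    port: int
    peer_id: str


def parse_multiaddr(addr: str) -> PeerAddress:
    """Parse an /ip4/<host>/tcp/<port>/p2p/<peer id> multiaddr."""
    parts = addr.strip().split("/")
    # A leading "/" yields an empty first element, followed by
    # alternating protocol names and values.
    if len(parts) != 7 or parts[0] != "":
        raise ValueError(f"unexpected multiaddr shape: {addr!r}")
    protocols = (parts[1], parts[3], parts[5])
    if protocols != ("ip4", "tcp", "p2p"):
        raise ValueError(f"unsupported protocols: {protocols}")
    return PeerAddress(host=parts[2], port=int(parts[4]), peer_id=parts[6])


# Placeholder address for illustration only.
example = parse_multiaddr("/ip4/203.0.113.7/tcp/31337/p2p/12D3KooWExamplePeerId")
print(example.host, example.port, example.peer_id)
```

Other multiaddr forms (ip6, dns, quic and so on) are deliberately rejected here rather than guessed at.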

How it works

Three commands. Real distributed inference.

No coordination overhead, no Kubernetes, no public swarms. Just your people, your hardware, your model.

Create a swarm

Pick a model. The CLI claims the layers your machine can hold and starts hosting. Mint an invite for your pals with pals invite create.

$ pals create Qwen/Qwen3-32B

Invite your team

Mint a single-use invite token and hand it to each pal over a secure channel. They redeem it, then join your swarm and host their slice of the model.

$ pals invite create --max-uses 1
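To make "single-use, expiring, revocable" concrete, here is a rough sketch of the bookkeeping such a token needs. It is not the ProgressPals implementation: InviteRegistry, its method names, and the one-hour default lifetime are all hypothetical and exist only for illustration.

```python
import secrets
import time
from dataclasses import dataclass


@dataclass
class Invite:
    expires_at: float
    max_uses: int = 1
    uses: int = 0
    revoked: bool = False


class InviteRegistry:
    """Hypothetical in-memory store; not the ProgressPals implementation."""

    def __init__(self) -> None:
        self._invites: dict[str, Invite] = {}

    def mint(self, ttl_seconds: float = 3600.0, max_uses: int = 1) -> str:
        token = secrets.token_urlsafe(32)  # unguessable random token
        self._invites[token] = Invite(time.time() + ttl_seconds, max_uses)
        return token

    def revoke(self, token: str) -> None:
        if token in self._invites:
            self._invites[token].revoked = True

    def redeem(self, token: str) -> bool:
        invite = self._invites.get(token)
        if invite is None or invite.revoked:
            return False
        if time.time() >= invite.expires_at or invite.uses >= invite.max_uses:
            return False
        invite.uses += 1  # a single-use token is spent after this
        return True
```

With max_uses=1, the first redeem call succeeds and every later call with the same token returns False, which is the behaviour the --max-uses 1 flag describes.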

Run it like OpenAI

Start the local OpenAI-compatible endpoint. Point Cursor, Aider, Continue, or any SDK at it. Inference flows through the chain.

$ pals serve


The killer feature

A drop-in replacement for OpenAI — running on your team's hardware.

pals serve exposes the swarm as a local OpenAI-compatible endpoint at http://localhost:8080/v1. Any tool that speaks the OpenAI API works unchanged — point it at your endpoint and it codes, chats, and reasons through your private cluster.

Cursor · Aider · Continue.dev · Cline · Roo Code · n8n · OpenAI SDK · LangChain · Custom scripts

POST /v1/chat/completions · GET /v1/models · SSE streaming

~ — pals serve + Cursor

$ pals serve Qwen/Qwen3-Coder-480B-A35B-Instruct

✓ OpenAI-compatible endpoint live

http://127.0.0.1:8080/v1

streaming enabled

# In Cursor: Settings → OpenAI Base URL

→ http://localhost:8080/v1

→ model: Qwen/Qwen3-Coder-480B-A35B-Instruct

$ curl http://localhost:8080/v1/chat/completions \

-d '{"model":"...","messages":[...]}'

{ "id": "cmpl-...", "choices": [...

For AI agents

Plug it into the tools your team
already uses.

Because the swarm exposes a standard OpenAI-compatible endpoint, anything in your agent stack — coding harnesses, gateways, frameworks — just works.

In your editor

Coding agents

Your team's swarm becomes the brain inside the IDE. Point the agent at the local endpoint and it codes, edits, and refactors against your shared cluster.

Cursor · Cline · Roo Code · Continue.dev · Aider · Zed

# Cursor → Settings → Models → Custom OpenAI Base URL
http://localhost:8080/v1

# Aider
aider --openai-api-base http://localhost:8080/v1 \
      --openai-api-key any-string

Self-hosted gateways

Personal AI agents

Self-hosted agents and assistants that already speak the OpenAI API. Swap the provider URL for the swarm and they run on your team's hardware instead of someone else's GPUs.

OpenClaw · Open WebUI · OpenHands · Plandex · LiteLLM proxy

# Most gateways read these standard env vars:
export OPENAI_API_BASE=http://localhost:8080/v1
export OPENAI_BASE_URL=http://localhost:8080/v1
export OPENAI_API_KEY=any-string
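Both base-URL variables are exported above because tools differ in which one they read: current OpenAI SDK releases look at OPENAI_BASE_URL, while older clients and some gateways still look at OPENAI_API_BASE. A small sketch of how a script might resolve the endpoint from either; the precedence order and the resolve_endpoint name are assumptions for illustration, not something every tool follows.

```python
import os
from typing import Mapping, Optional


def resolve_endpoint(env: Optional[Mapping[str, str]] = None) -> tuple[str, str]:
    """Return (base_url, api_key) from the conventional OpenAI env vars."""
    env = os.environ if env is None else env
    base_url = (
        env.get("OPENAI_BASE_URL")       # newer convention
        or env.get("OPENAI_API_BASE")    # older convention
        or "https://api.openai.com/v1"   # hosted default when neither is set
    )
    api_key = env.get("OPENAI_API_KEY", "")
    return base_url.rstrip("/"), api_key


base_url, api_key = resolve_endpoint(
    {"OPENAI_BASE_URL": "http://localhost:8080/v1", "OPENAI_API_KEY": "any-string"}
)
print(base_url)
```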

Build your own

Agent frameworks & SDKs

Stack your own agents on top. Anything built on the OpenAI SDK accepts a base_url override — your swarm becomes the model layer underneath multi-agent orchestration, RAG, evals, anything.

LangChain · LlamaIndex · AutoGen · CrewAI · Vercel AI SDK · OpenAI SDK

from openai import OpenAI

client = OpenAI(
  base_url="http://localhost:8080/v1",
  api_key="any-string",
)

client.chat.completions.create(
  model="Qwen/Qwen3-Coder-480B-A35B-Instruct",
  messages=[{"role":"user","content":"..."}],
)

Every tool listed accepts a custom OpenAI base URL. If yours does too, it will work the same way: there is no special integration, just the standard /v1/chat/completions contract with SSE streaming.
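For custom scripts that consume the stream directly, the SSE side of that contract can be sketched as below. This assumes pals serve emits chunks in the usual OpenAI streaming shape (data: lines carrying choices[].delta.content, ending with data: [DONE]); the sample lines are hand-written for illustration, not captured from a real swarm.

```python
import json
from typing import Iterable, Iterator


def iter_sse_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style chat completion SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and comment lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# Hand-written sample stream (assumed shape).
sample = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "data: [DONE]",
]
print("".join(iter_sse_deltas(sample)))  # prints Hello
```

In practice the OpenAI SDK does this parsing for you when stream=True is passed; the sketch only shows what travels over the wire.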

Features

Built for teams that want
their own models, privately.

Everything you need to stand up a serious cluster with people you trust — without renting a single GPU.

Invite-only swarms

No public discovery, no random peers. Single-use tokens, regenerable, expiring. Only people you invite can join.

OpenAI-compatible endpoint

`pals serve` exposes /v1/chat/completions and /v1/models with SSE streaming. Cursor, Aider, Continue work unchanged.

Encrypted activations

Per-swarm AES-256-GCM key derived from your swarm's shared secret via HKDF-SHA256. Activation tensors are encrypted before leaving each peer.

Pipeline parallelism

Each peer holds a contiguous slice of the model. Inference flows through the chain one peer at a time, so the swarm can run models no single machine could hold.
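One way to picture contiguous slicing is a greedy pass over the peers in chain order. This is a sketch under stated assumptions: the per-peer capacities are invented, and the strategy the CLI actually uses to claim layers is not described here.

```python
def assign_layers(total_layers: int, capacities: list[int]) -> list[range]:
    """Give each peer, in chain order, a contiguous block of layers."""
    assignments = []
    start = 0
    for capacity in capacities:
        end = min(start + capacity, total_layers)
        assignments.append(range(start, end))
        start = end
    if start < total_layers:
        raise ValueError(f"only {start}/{total_layers} layers covered")
    return assignments


# Hypothetical capacities: three peers covering a 62-layer model.
slices = assign_layers(62, [7, 30, 25])
for i, s in enumerate(slices):
    print(f"peer {i}: layers {s.start}-{s.stop - 1}")
```

A request then visits peer 0, peer 1, and peer 2 in that order, each applying only the layers in its own range.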

Run 480B on consumer GPUs

Qwen3-Coder 480B, Qwen3 235B, Llama 405B, Mixtral 8x22B, Falcon 180B: models too large for any single consumer machine. Spread them across 4, 8, or 20 peers.

Member controls

Live peer list, invite status, swarm health. Kick peers and revoke or re-issue invites from the CLI, and watch swarm state live in the read-only pals dash TUI.

Supported models

Qwen3-Coder 480B, across your team's GPUs.

Qwen 3, Qwen 3-Coder, Qwen 2.5, Qwen 2.5-Coder, Llama, Mixtral, Falcon, and BLOOM families all work out of the box. More architectures land as we add them.

HuggingFace model IDs work directly — just pass the id to pals create.

Qwen 3-Coder

Alibaba

›30B-A3B
›480B-A35B

Qwen 3

Alibaba

›0.6B
›1.7B
›4B
›8B
›14B
›32B
›30B-A3B
›235B-A22B

Qwen 2.5-Coder

Alibaba

›0.5B
›1.5B
›3B
›7B
›14B
›32B

Qwen 2.5

Alibaba

›0.5B
›1.5B
›3B
›7B
›14B
›32B
›72B

Llama

Meta

›2 / 3 / 3.1 / 3.3, up to 405B

Mixtral

Mistral

›8x7B
›8x22B

Falcon

TII

›40B
›180B

BLOOM

BigScience

›176B

Security · honest disclosure

We tell you exactly what the trust model is.

Only invite people you trust.

We are not a public network. There is no swarm discovery, no stranger prompts, no content moderation queue. Your swarm is exactly the people you sent the link to.

Activations are encrypted in transit.

We derive a 256-bit AES-GCM key from your swarm's shared secret via HKDF-SHA256. Tensors are encrypted before leaving a peer and decrypted on arrival. The key is computed client-side and never leaves member machines.
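The derivation step can be sketched with the Python standard library alone. This follows the generic HKDF construction from RFC 5869; the secret, the empty salt, and the "activations" info label are placeholders, since how the real swarm picks those inputs is not stated here. The AES-GCM encryption itself is left out because it needs a third-party library.

```python
import hashlib
import hmac


def hkdf_sha256(secret: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF (RFC 5869) with HMAC-SHA256: extract, then expand."""
    if length > 255 * hashlib.sha256().digest_size:
        raise ValueError("requested key is too long for HKDF-SHA256")
    # Extract: condense the input secret into a fixed-size pseudorandom key.
    # RFC 5869 treats a missing salt as a block of zero bytes.
    prk = hmac.new(salt or bytes(32), secret, hashlib.sha256).digest()
    # Expand: chain HMAC blocks until enough output bytes exist.
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


# Placeholder inputs for illustration only.
key = hkdf_sha256(b"example swarm secret", salt=b"", info=b"activations")
print(len(key) * 8)  # prints 256
```

Every member who holds the same shared secret and uses the same salt and info values computes the same 32-byte key locally, which is why the key itself never has to be sent anywhere.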

What we do not pretend.

P2P inference exposes IP addresses to other swarm members. The first peer in the chain sees decrypted inputs. We sandbox computation where the OS allows it, but this is not a hardware enclave. Use a VPN if the threat model demands it.

Pricing

Free. The whole thing.

No paid tier yet. We will add one based on what teams actually ask for — not before.

Free

$0 / month

Everything. No card. No usage caps.

Private swarms — invite-only, no public discovery
Single-use invite tokens, revocable, expirable
Encrypted activations (AES-256-GCM, HKDF-derived)
Member list, kick, status — from the CLI
Full CLI surface (init, create, join, serve, dash + more)
OpenAI-compatible local endpoint (pals serve)
Live read-only TUI dashboard (pals dash)
Account-backed invite verification + allow-list


FAQ

Questions teams ask
before they install.

If yours is not here, the answer is probably either in how it works or in the trust model.

What is ProgressPals?

Private, peer-to-peer AI inference. You and a small group of trusted people pool your hardware over the internet to run large open-source models that no single machine could host on its own. One CLI, one invite link, one local OpenAI-compatible endpoint.

What models can my swarm run?

Qwen 3 (0.6B–32B dense + 30B-A3B / 235B-A22B MoE), Qwen 3-Coder (30B-A3B and 480B-A35B), Qwen 2.5 (0.5B–72B), Qwen 2.5-Coder (0.5B–32B), Llama 2 / 3 / 3.1 / 3.3 up to 405B, Mixtral 8x7B and 8x22B, Falcon 40B and 180B, BLOOM 176B. Pass any supported HuggingFace model ID directly to pals create.

Can my team use it with Cursor, Aider, or our agent framework?

Yes. pals serve exposes an OpenAI-compatible endpoint at http://localhost:8080/v1. Point Cursor, Cline, Roo Code, Continue, Aider, Zed, OpenClaw, Open WebUI, n8n, LangChain, LlamaIndex, AutoGen, CrewAI, the Vercel AI SDK, or anything that uses the OpenAI SDK directly at it — no code changes.

Who can see my prompts?

The first peer in your chain decrypts your input to run its layers; the model's first layers have to operate on the actual prompt. Activations between subsequent peers are encrypted with a per-swarm AES-256-GCM key derived from your swarm's shared secret via HKDF-SHA256. The trust model is simple and honest: only invite people you would trust to see your prompts.

Why private swarms only?

Public AI swarms create content moderation queues, expose users to stranger prompts, and pile on legal liability. Removing public swarms removes all three. You only compute for (and decrypt inputs from) people you actually invited.

Do I need a GPU?

Strongly recommended. Each peer's contribution scales with how many model layers their VRAM can hold. CPU-only peers can technically join, but throughput will be slow enough that you probably want at least one consumer GPU per peer.

Does it work on Apple Silicon (M1, M2, M3, M4)?

Yes. Apple Silicon Macs can join any swarm and contribute layers via PyTorch's Metal path. Per-pal throughput is lower than on equivalent NVIDIA hardware, so a Mac is often best as one pal in a mixed swarm or as a client running pals serve.

How many peers do I need for a big model?

It depends on the model and how aggressively it is quantized, but the rule is intuitive: more layers in the model, or less VRAM per peer, means more peers. Each peer can host as many layers as fit on its device (configurable per-peer with --num-blocks).
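As a back-of-the-envelope version of that rule, the sketch below estimates a peer count from layer count and VRAM. All numbers are illustrative assumptions: the 3 GiB per layer and the 2 GiB reserve are not measurements of any particular model or quantization.

```python
import math


def layers_per_peer(vram_gib: float, layer_gib: float, reserve_gib: float = 2.0) -> int:
    """How many layers fit once some VRAM is reserved for cache and overhead."""
    usable = max(vram_gib - reserve_gib, 0.0)
    return int(usable // layer_gib)


def min_peers(total_layers: int, vram_gib: float, layer_gib: float) -> int:
    """Smallest number of identical peers that covers every layer."""
    per_peer = layers_per_peer(vram_gib, layer_gib)
    if per_peer == 0:
        raise ValueError("a single layer does not fit on this device")
    return math.ceil(total_layers / per_peer)


# Hypothetical: 62 layers, 24 GiB cards, roughly 3 GiB per layer.
print(min_peers(total_layers=62, vram_gib=24.0, layer_gib=3.0))  # prints 9
```

Here each peer fits 7 layers, so 62 layers need 9 peers. Halving the per-layer size through heavier quantization roughly halves the peer count.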

Is it really free?

Yes. No paid tier yet. We will add one when we have real signal from teams about what is worth charging for — not before.

Start your first swarm
in under five minutes.

Linux and macOS. NVIDIA, Apple Silicon, or CPU-only.

install

$ pip install progresspals

$ pals init

✓ Config written to ~/.config/progresspals/config.json (0600)

$ pals create Qwen/Qwen3-32B

✓ Starting NEW swarm

multiaddr: /ip4/HOST/tcp/PORT/p2p/PEER_ID
