Scale to Zero
Server lifecycle, cold starts, and the Fly.io control plane.
Valet servers shut down when idle and start on demand. Inactive projects consume only storage, not compute. Long-idle projects are archived to R2 and release all Fly resources.
How it works
Each project has a dedicated Fly Machine with an attached volume. The project moves through three states:
- Active -- the Machine is running and serving requests. Fly auto-suspends it after the idle timeout (no open connections).
- Suspended -- the Machine is suspended and the volume is retained. Fly auto-starts it when the next request arrives via `fly-replay` routing. This is transparent to clients.
- Dormant -- no Fly resources exist. The database has been archived to Cloudflare R2. When a client connects, the control plane wakes the project: it provisions a new machine (from the warm pool or cold), restores the database from R2, and routes the request.
State transitions
- Active to suspended: handled automatically by Fly when the machine is idle (no TCP connections).
- Suspended to active: handled automatically by Fly when a request is routed to the machine via `fly-replay`.
- Active to dormant: the control plane runs an hourly dormancy sweep. Projects inactive for longer than the threshold (default 7 days) are archived. The sweep starts the machine, calls `/internal/archive-to-r2` to upload the database to R2, then deletes the machine and volume.
- Dormant to active: when a request hits a dormant project, the control plane claims a machine from the warm pool (or cold-creates one), restores the database from R2, updates the project state, and returns a `fly-replay` header.
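The transitions above can be sketched as a small lookup table, together with the inactivity check the dormancy sweep would need. This is an illustrative model only, not the control plane's code: the names `State`, `TRANSITIONS`, `transition_owner`, and `due_for_archive` are hypothetical, and the real sweep may use a different comparison or timestamp source.

```python
from datetime import datetime, timedelta
from enum import Enum


class State(Enum):
    ACTIVE = "active"
    SUSPENDED = "suspended"
    DORMANT = "dormant"


# Documented transitions, mapped to the component that performs them.
# The labels "fly" and "control_plane" are illustrative, not real identifiers.
TRANSITIONS = {
    (State.ACTIVE, State.SUSPENDED): "fly",
    (State.SUSPENDED, State.ACTIVE): "fly",
    (State.ACTIVE, State.DORMANT): "control_plane",
    (State.DORMANT, State.ACTIVE): "control_plane",
}


def transition_owner(src: State, dst: State) -> str:
    """Return who drives a documented transition, or raise for any other pair."""
    try:
        return TRANSITIONS[(src, dst)]
    except KeyError:
        raise ValueError(
            f"undocumented transition: {src.value} -> {dst.value}"
        ) from None


def due_for_archive(
    last_active_at: datetime, now: datetime, threshold_days: int = 7
) -> bool:
    """True when a project has been inactive for longer than the threshold."""
    return now - last_active_at > timedelta(days=threshold_days)
```

With the default threshold, a project last active eight days ago would be selected by the sweep, while one last active exactly seven days ago would not, because the check is strictly "longer than".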
Why scale to zero
Most projects are not active 24/7. A development project might see traffic for a few hours a day. A demo might get a single visitor per week. Running dedicated servers for all of them wastes compute and money.
With scale-to-zero:
- Idle projects cost nothing beyond volume storage (suspended) or R2 storage (dormant)
- Active projects get dedicated server instances
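- Cold starts take about a second or less in the common cases (suspended, or dormant with a warm pool hit) and a few seconds when a dormant project needs a cold-created machine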
- No manual provisioning or teardown
Cold start latency
Latency depends on the project's current state:
| State | Cold start | What happens |
|---|---|---|
| Suspended | ~1s | Fly auto-starts the machine |
| Dormant (warm pool hit) | ~800ms | Claim pre-provisioned machine, restore from R2 |
| Dormant (cold create) | ~4s | Create volume + machine, wait for startup, restore from R2 |
The warm pool maintains pre-provisioned running machines (default 2) so that most dormant wakes avoid the cold-create path.
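The difference between the two dormant paths can be illustrated with a minimal in-memory sketch. Everything here is an assumption for illustration: `WarmPool`, `wake`, and the `create_machine` and `restore_from_r2` callables are hypothetical stand-ins for whatever the control plane does against the Fly API and R2.

```python
from collections import deque
from typing import Callable, Deque, Optional, Tuple


class WarmPool:
    """In-memory stand-in for a pool of pre-provisioned machine IDs."""

    def __init__(self, target_size: int = 2) -> None:
        self.target_size = target_size  # documented default pool size is 2
        self._machines: Deque[str] = deque()

    def add(self, machine_id: str) -> None:
        self._machines.append(machine_id)

    def claim(self) -> Optional[str]:
        """Take the oldest pooled machine, or None if the pool is empty."""
        return self._machines.popleft() if self._machines else None

    def deficit(self) -> int:
        """How many machines a replenisher would need to create."""
        return max(0, self.target_size - len(self._machines))


def wake(
    pool: WarmPool,
    create_machine: Callable[[], str],
    restore_from_r2: Callable[[str], None],
) -> Tuple[str, str]:
    """Wake a dormant project; return (machine_id, path taken)."""
    machine_id = pool.claim()
    path = "warm"
    if machine_id is None:
        # Pool miss: fall back to the slower cold-create path.
        machine_id = create_machine()
        path = "cold"
    # Both paths restore the database before the request is routed.
    restore_from_r2(machine_id)
    return machine_id, path
```

In this model a replenisher would call `deficit()` on each tick and create that many machines, which is why a burst of wakes larger than the pool size falls through to the cold-create path.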
To minimize cold start impact:
- Keep the database small by archiving old data
- Use connection pooling in your app
- For latency-sensitive projects, generate periodic activity (e.g. a health check ping) so that the dormancy sweep does not archive the project
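A keep-alive ping could look like the sketch below, run from cron or any scheduler. The URL is a placeholder, and pinging at one quarter of the dormancy threshold is an arbitrary safety margin rather than a documented recommendation. Note that each ping also wakes a suspended machine for a moment, so it trades a small amount of compute for avoiding the dormant path.

```python
import urllib.request
from datetime import timedelta

# Placeholder: replace with a URL served by your own project.
PROJECT_URL = "https://my-project.example.com/"

# Documented default for DORMANCY_THRESHOLD_DAYS.
DORMANCY_THRESHOLD = timedelta(days=7)


def ping_interval(threshold: timedelta = DORMANCY_THRESHOLD) -> timedelta:
    """Pick an interval well inside the threshold (one quarter, by assumption)."""
    return threshold / 4


def ping(url: str, timeout: float = 10.0, opener=urllib.request.urlopen) -> int:
    """Send a single GET request and return the HTTP status code."""
    with opener(url, timeout=timeout) as resp:
        return resp.status


if __name__ == "__main__":
    print(f"suggested interval: {ping_interval()}")
    print(f"status: {ping(PROJECT_URL)}")
```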
Configuration
The control plane accepts these environment variables:
| Env var | Default | Description |
|---|---|---|
| `DORMANCY_THRESHOLD_DAYS` | 7 | Days of inactivity before a project is archived to R2 |
| `WARM_POOL_SIZE` | 2 | Number of pre-provisioned machines to maintain |
| `WARM_POOL_REPLENISH_INTERVAL_SECS` | 30 | How often, in seconds, the replenisher checks the pool |
Fly handles idle suspend/auto-start timeouts at the machine level. The control plane only manages the dormancy sweep and warm pool.
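To make the defaults concrete, here is a sketch of how the three variables might be read. The variable names and defaults come from the table above; the `ControlPlaneConfig` type and `load_config` function are hypothetical and do not describe how the control plane itself parses its environment.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ControlPlaneConfig:
    dormancy_threshold_days: int = 7
    warm_pool_size: int = 2
    warm_pool_replenish_interval_secs: int = 30


def load_config(env: Optional[Mapping[str, str]] = None) -> ControlPlaneConfig:
    """Read the documented variables, using the documented defaults when unset."""
    env = os.environ if env is None else env
    return ControlPlaneConfig(
        dormancy_threshold_days=int(env.get("DORMANCY_THRESHOLD_DAYS", "7")),
        warm_pool_size=int(env.get("WARM_POOL_SIZE", "2")),
        warm_pool_replenish_interval_secs=int(
            env.get("WARM_POOL_REPLENISH_INTERVAL_SECS", "30")
        ),
    )
```

For example, `load_config({"WARM_POOL_SIZE": "5"})` yields a pool size of 5 and keeps the other two defaults.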
Monitoring
The control plane exposes:
- `GET /health` -- returns `{"status": "ok"}` for load balancer health checks
- `GET /api/projects/:id` -- returns current project state, machine_id, region, last_active_at, and disk_bytes
- `GET /api/route/:project_id` -- returns routing info (state and machine_id)
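These endpoints can be polled from a small script. The sketch below assumes the control plane is reachable at a base URL you supply and that the project endpoint returns a flat JSON object keyed by the field names listed above. The `state` key in particular is a guess, and the helper names are hypothetical.

```python
import json
import urllib.request
from typing import Any, Callable, Dict

# Assumed JSON keys, based on the endpoint description; the real shape may differ.
PROJECT_FIELDS = ("state", "machine_id", "region", "last_active_at", "disk_bytes")


def get_json(url: str, timeout: float = 5.0) -> Dict[str, Any]:
    """Fetch a URL and decode its body as JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def is_healthy(
    base_url: str, fetch: Callable[[str], Dict[str, Any]] = get_json
) -> bool:
    """True when GET /health reports {"status": "ok"}."""
    return fetch(f"{base_url}/health").get("status") == "ok"


def project_summary(
    base_url: str,
    project_id: str,
    fetch: Callable[[str], Dict[str, Any]] = get_json,
) -> Dict[str, Any]:
    """Return only the documented fields from GET /api/projects/:id."""
    data = fetch(f"{base_url}/api/projects/{project_id}")
    return {key: data.get(key) for key in PROJECT_FIELDS}
```

The `fetch` parameter exists so the helpers can be exercised against canned responses without a running control plane.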
Background task progress (warm pool replenishment, dormancy sweeps) is logged via tracing. Use structured log aggregation for production monitoring.
See the deployment guide for production configuration.