Valet (Alpha)

Scale to Zero

Server lifecycle, cold starts, and the control plane that manages Fly.io resources.

Valet servers suspend when idle and start again on demand. Inactive projects consume only storage, not compute. Projects that stay idle long enough are archived to Cloudflare R2 and release all of their Fly resources.

How it works

While a project is live, it runs on a dedicated Fly Machine with an attached volume. Each project moves through three states:

  1. Active -- the Machine is running and serving requests. Fly auto-suspends it after the idle timeout (no open connections).
  2. Suspended -- the Machine is suspended and its volume is retained. When the next request arrives via fly-replay routing, Fly auto-starts the Machine. Apart from the resume latency, this is transparent to clients.
  3. Dormant -- no Fly resources exist; the database has been archived to Cloudflare R2. When a client connects, the control plane wakes the project: it takes a machine from the warm pool (or creates one cold if the pool is empty), restores the database from R2, and routes the request to it.
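The three states differ mainly in what has to happen before a request can be served. The Python sketch below models only that difference. `State`, `WakePlan` and `plan_for_request` are hypothetical names used for illustration, not Valet's internal API.

```python
from enum import Enum
from typing import NamedTuple


class State(Enum):
    ACTIVE = "active"
    SUSPENDED = "suspended"
    DORMANT = "dormant"


class WakePlan(NamedTuple):
    handled_by: str         # which component brings the project back to ACTIVE
    needs_r2_restore: bool  # whether the database must first be pulled from R2


def plan_for_request(state: State) -> WakePlan:
    """What has to happen before a request can be served in each state."""
    if state is State.ACTIVE:
        return WakePlan("already running", False)
    if state is State.SUSPENDED:
        # The volume is still attached; Fly resumes the Machine on the replayed request.
        return WakePlan("fly", False)
    # DORMANT: no Machine or volume exists, so the control plane provisions
    # one (warm pool or cold) and restores the database from R2.
    return WakePlan("control plane", True)
```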

State transitions

  • Active to suspended: handled automatically by Fly when the machine is idle (no TCP connections).
  • Suspended to active: handled automatically by Fly when a request is routed to the machine via fly-replay.
  • Suspended to dormant: the control plane runs an hourly dormancy sweep. Projects inactive for longer than the threshold (DORMANCY_THRESHOLD_DAYS, default 7 days) are archived. Fly will normally have suspended an idle project already, so the sweep first starts the machine. It then calls /internal/archive-to-r2 to upload the database to R2, and finally deletes the machine and volume.
  • Dormant to active: when a request hits a dormant project, the control plane claims a machine from the warm pool (or cold-creates one), restores the database from R2, updates the project state, and returns a fly-replay header.
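The dormancy sweep comes down to two parts: selecting projects idle past the threshold, and running the archive steps in a safe order. The sketch below assumes a few things that the page does not document. `Project` is a hypothetical record, the state strings are assumed, and `FakeFly` stands in for a Fly Machines client whose method names are invented. The hourly scheduling is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class Project:
    id: str
    state: str                        # "active", "suspended" or "dormant" (assumed values)
    last_active_at: datetime
    machine_id: Optional[str] = None


class FakeFly:
    """Stand-in for a hypothetical Fly Machines client that just records calls."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def start_machine(self, machine_id: str) -> None:
        self.calls.append(f"start {machine_id}")

    def post(self, machine_id: str, path: str) -> None:
        self.calls.append(f"POST {path} on {machine_id}")

    def delete_machine_and_volume(self, machine_id: str) -> None:
        self.calls.append(f"delete {machine_id}")


def dormancy_sweep(projects: List[Project], now: datetime, fly,
                   threshold_days: int = 7) -> List[str]:
    """Archive every non-dormant project idle for longer than threshold_days.

    Returns the ids of archived projects. If the start or archive call
    raises, the exception propagates and the delete step never runs.
    """
    cutoff = now - timedelta(days=threshold_days)
    archived: List[str] = []
    for project in projects:
        if project.state == "dormant" or project.last_active_at >= cutoff:
            continue  # already archived, or active recently enough
        fly.start_machine(project.machine_id)                    # 1. wake the machine
        fly.post(project.machine_id, "/internal/archive-to-r2")  # 2. upload DB to R2
        fly.delete_machine_and_volume(project.machine_id)        # 3. release Fly resources
        project.state, project.machine_id = "dormant", None
        archived.append(project.id)
    return archived


if __name__ == "__main__":
    now = datetime(2024, 1, 10, tzinfo=timezone.utc)
    projects = [
        Project("old", "suspended", now - timedelta(days=8), "m-1"),
        Project("recent", "suspended", now - timedelta(days=2), "m-2"),
    ]
    fly = FakeFly()
    print(dormancy_sweep(projects, now, fly))  # ['old']
    print(fly.calls)
```

The delete comes last on purpose. If the upload to R2 fails, the machine and volume survive, and the next hourly sweep can simply retry.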

Why scale to zero

Most projects are not active 24/7. A development project might see traffic for a few hours a day. A demo might get a single visitor per week. Running dedicated servers for all of them wastes compute and money.

With scale-to-zero:

  • Idle projects cost nothing beyond volume storage (suspended) or R2 storage (dormant)
  • Active projects get dedicated server instances
  • Cold starts range from about a second (suspended) to a few seconds (dormant)
  • No manual provisioning or teardown

Cold start latency

Latency depends on the project's current state:

State                      Cold start   What happens
Suspended                  ~1s          Fly auto-starts the machine
Dormant (warm pool hit)    ~800ms       Claim a pre-provisioned machine, restore from R2
Dormant (cold create)      ~4s          Create volume + machine, wait for startup, restore from R2

The warm pool maintains pre-provisioned running machines (default 2) so that most dormant wakes avoid the cold-create path.
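The next sketch models the warm pool in memory, along with the dormant-wake path that draws on it. A warm hit corresponds to the ~800ms row of the table, and a miss to the ~4s cold-create row. Machines are plain string ids. `WarmPool`, `wake_dormant` and the `restore_from_r2` callable are hypothetical. The header value assumes Fly's `instance=<machine_id>` fly-replay field, which pins the replayed request to one machine.

```python
from collections import deque
from itertools import count
from typing import Callable, Deque, Tuple


class WarmPool:
    """In-memory model of the warm pool; machines are plain string ids.

    `_create` stands in for the slow path (create volume + machine, wait for
    startup). All names here are illustrative, not Valet's internal API.
    """

    def __init__(self, target_size: int = 2) -> None:
        self.target_size = target_size  # corresponds to WARM_POOL_SIZE
        self._idle: Deque[str] = deque()
        self._ids = count(1)

    def _create(self) -> str:
        return f"machine-{next(self._ids)}"

    def claim(self) -> Tuple[str, bool]:
        """Return (machine_id, warm_hit); fall back to a cold create when empty."""
        if self._idle:
            return self._idle.popleft(), True
        return self._create(), False

    def replenish(self) -> int:
        """Top the pool up to target_size and return how many machines were created.

        The control plane would call this every WARM_POOL_REPLENISH_INTERVAL_SECS.
        """
        created = 0
        while len(self._idle) < self.target_size:
            self._idle.append(self._create())
            created += 1
        return created

    def __len__(self) -> int:
        return len(self._idle)


def wake_dormant(pool: WarmPool,
                 restore_from_r2: Callable[[str], None]) -> Tuple[Tuple[str, str], bool]:
    """Claim a machine, restore the database onto it, and build a fly-replay header."""
    machine_id, warm_hit = pool.claim()
    restore_from_r2(machine_id)  # hypothetical: pull the archived DB onto the machine
    return ("fly-replay", f"instance={machine_id}"), warm_hit
```

A pool of size 2 absorbs two wakes that arrive close together. A third wake arriving before the replenisher runs again falls through to the cold-create path.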

To minimize cold start impact:

  • Keep the database small by archiving old data
  • Use connection pooling in your app
  • For latency-sensitive projects, periodic activity (e.g. a health check ping) more often than the dormancy threshold keeps the dormancy sweep from archiving the project
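A keep-alive ping could look like the minimal sketch below. It assumes that an HTTP request to your project updates its last_active_at. The page does not state exactly what counts as activity. The health URL you pass in and all function names are hypothetical. In practice a scheduled job (cron or similar) is simpler than a long-running loop.

```python
import time
import urllib.request
from typing import Callable, List

SECONDS_PER_DAY = 86_400


def safe_ping_interval_secs(threshold_days: int = 7, margin: float = 0.5) -> float:
    """Ping interval that stays inside DORMANCY_THRESHOLD_DAYS with headroom.

    margin=0.5 pings twice per threshold window.
    """
    return threshold_days * SECONDS_PER_DAY * margin


def http_status(url: str) -> int:
    """GET the URL and return its status (urlopen raises HTTPError on 4xx/5xx)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.status


def keep_alive(url: str, interval_secs: float, iterations: int,
               fetch: Callable[[str], int] = http_status,
               sleep: Callable[[float], None] = time.sleep) -> List[int]:
    """Ping `url` `iterations` times, sleeping between pings; returns the statuses.

    Errors from `fetch` propagate; a production job should catch and retry.
    """
    statuses: List[int] = []
    for i in range(iterations):
        statuses.append(fetch(url))
        if i < iterations - 1:
            sleep(interval_secs)  # no sleep after the final ping
    return statuses
```

The interval is measured in days, not minutes. Fly can therefore still suspend the machine between pings, so you keep paying only for storage. The ping only keeps the project out of the dormant state, which means a wake costs roughly a suspended resume (~1s) rather than a cold create (~4s).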

Configuration

The control plane accepts these environment variables:

Env var                              Default   Description
DORMANCY_THRESHOLD_DAYS              7         Days of inactivity before a project is archived to R2
WARM_POOL_SIZE                       2         Number of pre-provisioned machines to maintain
WARM_POOL_REPLENISH_INTERVAL_SECS    30        How often (in seconds) the replenisher checks the pool

Fly handles idle suspension and auto-start at the machine level, so those timeouts are not configured here. The control plane only manages the dormancy sweep and the warm pool.
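The variables above can be read into a typed config that falls back to the documented defaults. This is a sketch. The control plane's actual parsing and validation are not described on this page, and `ControlPlaneConfig` and `load_config` are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ControlPlaneConfig:
    dormancy_threshold_days: int = 7
    warm_pool_size: int = 2
    warm_pool_replenish_interval_secs: int = 30


def _read_int(env: Mapping[str, str], name: str, default: int) -> int:
    """Parse an integer env var, using the documented default when unset or empty."""
    raw = env.get(name, "").strip()
    return int(raw) if raw else default  # int() raises ValueError on bad input


def load_config(env: Mapping[str, str] = os.environ) -> ControlPlaneConfig:
    """Build the config from the environment (or any mapping, for tests)."""
    return ControlPlaneConfig(
        dormancy_threshold_days=_read_int(env, "DORMANCY_THRESHOLD_DAYS", 7),
        warm_pool_size=_read_int(env, "WARM_POOL_SIZE", 2),
        warm_pool_replenish_interval_secs=_read_int(
            env, "WARM_POOL_REPLENISH_INTERVAL_SECS", 30),
    )
```

Passing a plain dict instead of `os.environ` keeps the loader easy to test without touching the real environment.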

Monitoring

The control plane exposes:

  • GET /health -- returns {"status": "ok"} for load balancer health checks
  • GET /api/projects/:id -- returns current project state, machine_id, region, last_active_at, and disk_bytes
  • GET /api/route/:project_id -- returns routing info (state and machine_id)
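For scripted checks, these endpoints can be read with the standard library. The sketch below makes several assumptions. `base_url` is your control plane's address. The state string and a null `machine_id` for dormant projects are assumed, not documented. The helper names are hypothetical. The field names come from the endpoint list above.

```python
import json
import urllib.request
from typing import Any, Dict


def get_json(base_url: str, path: str, timeout: float = 5.0) -> Dict[str, Any]:
    """GET base_url + path and decode the JSON body."""
    with urllib.request.urlopen(base_url.rstrip("/") + path, timeout=timeout) as resp:
        return json.load(resp)


def is_healthy(payload: Dict[str, Any]) -> bool:
    """True when GET /health returned {"status": "ok"}."""
    return payload.get("status") == "ok"


def describe_project(payload: Dict[str, Any]) -> str:
    """One-line summary of a GET /api/projects/:id response."""
    machine = payload.get("machine_id") or "none"
    size_mib = payload.get("disk_bytes", 0) / (1024 * 1024)
    return (f"{payload['state']} machine={machine} region={payload.get('region')} "
            f"last_active={payload.get('last_active_at')} disk={size_mib:.1f} MiB")
```

Usage might look like `describe_project(get_json(base_url, f"/api/projects/{project_id}"))`, and `get_json(base_url, f"/api/route/{project_id}")` returns the routing info.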

Background task progress (warm pool replenishment, dormancy sweeps) is logged via tracing. Use structured log aggregation for production monitoring.

See the deployment guide for production configuration.
