Scale to Zero

Valet servers shut down when idle and start on demand. Inactive projects consume only disk storage, not compute.

How it works

The orchestrator is a long-running process that manages the lifecycle of per-project Valet servers:

A client connects with a project ID
The orchestrator checks if a server is running for that project
If no server is running, the orchestrator spawns one
The orchestrator proxies the WebSocket connection to the project server
Multiple clients can connect to the same server
When all clients disconnect, an idle timer starts
After the idle timeout (default: 5 minutes), the orchestrator kills the server process
The SQLite database stays on disk
The next client connection spawns a fresh server
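
The lifecycle above can be sketched as a small registry that counts clients per project and reaps servers whose idle timer has expired. This is a minimal illustration, not the orchestrator's actual implementation: the `Orchestrator` class, its method names, and the `spawn`/`kill` callables are hypothetical, and WebSocket proxying is left out.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class ServerEntry:
    handle: object                      # whatever spawn() returned, e.g. a process
    clients: int = 0                    # clients currently connected
    idle_since: Optional[float] = None  # when the last client left, if idle


class Orchestrator:
    """Hypothetical sketch: one server per project, killed after an idle timeout."""

    def __init__(
        self,
        spawn: Callable[[str], object],
        kill: Callable[[object], None],
        idle_timeout: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._spawn = spawn
        self._kill = kill
        self._idle_timeout = idle_timeout
        self._clock = clock
        self._servers: Dict[str, ServerEntry] = {}

    def is_running(self, project_id: str) -> bool:
        return project_id in self._servers

    def connect(self, project_id: str) -> object:
        """Spawn a server if none is running, then register the client."""
        entry = self._servers.get(project_id)
        if entry is None:
            entry = ServerEntry(handle=self._spawn(project_id))
            self._servers[project_id] = entry
        entry.clients += 1
        entry.idle_since = None  # a connected client cancels the idle timer
        return entry.handle

    def disconnect(self, project_id: str) -> None:
        """Unregister a client; start the idle timer when the last one leaves."""
        entry = self._servers[project_id]
        entry.clients -= 1
        if entry.clients == 0:
            entry.idle_since = self._clock()

    def reap_idle(self) -> List[str]:
        """Kill servers idle for at least the timeout; return their project IDs."""
        now = self._clock()
        reaped = []
        for project_id, entry in list(self._servers.items()):
            if entry.idle_since is not None and now - entry.idle_since >= self._idle_timeout:
                self._kill(entry.handle)
                del self._servers[project_id]
                reaped.append(project_id)
        return reaped
```

In this sketch `reap_idle` would be called periodically. The project's data is untouched by a kill because only the process handle is discarded.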

Why scale to zero

Most projects are not active 24/7. A development project might see traffic for a few hours a day. A demo might get a single visitor per week. Running dedicated servers for all of them wastes compute and money.

With scale-to-zero:

Idle projects cost nothing beyond disk storage
Active projects get dedicated server instances
Cold starts take 1-3 seconds
No manual provisioning or teardown

Cold start latency

When no server is running for a project, the first client connection triggers a spawn. This takes 1-3 seconds depending on database size. Subsequent connections skip the spawn step because the server is already running.

To minimize cold start impact:

Keep the database small by archiving old data
Use connection pooling in your app
For latency-sensitive projects, consider a keep-alive ping to prevent the server from going idle
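
A keep-alive can be as simple as a background loop that touches the project at an interval shorter than the idle timeout. The sketch below is one possible shape, under assumptions: `start_keep_alive` is a hypothetical helper, and `ping` stands in for whatever your app does to count as activity (for example, opening a short-lived connection to the project).

```python
import threading
from typing import Callable


def start_keep_alive(ping: Callable[[], None], interval_seconds: float) -> threading.Event:
    """Call ping() every interval_seconds on a daemon thread until stopped.

    Returns an Event; call .set() on it to stop the loop.
    """
    stop = threading.Event()

    def loop() -> None:
        # wait() returns False on timeout (time to ping) and True once stopped.
        while not stop.wait(interval_seconds):
            try:
                ping()
            except Exception as exc:
                # One failed attempt should not end the keep-alive loop.
                print(f"keep-alive ping failed: {exc}")

    threading.Thread(target=loop, daemon=True).start()
    return stop
```

With the default idle timeout of 300 seconds, an interval such as 60 seconds leaves room for a few missed pings. A project kept warm this way no longer scales to zero, so reserve it for projects where the cold start matters.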

Configuration

The orchestrator accepts these flags:

Flag	Default	Description
`--idle-timeout`	`300`	Seconds before killing an idle server
`--health-check-interval`	`15`	Seconds between server health checks

Tune these based on your traffic patterns. Longer idle timeouts reduce cold starts but increase resource usage. Shorter idle timeouts save compute but increase cold start frequency.
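
To make the tradeoff concrete, the following sketch replays a list of client sessions against a given idle timeout and counts cold starts and total server uptime. It is a simplified model with a hypothetical `simulate` function: it ignores spawn time and assumes sessions are sorted and do not overlap.

```python
from typing import List, Tuple


def simulate(sessions: List[Tuple[float, float]], idle_timeout: float) -> Tuple[int, float]:
    """Return (cold_starts, server_uptime_seconds) for the given sessions.

    Each session is a (start, end) span in seconds during which at least one
    client is connected. Sessions must be sorted and must not overlap.
    """
    cold_starts = 0
    uptime = 0.0
    kill_at = None  # when the current server would be killed if nobody returns
    for start, end in sessions:
        if kill_at is None or start >= kill_at:
            cold_starts += 1           # no server running: this connection spawns one
        else:
            uptime -= kill_at - start  # server still warm: refund the unused idle time
        uptime += (end - start) + idle_timeout
        kill_at = end + idle_timeout
    return cold_starts, uptime


# Three 10-minute sessions: two close together, one much later.
sessions = [(0, 600), (700, 1300), (5000, 5600)]
print(simulate(sessions, idle_timeout=300))  # (2, 2500.0)
print(simulate(sessions, idle_timeout=60))   # (3, 1980.0)
```

In this example the shorter timeout saves 520 seconds of server uptime at the cost of one extra cold start.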

Monitoring

The orchestrator exposes a `/status` endpoint that returns current server states, active connection counts, and health information. Use this endpoint for deployment monitoring and alerting.
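
A basic monitoring probe might look like the sketch below. It assumes the endpoint is served over HTTP and returns JSON, neither of which is stated above, and it makes no assumption about the fields in the response. `fetch_status` is a hypothetical helper, and the host and port in the usage note are placeholders.

```python
import json
import urllib.error
import urllib.request
from typing import Any, Optional, Tuple


def fetch_status(url: str, timeout: float = 5.0) -> Tuple[bool, Optional[Any]]:
    """Return (ok, payload). payload is the parsed JSON body, or None on failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return True, json.loads(response.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        # Covers connection errors, timeouts, HTTP error codes, and non-JSON bodies.
        return False, None
```

A scheduled job could call `fetch_status("http://localhost:8080/status")` and raise an alert whenever `ok` is false, then inspect the payload according to the actual response format.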

See the deployment guide for production configuration.
