Claude says the key is to maintain user agency: let users choose how to manage their usage rather than imposing arbitrary cutoffs.
It suggests:
Transparent queueing - Instead of blocking, queue requests with clear wait-time estimates. Users can choose to wait or reschedule.
Usage smoothing - Soft caps that add gradually increasing delays as usage climbs (e.g., 2s → 5s → 10s) rather than hard cutoffs.
Declared priority queues - Let users specify request urgency. Background tasks get lower priority but aren't blocked.
Time-based scheduling - Allow users to schedule non-urgent work during off-peak hours at standard rates.
Burst credits - A banking system in which users accumulate credits during low-usage periods and spend them on occasional heavy use.
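Transparent queueing can be sketched as a FIFO queue that hands back an estimated wait instead of rejecting the request. This is a minimal sketch, not a prescribed design: the class name `TransparentQueue`, the fixed per-request service time `avg_service_s`, and the `workers` parallelism count are hypothetical assumptions.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass
class QueuedRequest:
    request_id: str
    position: int            # 0-based position in the queue when enqueued
    estimated_wait_s: float  # rough estimate shown to the user


class TransparentQueue:
    """FIFO queue that reports an estimated wait instead of rejecting requests.

    Hypothetical assumptions: every request takes about `avg_service_s`
    seconds, and `workers` requests are processed in parallel.
    """

    def __init__(self, avg_service_s: float, workers: int = 1) -> None:
        self.avg_service_s = avg_service_s
        self.workers = workers
        self._queue: deque[str] = deque()

    def estimate_wait(self, position: int) -> float:
        # Requests ahead of this one are served `workers` at a time.
        return (position // self.workers) * self.avg_service_s

    def enqueue(self, request_id: str) -> QueuedRequest:
        position = len(self._queue)
        self._queue.append(request_id)
        return QueuedRequest(request_id, position, self.estimate_wait(position))

    def cancel(self, request_id: str) -> bool:
        # Lets a user withdraw the request and reschedule instead of waiting.
        try:
            self._queue.remove(request_id)
            return True
        except ValueError:
            return False

    def next_request(self) -> str | None:
        return self._queue.popleft() if self._queue else None
```

The estimate is fixed at enqueue time; a real service would likely recompute it as requests ahead are served or cancelled.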
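The usage-smoothing idea can be sketched as a function that maps current usage to an added delay. The 2s, 5s and 10s steps come from the suggestion; the soft-cap thresholds (100%, 125% and 150% of the cap) and the name `smoothing_delay` are hypothetical.

```python
import bisect

# Hypothetical tiers: (usage as a fraction of the soft cap, added delay in seconds).
# Delays follow the 2s -> 5s -> 10s example; the thresholds are assumptions.
DELAY_TIERS = [(1.0, 2.0), (1.25, 5.0), (1.5, 10.0)]


def smoothing_delay(used: float, soft_cap: float, tiers=DELAY_TIERS) -> float:
    """Return the extra delay (seconds) to add before serving a request.

    Below the soft cap there is no delay; above it the delay steps up,
    but the request is never refused outright.
    """
    ratio = used / soft_cap
    thresholds = [threshold for threshold, _ in tiers]
    # Index of the highest threshold that `ratio` has reached, or -1 if none.
    idx = bisect.bisect_right(thresholds, ratio) - 1
    return 0.0 if idx < 0 else tiers[idx][1]
```

A caller would wait for the returned delay (for example with `asyncio.sleep`) before serving, so heavy usage slows down rather than fails.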
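For declared priority queues, a strict priority queue could starve background tasks, which would contradict "aren't blocked". This sketch therefore swaps in a virtual-deadline (earliest-deadline-first) ordering: each request's deadline is its arrival time plus an offset for its declared urgency. The urgency names, the offsets, and the `DeclaredPriorityQueue` class are hypothetical.

```python
from __future__ import annotations

import heapq
import itertools

# Hypothetical urgency levels mapped to a deadline offset in seconds.
URGENCY_OFFSET_S = {"urgent": 0.0, "normal": 30.0, "background": 300.0}


class DeclaredPriorityQueue:
    """Serves requests by earliest virtual deadline (arrival time + urgency offset).

    Background work sorts behind urgent work that arrives around the same time,
    but its deadline eventually comes due, so it is never starved.
    """

    def __init__(self) -> None:
        self._heap: list[tuple[float, int, str]] = []
        self._seq = itertools.count()  # tie-breaker: FIFO among equal deadlines

    def submit(self, request_id: str, urgency: str, now: float) -> None:
        deadline = now + URGENCY_OFFSET_S[urgency]
        heapq.heappush(self._heap, (deadline, next(self._seq), request_id))

    def pop(self) -> str | None:
        return heapq.heappop(self._heap)[2] if self._heap else None
```

Passing `now` explicitly keeps the ordering deterministic and easy to test; a service would pass a monotonic clock reading.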
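Time-based scheduling reduces to deciding when a deferred job should start. In this minimal sketch, the 22:00 to 06:00 off-peak window and the function names `is_off_peak` and `schedule_non_urgent` are hypothetical assumptions. The window is assumed to wrap midnight.

```python
from datetime import datetime, time

# Hypothetical off-peak window that wraps midnight: 22:00 to 06:00.
OFF_PEAK_START = time(22, 0)
OFF_PEAK_END = time(6, 0)


def is_off_peak(t: datetime) -> bool:
    # Because the window wraps midnight, it is the union of two ranges.
    clock = t.time()
    return clock >= OFF_PEAK_START or clock < OFF_PEAK_END


def schedule_non_urgent(submitted_at: datetime) -> datetime:
    """Return when a deferred job should run: immediately if already off-peak,
    otherwise at the start of the next off-peak window."""
    if is_off_peak(submitted_at):
        return submitted_at
    # Not off-peak means the clock is in [06:00, 22:00), so today's 22:00 is still ahead.
    return datetime.combine(submitted_at.date(), OFF_PEAK_START, tzinfo=submitted_at.tzinfo)
```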
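A standard way to implement burst credits is a token bucket. Credits refill at a steady rate up to a cap, so credits left unspent during quiet periods accumulate and can fund a later burst. In this sketch, the `BurstCreditBucket` class and its `rate` and `capacity` parameters are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BurstCreditBucket:
    """Token bucket: credits refill at `rate` per second up to `capacity`.

    Credits not spent during quiet periods accumulate (up to the cap),
    so a user who has been idle can later make a burst of heavier requests.
    """
    rate: float              # credits earned per second (hypothetical)
    capacity: float          # maximum credits that can be banked (hypothetical)
    credits: float = 0.0
    last_update: float = 0.0

    def _refill(self, now: float) -> None:
        elapsed = max(0.0, now - self.last_update)
        self.credits = min(self.capacity, self.credits + elapsed * self.rate)
        self.last_update = now

    def try_spend(self, cost: float, now: float) -> bool:
        """Spend `cost` credits if available; return False otherwise."""
        self._refill(now)
        if self.credits >= cost:
            self.credits -= cost
            return True
        return False
```

To stay consistent with avoiding hard cutoffs, a `False` result would route the request to a queue or a smoothing delay rather than reject it.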