On-call is not a support function. It is a design feedback loop. If you treat it as the former, you will spend your career fighting fires. If you treat it as the latter, you will spend your career building better systems.
The signal
Every page is a signal. It tells you that a human engineer was needed to resolve a situation the system could not handle alone. That is a design failure.
Not necessarily a serious one. Some failures are rare and acceptable. But a recurring page (the same alert, the same service, the same root cause) is a design problem with a known location: the page itself tells you which component to change.
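Finding recurring pages can be as simple as counting them. The sketch below is illustrative only: the `Page` record, its field names, and the sample data are assumptions, not the export format of any particular paging tool.

```python
from collections import Counter
from typing import NamedTuple


class Page(NamedTuple):
    # Hypothetical record shape; real paging tools export different fields.
    alert: str
    service: str
    root_cause: str


def recurring_pages(pages, min_count=2):
    """Return (page, count) pairs for pages seen at least min_count times."""
    counts = Counter(pages)  # identical (alert, service, root_cause) tuples collapse
    return [(page, n) for page, n in counts.most_common() if n >= min_count]


# Made-up sample data for illustration.
pages = [
    Page("HighLatency", "checkout", "connection pool exhausted"),
    Page("DiskFull", "logs", "rotation disabled"),
    Page("HighLatency", "checkout", "connection pool exhausted"),
    Page("HighLatency", "checkout", "connection pool exhausted"),
]

for page, n in recurring_pages(pages):
    print(f"{n}x {page.alert} on {page.service}: {page.root_cause}")
```

In practice the hard part is recording the root cause consistently enough that identical causes group together, which is itself an argument for writing the postmortem.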
The response
The standard on-call response is: diagnose, mitigate, recover, write a postmortem. This is correct but incomplete.
The complete response adds: change the design.
“Change the design” might mean adding circuit breakers. It might mean adjusting a retry strategy. It might mean building a self-healing mechanism. It might mean changing the SLO, if the page fired on a target stricter than users actually need.
The point is that the page is an input, not just an interruption.
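To make one of those design changes concrete, here is a minimal circuit breaker sketch. It is an illustration under stated assumptions rather than a production implementation: the class name, the thresholds, and the injectable `clock` parameter are all hypothetical choices, and the code is not thread-safe.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is skipped."""


class CircuitBreaker:
    # Hypothetical defaults; tune them to the dependency being protected.
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after  # cool-down in seconds
        self.clock = clock              # injectable so tests can fake time
        self.failures = 0
        self.opened_at = None           # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Fail fast instead of piling load onto a sick dependency.
                raise CircuitOpenError("dependency marked unhealthy")
            # Cool-down elapsed: allow a trial call (half-open state).
            # One more failure will reopen the circuit immediately.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success resets the failure count
        return result
```

A caller wraps each outbound request, for example `breaker.call(fetch_inventory, item_id)`, where `fetch_inventory` stands in for whatever dependency generated the page. The system then degrades on its own instead of waking a human to do the same thing by hand.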
The culture
Teams that treat on-call as a tax produce engineers who are tired and resentful. Teams that treat on-call as a design feedback loop produce engineers who are engaged and improving.
The difference is whether the organisation responds to on-call data. If pages are logged, reviewed, and result in design changes, the on-call rotation is valuable. If pages are logged, closed, and forgotten, you have a culture problem that no tooling will fix.
Respect the signal. Change the system.