On-Call Management for Modern Dev-Ops
One of the core pieces of maintaining a sophisticated operation is delegating responsibilities. If one individual ends up doing the bulk of the work, the whole process slows down, and that person's own efficiency will suffer as well. It is not impossible for a single person to excel at everything, but it is highly improbable. That is why we divide tasks among multiple people.
This notion has been true for ages, and it is the foundation on which on-call management is built. "On-call" refers to the individuals responsible for addressing issues at a given point in time. This set of individuals rotates around the clock or over a number of days, depending on how big the team is. For a team of two, this is fairly easy to manage, but as the team grows it becomes difficult to keep track of. In Dev-Ops and IT-Ops, on-call schedules play a crucial role in determining whom an incident should be assigned to when it occurs, because delays in notifying the right people can result in hefty financial losses. Fortunately, modern incident response software typically comes with built-in on-call management capabilities.
Determining On-Call Switch Frequency
Before creating an on-call schedule, you first need to decide how often on-call roles will rotate: every few hours, every few days, weekly, monthly, and so on. This depends on the team and the nature of its work, but the general rule of thumb is to lengthen the on-call period as the number of incidents drops. In other words, rotation length and incident frequency have an inverse relationship. For example, a team that sees only 2 incidents a week may be comfortable rotating on-call weekly, while a team that sees 10 incidents a day may prefer to rotate daily so that no single person carries that load for long.
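To make the inverse relationship concrete, here is a minimal Python sketch. It assumes a hypothetical per-rotation "incident budget" (`max_incidents_per_rotation`, defaulting to 3) and a hypothetical menu of rotation lengths, then picks the longest rotation whose expected incident load stays within that budget. Neither the budget nor the options come from any particular tool, so treat them as knobs to tune for your team.

```python
# Hypothetical menu of rotation lengths, in days, ordered shortest to longest.
ROTATION_OPTIONS_DAYS = [1, 3, 7, 14, 30]


def suggest_rotation_days(
    incidents_per_week: float,
    max_incidents_per_rotation: float = 3.0,  # hypothetical per-rotation budget
) -> int:
    """Return the longest rotation whose expected incident count fits the budget.

    Falls back to the shortest option when even that exceeds the budget.
    """
    if incidents_per_week < 0:
        raise ValueError("incidents_per_week must be non-negative")
    incidents_per_day = incidents_per_week / 7
    best = ROTATION_OPTIONS_DAYS[0]
    for days in ROTATION_OPTIONS_DAYS:
        # Expected incidents one person would absorb over a rotation of this length.
        if incidents_per_day * days <= max_incidents_per_rotation:
            best = days
    return best
```

With these assumed defaults, the two teams from the example land where the text suggests: 2 incidents a week yields a 7-day rotation, and 10 incidents a day (70 a week) falls back to a daily rotation.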
Another factor that may affect the choice is whether support is needed around the clock. If it is, people are likely to work in shifts. Think about factories that run 24 hours a day: they always have at least a morning shift and a night shift. In those cases, the split is a no-brainer, because on-call role delegation is tied directly to the change in shifts.
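When on-call duty follows shifts, finding the current responder reduces to finding the current shift. The sketch below assumes a hypothetical two-shift day (06:00 to 18:00 and 18:00 to 06:00) and hypothetical responder names; the only subtle part is the night shift, which wraps past midnight.

```python
from datetime import datetime, time

# Hypothetical two-shift day: (shift name, start time, end time).
SHIFTS = [
    ("day", time(6, 0), time(18, 0)),
    ("night", time(18, 0), time(6, 0)),  # wraps past midnight
]

# Hypothetical mapping from shift name to that shift's on-call responder.
ON_CALL_BY_SHIFT = {"day": "alice", "night": "bob"}


def current_shift(moment: datetime) -> str:
    """Return the name of the shift covering `moment` (start inclusive, end exclusive)."""
    t = moment.time()
    for name, start, end in SHIFTS:
        if start < end:
            # Ordinary shift within a single calendar day.
            if start <= t < end:
                return name
        else:
            # Shift that crosses midnight: covers late evening and early morning.
            if t >= start or t < end:
                return name
    raise ValueError("no shift covers this time")


def on_call_at(moment: datetime) -> str:
    """Return whoever holds the on-call role during the shift covering `moment`."""
    return ON_CALL_BY_SHIFT[current_shift(moment)]
```

For example, `on_call_at(datetime(2024, 1, 1, 23, 30))` returns `"bob"`, while 09:00 on the same day returns `"alice"`. Handing over exactly at the shift boundary keeps on-call delegation aligned with the shift change.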
Number of People On-Call at a Time
Usually only one person is on call at a time, although this is by no means a fixed rule; the number can easily be higher. If the incidents that may arise are likely to be complex, it is perfectly normal to have multiple people on call at the same time so the incident can be handled better. Consider an IT-Ops team whose major incidents affect both the database and the code base. They might choose to always have 2 people on call together, so that when an incident happens, one can focus on resolving the database issue while the other focuses on fixing the code.
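One way to schedule the database-plus-code pairing from the example is to rotate two separate pools weekly and pair whoever is current in each. This is a minimal sketch: the pool members and the rotation start date are hypothetical, and an incident response tool would typically express the same idea through its own schedule configuration.

```python
from datetime import date

# Hypothetical rosters: one pool per area of expertise.
DATABASE_POOL = ["priya", "omar"]
APPLICATION_POOL = ["alice", "bob", "carol"]

# Hypothetical first day of the rotation (a Monday).
ROTATION_START = date(2024, 1, 1)


def on_call_pair(day: date) -> tuple[str, str]:
    """Return (database responder, application responder) for the week containing `day`."""
    weeks_elapsed = (day - ROTATION_START).days // 7
    db = DATABASE_POOL[weeks_elapsed % len(DATABASE_POOL)]
    app = APPLICATION_POOL[weeks_elapsed % len(APPLICATION_POOL)]
    return db, app
```

Keeping the pools separate lets them have different sizes. Here, with 2 database and 3 application responders, the same pair only recurs every 6 weeks (the least common multiple of 2 and 3).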
Even with a single on-call responder, it is strongly advisable to set up multiple layers of support. A single responder may be able to handle most of the incidents that occur, but sometimes they may not be in a position to address the issue or may lack the knowledge to help. In such cases, it is best practice to escalate the incident to secondary support. The secondary responder is not disturbed unless the primary deliberately escalates to them when help is needed.
When on-call management is handled by an incident response system, the secondary on-call plays another crucial role: they act as the second line of defense when the primary on-call misses the alerts raised for an incident entirely. Without an escalation policy, an incident missed by the primary would go completely unnoticed, and you might only realize it once it is already too late. With an escalation policy in place, however, the incident response system automatically escalates the incident to the secondary on-call if the primary does not acknowledge the notifications sent to them within a set time. This simple safeguard can save you a great deal of money that you would otherwise have lost.
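Automatic escalation can be modeled as walking an ordered chain of responders, each with an acknowledgement timeout. This Python sketch is a simplified simulation rather than how any specific product works: the `EscalationLevel` type, the timeout values, and the idea of passing in each responder's acknowledgement delay are all hypothetical, and it leaves out the manual path where the primary deliberately hands off.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EscalationLevel:
    responder: str
    ack_timeout_min: int  # minutes to wait for an acknowledgement before escalating


# Hypothetical two-level policy: page the primary, escalate to the secondary after 5 minutes.
POLICY = [
    EscalationLevel("primary-oncall", 5),
    EscalationLevel("secondary-oncall", 10),
]


def resolve_escalation(
    policy: list[EscalationLevel],
    ack_delay_min: dict[str, Optional[int]],
) -> tuple[Optional[str], list[str]]:
    """Simulate an escalation chain.

    `ack_delay_min` maps each responder to how many minutes they take to
    acknowledge after being notified; a missing entry or None means they never do.
    Returns the responder who acknowledged in time (or None if nobody did)
    and the list of everyone who was notified, in order.
    """
    notified: list[str] = []
    for level in policy:
        notified.append(level.responder)
        delay = ack_delay_min.get(level.responder)
        if delay is not None and delay <= level.ack_timeout_min:
            return level.responder, notified
        # No acknowledgement within the timeout: move on to the next level.
    return None, notified
```

With this example policy, a primary who acknowledges after 2 minutes handles the incident alone, while a primary who never acknowledges (or responds only after the 5-minute timeout) causes the secondary to be paged. A `None` result means the whole chain was exhausted, which in practice is a signal to add another level.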
Creating an on-call schedule can significantly improve task delegation and incident handling in your organization. This becomes more pressing as the organization grows and top-notch customer support becomes essential. At that point, an incident response system can make all the difference in managing your team, helping you stay on top of things as everything else grows more complex.