After confirming a problem in production, it's important to act quickly and in the manner the situation requires.
Triage, inform, restore, prevent
This instruction uses three levels of severity: critical, medium and low.
Decide how severe the problem is. Critical means that the problem is severely affecting users of the software. Providing false information or preventing users from accomplishing their tasks falls in this category. Critical problems require communication to stakeholders and an immediate plan for restoring service. The usual process for incorporating changes, including change control, should be attended to only after the service is restored, when a proper fix can be made. Communication to relevant stakeholders should contain the following information:
- What is the extent of the problem
- Which users are affected
- How long has the problem existed
- When the problem can be fixed and at what cost
In particular, avoid technical jargon, which wastes time and is of no relevance to the stakeholders.
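The four items above can be kept together in a small structure so that none is forgotten under time pressure. The following is only a minimal sketch: the class name `StakeholderNotice`, its field names, and the output layout are illustrative assumptions, not part of this instruction.

```python
from dataclasses import dataclass


@dataclass
class StakeholderNotice:
    """Hypothetical container for the four items listed above."""
    extent: str          # what is the extent of the problem
    affected_users: str  # which users are affected
    existed_since: str   # how long the problem has existed
    fix_estimate: str    # when it can be fixed and at what cost

    def render(self) -> str:
        # One plain sentence per item, in the order of the list above.
        return "\n".join([
            f"Extent: {self.extent}",
            f"Affected users: {self.affected_users}",
            f"Existing since: {self.existed_since}",
            f"Fix and cost: {self.fix_estimate}",
        ])


# Example values are made up for illustration only.
notice = StakeholderNotice(
    extent="Invoices show a wrong total",
    affected_users="Customers who ordered today",
    existed_since="Since this morning's release",
    fix_estimate="Previous version restored within the hour, no extra cost",
)
print(notice.render())
```

Writing the values as short everyday sentences, as in the example, keeps the message free of jargon.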
After the relevant stakeholders have been informed and appropriate measures have been taken to restore service, find out more about the problem:
- What caused the problem
- Are similar problems possible in other parts of the software
- When it might reoccur
- How it can be prevented
- How the development process can be improved so that the problem is not introduced again
A problem of medium severity is one that doesn't affect as many users, or that only causes trouble and doesn't prevent a task from being accomplished. Communication to stakeholders should consist of the same information as required for critical problems. All of this information can be gathered at once, since medium problems aren't as time-critical.
Low severity problems affect only very few users or cause only minor trouble. It is usually enough to communicate these to the relevant stakeholders and discuss when it is appropriate to attend to them. Include all of the same information as with medium problems.
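The three severity levels can be sketched as a small decision function. This is one possible reading of the descriptions above, not a fixed rule. The names `Impact`, `Severity` and `triage`, and especially the numeric thresholds for "very few" and "many" users, are assumptions made for illustration and should be agreed on per project.

```python
from enum import Enum


class Impact(Enum):
    BLOCKING = "blocking"  # false information, or a task cannot be accomplished
    TROUBLE = "trouble"    # causes trouble, but the task can still be done
    MINOR = "minor"        # only minor trouble


class Severity(Enum):
    CRITICAL = "critical"
    MEDIUM = "medium"
    LOW = "low"


def triage(impact: Impact, affected_share: float,
           few_users: float = 0.05, many_users: float = 0.5) -> Severity:
    """Map impact and share of affected users (0.0 to 1.0) to a severity.

    The thresholds few_users and many_users are hypothetical defaults.
    """
    # Low: only very few users affected, or only minor trouble.
    if impact is Impact.MINOR or affected_share <= few_users:
        return Severity.LOW
    # Critical: users are severely affected and there are many of them.
    if impact is Impact.BLOCKING and affected_share >= many_users:
        return Severity.CRITICAL
    # Medium: fewer users, or trouble that does not prevent the task.
    return Severity.MEDIUM


print(triage(Impact.BLOCKING, 0.8).value)
print(triage(Impact.TROUBLE, 0.8).value)
print(triage(Impact.BLOCKING, 0.01).value)
```

The order of the checks is a design choice: here a problem that reaches very few users is treated as low even if it blocks a task for them. A team that disagrees would move the critical check first.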