My company runs a 24/7 site with a substantial number of users and connections to partner systems all over the world. We do what we can to make the system fault tolerant, but problems can still appear at any time of day or night. Ideally we would have a technical support team that’s staffed around the clock, but that not in the cards for now.
We used to have a system where anyone that could fix a problem would get a text message, and whoever was closest to a computer would respond. It worked most of the time, but there were some issues. The team creating the tickets would often be unsure of which component was broken, so the issues would hit lots of people who would be unable to help, and it also exposes a quirk of human behaviour that makes us less likely to help when more people are available.
[Read More]