Picture this… You run several large Managed Services contracts. Your Network Operations Center (NOC) is responsible for monitoring and supporting first responder networks all across the country. Of course, with a first responder network, lives are at stake and these networks cannot be down. You have been involved from the beginning to ensure that there is redundancy to safeguard against exactly that. It is now 2:00 a.m. because, of course, it is. Network alarms only ever seem to happen overnight. Your phone is blowing up with text messages, most from network devices and a few from the NOC technicians monitoring these networks. The alerts and the techs are warning of a substantial outage in one of those networks.
All backup devices are up and operating, but your focus needs to be on the primary devices. You hop on the phone and start talking to the NOC techs. They have done all the basic troubleshooting and cannot reach the imperiled devices. They are asking for approval to escalate internally to the engineering staff while they contact the customer’s on-call contact to get a status and let them know of the issue. Nobody is answering at the customer location. You assume that is because the customer’s primary network is down. Your engineers are also stumped, as they cannot get to any of the devices either, not even through the backdoor interfaces you have installed and configured.
As you are the third level of escalation, it is your responsibility to notify your management counterpart at the customer location. Just as you are about to make that call, your first-level NOC tech calls. He is in contact with the customer and has just been made aware that they are doing scheduled maintenance. Starting to feel relieved that there is no catastrophic issue going on there, you think about going back to sleep. You look up at the clock and realize this whole exercise has taken two hours and it is almost time to get up for the day. You make the coffee and begin the post-mortem.
With proper Change Management Processes
It’s 2:00 p.m. and you are heading into a customer’s Change Management meeting. You attend this meeting by phone once a month. In this meeting, you discuss planned maintenance on the devices within that network. Your customer announces that network-wide maintenance will occur on the regular 3rd Thursday of the month. You note it for later reference. The meeting goes as planned and you begin to prepare a Change Notification for your NOC staff to read and file in preparation for the outage. The day of the actual work arrives. Your NOC techs, by design, have switched off the monitoring tool for the specific devices that you know will be affected. At the allotted time, the primary network is taken out of service. Your NOC team knows this because they see the secondary devices begin to carry the network traffic. The customer network team notifies you when the work is complete, and your team watches as the network nodes all return to green. Your team then goes in and reactivates all the alarms. Everything goes as planned and the network is back up and running on its primary devices.
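For teams that would rather not depend on someone remembering to switch alarms off and back on by hand, the same idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `MaintenanceWindow` structure, the `should_page` function, the device names, and the dates are all made up for the example and do not correspond to any particular monitoring tool. The point it shows is that an alarm pages someone only when the device is not covered by a window taken from the Change Notification, while the secondary devices keep alarming as usual.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class MaintenanceWindow:
    """A planned outage copied from a Change Notification (hypothetical structure)."""
    devices: frozenset
    start: datetime
    end: datetime

    def covers(self, device: str, when: datetime) -> bool:
        # True only for a listed device, and only inside the agreed time span.
        return device in self.devices and self.start <= when <= self.end


def should_page(device: str, when: datetime, windows: list) -> bool:
    """Return True if an alarm on this device should wake somebody up."""
    return not any(w.covers(device, when) for w in windows)


# Illustrative values only: device names and times are assumptions.
window = MaintenanceWindow(
    devices=frozenset({"primary-rtr-01", "primary-sw-01"}),
    start=datetime(2024, 1, 18, 2, 0),
    end=datetime(2024, 1, 18, 4, 0),
)

during = datetime(2024, 1, 18, 2, 30)
after = datetime(2024, 1, 18, 4, 30)

print(should_page("primary-rtr-01", during, [window]))    # suppressed: planned work
print(should_page("secondary-rtr-01", during, [window]))  # still pages: not in the change
print(should_page("primary-rtr-01", after, [window]))     # pages again once the window closes
```

Because the window carries its own end time, alarming comes back without a manual step, which removes one of the places where the process could be forgotten.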
So, what have we learned here? If you want a good night’s sleep, and not to be making a pot of coffee at 3 a.m. while preparing to tell a customer that his primary network is down, then create a change process, participate in it, and exchange information before any network work begins. It is called Effective Change Management, and it can make your life a whole lot more bearable.