Tri-Paragon Inc. 130 King Street West, Suite 1800, P.O. Box 427, Toronto, ON Canada M5X 1E3
Phone: 416.865.3392 Email: info@triparagon.com
Minimize the Impacts of Critical Application Outages
Much is written about business continuity and recovery of operations after a disaster. But what about recovering a system or application whose lengthy outage disrupts the day-to-day operations of a business?
FACT: 25% of businesses that close after a natural disaster never reopen
FACT: 40% of small businesses never reopen their doors following a disaster
I have yet to see statistics on the impact that a lengthy outage of a critical application has on a business, but I am willing to bet that the disaster recovery statistics are representative of this scenario. So what can you do to mitigate this risk?
First and foremost, understand how important each application is to your business: whether you can continue to serve your customers if the application becomes unavailable, and for how long you can operate in that state. Understand the impact of being unable to take orders, fill them and deliver them when an application you consider critical is unavailable. Accounting applications, for example, are important for many reasons, but they typically are not involved in servicing a customer other than to provide an invoice; beyond that, they reflect the historical financial performance of the organization. I would not consider them critical from the perspective of servicing a customer. They are certainly critical from the point of view of cash flow, and a cash flow problem left unattended for an extended period can destroy a business. However, if you know what the customer ordered and its cost to the customer, you can prepare and send out a manual invoice and follow up with the appropriate collection actions. The point is that, although it is cumbersome and outside the normal process, you can put a temporary solution in place.
To illustrate, I was involved with a supermarket organization whose policy was not to put prices on the products; instead, items were scanned at the checkout and the system applied the prices automatically. Guess what? The point-of-sale system went down at one of our flagship stores. Since there were no prices on the articles themselves, we had to shut the store and then either wait for the system to be fixed or price the items manually and use adding machines to check out the customers. Manual pricing only made sense if we assumed that the system would be unavailable for longer than it took to price the items in the store, and either way we would suffer the reputation hit of not being able to serve our customers as they were used to. To mitigate the situation, we chose to reprice the store while the system was being fixed, hoping that the system would be operational long before we finished repricing. This was a reactive mitigation of the risk of the system failing. We had believed in advance that there was enough redundancy built into the system that such an occurrence would not happen, yet it did. One major component of the system was singular in nature, i.e. not redundant, and, as we found out later, if it failed the whole system failed.
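The kind of check that might have caught this in advance can be sketched in a few lines of Python. This is a minimal illustration, not the analysis we actually performed: the component names and instance counts are hypothetical, and it assumes that every listed component is required for the system to operate, so that any component with only one instance is a single point of failure.

```python
# Hypothetical inventory: component name -> number of independent instances.
# The names and counts are illustrative only, not the real supermarket system.
pos_components = {
    "checkout terminals": 12,
    "store network switch": 2,
    "price lookup server": 1,
    "payment gateway link": 2,
}


def single_points_of_failure(components):
    """Return, in alphabetical order, the components with no redundant instance."""
    return sorted(name for name, count in components.items() if count < 2)


print(single_points_of_failure(pos_components))
```

Anything the function returns deserves either added redundancy or a documented manual fallback before the failure happens.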
The reason for citing this example is to point out that every system should be analysed to understand the impacts of a failure and to proactively establish a mitigation strategy. Of course, in doing this analysis you have to determine the extent of the impacts on your customers and your reputation before spending time and money on mitigating an outage. Part of our data center assessment service focuses on the impact that an outage of each individual application has on customers and reputation, how long the outage can be sustained by time of day, the worst-case length of time to recover the application, and whether or not the recovery is adequately documented and worthwhile.
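As a rough illustration of how such an assessment can be recorded and checked, here is a minimal Python sketch. It is not our assessment tool; the class, the field names, the time-of-day periods and the sample figures are all hypothetical. It simply compares the worst-case recovery time of each application with the outage the business says it can tolerate in each period, and notes whether the recovery procedure is documented.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ApplicationAssessment:
    """One row of a hypothetical application outage assessment."""
    name: str
    tolerable_outage_hours: Dict[str, float]  # time-of-day period -> hours
    worst_case_recovery_hours: float
    recovery_documented: bool


def find_gaps(app: ApplicationAssessment) -> List[str]:
    """List the ways in which recovery falls short of what the business can tolerate."""
    gaps = []
    for period, tolerable in app.tolerable_outage_hours.items():
        if app.worst_case_recovery_hours > tolerable:
            gaps.append(
                f"{period}: recovery may take {app.worst_case_recovery_hours}h, "
                f"but only {tolerable}h can be tolerated"
            )
    if not app.recovery_documented:
        gaps.append("recovery procedure is not documented")
    return gaps


# Illustrative figures only.
point_of_sale = ApplicationAssessment(
    name="point of sale",
    tolerable_outage_hours={"business hours": 1, "overnight": 8},
    worst_case_recovery_hours=4,
    recovery_documented=False,
)
accounting = ApplicationAssessment(
    name="accounting",
    tolerable_outage_hours={"business hours": 72, "overnight": 72},
    worst_case_recovery_hours=24,
    recovery_documented=True,
)

for app in (point_of_sale, accounting):
    print(app.name, find_gaps(app))
```

With these sample figures, the point-of-sale application shows a gap during business hours and an undocumented recovery, while the accounting application shows none, which matches the earlier argument that a manual workaround can carry accounting through an outage.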
The value of having proactive mitigation strategies should be self-evident, but it is amazing to me how many organizations spend huge amounts putting a disaster recovery plan in place and forget about recovering from individual application outages. Good IT disaster recovery planning must include planning for recovery from application outages that can have a major detrimental impact on customers and business reputation.