I’m not sure exactly when it happened, but it seems to me that in recent years, every few weeks we hear about one bank or another suffering an outage that impacts customers or clients on some scale.
The payments environment is undergoing something of an evolution at the moment. Faster and often instant payments are becoming the norm, as is mobile banking, and a whole host of other innovations are being introduced to meet emerging customer demands. Maybe it’s the mass adoption of Twitter and other social technologies that is fuelling the media storm and highlighting the outages, but it just seems to me that these new developments are far too often overshadowed by a series of technical glitches.
As an engineer I really do feel the pain that some of these established players are going through. They’re trying to introduce new functionality whilst controlling their very intricate IT infrastructures, and that’s not easy. Some industry insiders tell me that as much as 80% of budget is spent on keeping the show on the road, leaving only 20% for innovation and for updating what are, in some cases, “venerable” existing systems.
Many established providers really do have very complex payment processing environments, and that’s incredibly tough to deal with. However, I can’t help but feel that the banks really need to start radically reducing the frequency of incidents, as it is hardly doing their reputations much good and competition is increasing. The commoditisation of bank accounts and the coming of portable account numbers are going to make it increasingly easy for customers to change provider after one outage too many.
To provide a bit of background on the complexity issue: I am not saying this is true for all traditional providers, but many seem to suffer from a set of endemic problems.
Technically, one of their challenges is that their core systems tend to be pretty old, built back in the day when the main aim was the automation of manual, batch-oriented processes that took multiple days. Add to this the fact that over the past couple of decades mergers and acquisitions have often led to the inheritance of several such systems, and things haven’t got simpler. Over the years new pieces of functionality and payment platforms have been bolted on, supporting, for instance, internet banking; then portals for mobile banking and other payment means have been added on top.
Once you start talking international payments, you can often multiply variants of this model across different regions. Factor in divergent local demands and jurisdictional quirks that result in system modification, and some firms are finding they now have a very complex infrastructure.
And don’t get me wrong: I am not saying there is anything wrong with the core technology. In fact, it is probably rather good at doing what it was designed to do; however, it may now be supporting a very different set of business processes or payment volumes than originally envisioned.
Overlay this technology with the complex networks that transport payments received in multiple formats. Those payments speed along a series of possible routes, quickly stopping or changing direction depending on the results of credit, fraud and sanctions checks, before exiting the bank again through a multitude of possible channels. Suffice to say, these environments can make the M6’s Spaghetti Junction look like a simple country lane.
Unsurprisingly, for the technical teams charged with making changes, these environments can present significant challenges. Newton’s third law tells us that every action has an equal and opposite reaction. But how can you accurately predict what the reaction will be if you can’t really understand how everything in your environment interacts?
(Oh, and by the way, those original developers who designed these systems back in the mid-80s and 90s are often happily spending their autumn years on a golf course someplace sunnier than the Square Mile, so they are not really all that available to address ad hoc technical queries.)
Before introducing changes, banks really need to get to grips with exactly how their cat’s cradle of interconnected systems and processes is actually working together. The problem is that sometimes these organic environments have become so complex that understanding how all of the strings connect is beyond the enterprise’s collective comprehension.
Whilst these firms often have system-by-system or process-by-process monitoring to help them understand what is happening, sometimes they have no means of gaining a complete end-to-end view of how everything is working. As such, at any one time they can only see a snapshot of what is happening in a particular location. Therefore, it’s really tough to accurately predict the exact impact that a software or system upgrade on one element may have on another piece of code, as interdependencies are difficult to depict.
Also, these environments need to run all of the time; banks don’t have the luxury of taking down their payment environments for six months, rebuilding and robustly testing them. They just need to work continuously, and as customer demands evolve, changes somehow still need to be incorporated. As during major motorway roadworks, the traffic still needs to flow.
If future outages are to be avoided and customers retained, these issues need to be overcome. Banks need to get much better at correctly forecasting how changes can have knock-on effects, and the first step for many is to be able to really understand how everything is actually working together.
To achieve this, banks need to gain the detailed oversight necessary to understand how all systems and networks are operating, and how payments are flowing across them from their point of entry to final posting, in real time. And they need this in a single view, not via 20 or 30 different screens.
With this level of understanding they can much more effectively predict how picking at a particular string in their cat’s cradle may impact others. Also, because teams are able to monitor everything that is happening end to end as they incrementally improve the system, should an issue occur they are instantly aware of it, and of where exactly it has occurred, rapidly reducing the time to repair. And quicker repairs mean fewer angry customer tweets, calls and emails.
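The kind of end-to-end view described above boils down to correlating the per-system events a payment generates into one journey. As a minimal sketch, assuming a hypothetical event format and an illustrative four-stage path (the system names, fields and function are my own assumptions, not any vendor’s API), it might look like this:

```python
from dataclasses import dataclass

# Hypothetical per-system event: each monitored system emits one of these
# whenever a payment passes through it. The field names are illustrative.
@dataclass
class PaymentEvent:
    payment_id: str
    system: str       # e.g. "gateway", "fraud-check", "sanctions", "posting"
    timestamp: float  # seconds since some shared epoch

# Assumed journey from point of entry to final posting.
EXPECTED_PATH = ["gateway", "fraud-check", "sanctions", "posting"]

def end_to_end_view(events):
    """Group per-system events by payment ID into an ordered journey,
    flagging any payment that has not yet reached final posting."""
    journeys = {}
    for ev in sorted(events, key=lambda e: e.timestamp):
        journeys.setdefault(ev.payment_id, []).append(ev.system)
    report = {}
    for pid, path in journeys.items():
        if path == EXPECTED_PATH:
            report[pid] = "completed"
        else:
            # The last system seen shows exactly where the payment stalled,
            # which is what cuts the time to repair.
            report[pid] = f"stalled after {path[-1]}"
    return report

events = [
    PaymentEvent("p1", "gateway", 1.0),
    PaymentEvent("p1", "fraud-check", 2.0),
    PaymentEvent("p1", "sanctions", 3.0),
    PaymentEvent("p1", "posting", 4.0),
    PaymentEvent("p2", "gateway", 1.5),
    PaymentEvent("p2", "fraud-check", 2.5),
]
report = end_to_end_view(events)
# report: {"p1": "completed", "p2": "stalled after fraud-check"}
```

The point is not the code itself but the design choice: one correlated view per payment, rather than 20 or 30 separate system dashboards, is what makes the location of a failure obvious at a glance.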
They can also then take another important step: they can start looking at opportunities to reduce complexity by rationalising systems. Fewer systems speaking to other systems means fewer moving parts and fewer things to go wrong, so keeping the show on the road without a hitch becomes much more likely. Reducing the time required to understand and repair issues ultimately also means less of that 80% of budget goes on keeping the old systems running, and more on new developments that can help modernise and innovate.
It’s a well-known fact that retaining an existing customer is much easier than trying to secure a new one. So let’s start providing technical teams with the level of insight they need to more effectively predict possible risks, before the next big software upgrade creates a technical glitch causing even more lost customers and PR headaches.