The end of 2012 is fast approaching and our payment systems are soon to enter their annual freeze period. Many of us are looking forward to celebrating holidays at home with our families, or to just getting some rest out of the office and away from our daily work challenges. Now is a good time to do what we can to ensure that we can spend some time away from the office, uninterrupted by unplanned work emergencies.
There are several common glitches that processing systems encounter during the holiday period. While these glitches may present themselves differently, they usually share the same cause: overall higher processing traffic. Most sensitive to this is online transaction processing itself. Monitoring transaction loads for one local merchant revealed that the number of transactions before and during the holiday period may easily reach 200% of average daily processing. A doubling of volume may not seem like too much, and indeed the majority of payment switches are built with these seasonal peaks in mind. What is not necessarily planned for is that the 200% mentioned is a daily value, meaning that peak processing might be even 4x higher than usual. It is also worth noting that payment systems are dynamic environments, and new processing projects added to the switch over the last year may mean the difference between handling seasonal volume and not handling it.
For example, platforms with an average yearly processing around 5 TPS (day & night) can jump up to 40-50 TPS in holiday peak times, which might not leave sufficient capacity to handle the platform's other processing activities. Another important value is the message turnaround time. While poor turnaround time isn't the same as a system outage, some transactions will experience unacceptable delays and possibly time out completely. And everybody knows that keeping transaction latency very low is one of the keystones of our business.
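The arithmetic behind these figures can be sketched in a few lines of Python. This is only an illustration: the function names and the 60 TPS capacity are assumptions, while the 5 TPS average and the 50 TPS peak come from the example above.

```python
def peak_multiplier(peak_tps: float, average_tps: float) -> float:
    """How many times higher the peak rate is than the average rate."""
    if average_tps <= 0:
        raise ValueError("average_tps must be positive")
    return peak_tps / average_tps


def headroom(capacity_tps: float, peak_tps: float) -> float:
    """Fraction of capacity still free at peak (negative means overload)."""
    if capacity_tps <= 0:
        raise ValueError("capacity_tps must be positive")
    return (capacity_tps - peak_tps) / capacity_tps


average = 5.0        # yearly average from the example above
holiday_peak = 50.0  # upper end of the 40-50 TPS range
capacity = 60.0      # hypothetical rated capacity of the switch

print(peak_multiplier(holiday_peak, average))      # 10.0
print(round(headroom(capacity, holiday_peak), 2))  # 0.17
```

With these assumed numbers only about a sixth of the switch capacity remains for everything else the platform has to do at peak.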
Back-office processing, though not directly part of the online authorization, is tightly coupled to it and needs to be watched very closely as well. It is better to know there is ample space on your hard-drives for all extracts and reports, as they may easily multiply their size over these heavily burdened days. It is a good idea to think ahead and consider establishing a buffer for re-running failed back-office processing and giving sufficient time to your support staff to remedy any issues. Having a spacious processing schedule for all the settlement processes will benefit the whole system and will result in all reports and extracts being processed on time, even through these difficult periods.
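A quick way to test the disk-space concern is to compare the free space on a volume with the expected size of the extracts once they have grown. The sketch below assumes a hypothetical helper name, a typical extract size of 2 GB and a growth factor of 3; none of these values come from a real system.

```python
import shutil


def has_room_for(path: str, typical_bytes: int, growth_factor: float) -> bool:
    """True if the volume holding `path` can take the grown extract size."""
    needed = typical_bytes * growth_factor
    free = shutil.disk_usage(path).free  # bytes currently available
    return free >= needed


# Hypothetical figures: extracts normally total 2 GB and may triple in size.
typical = 2 * 1024 ** 3
print(has_room_for(".", typical, growth_factor=3.0))
```

Running a check like this against each extract and report directory before the freeze gives support staff time to free up space while changes are still allowed.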
The network is another important component of day-to-day processing. Networking capacities are usually well dimensioned and even higher transaction loads will not produce any degradation in performance. But the situation changes when multiple factors impact the network at the same time. Consider the situation where a recent architectural change to the cluster introduces near real-time database replication to a DR site, while at the same time a weekly backup of a full database is triggered over the same network. If your network is not already proven to handle such scenarios, perhaps it is a good time to check with your network technician that your production network infrastructure has enough bandwidth available for these eventualities.
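The combined effect of several flows on one link can be estimated with a simple bandwidth budget. The following is a rough sketch, not a measurement method: the link speed and the per-flow rates are invented for illustration, and the function name is hypothetical.

```python
from typing import Mapping


def link_utilisation(link_mbps: float, flows_mbps: Mapping[str, float]) -> float:
    """Fraction of the link consumed when all listed flows run at once."""
    if link_mbps <= 0:
        raise ValueError("link_mbps must be positive")
    return sum(flows_mbps.values()) / link_mbps


# Hypothetical simultaneous flows on a 1000 Mbit/s link.
flows = {
    "online transactions": 50.0,
    "DR database replication": 300.0,
    "weekly full backup": 600.0,
}
print(link_utilisation(1000.0, flows))  # 0.95
```

Each flow looks harmless on its own here, yet together they leave almost no margin, which is exactly the scenario worth raising with the network technician.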
Your server hardware is closely integrated with networking and applications, and is often serviced by the same team. While unlikely, it may happen that hard-drives fail and, subsequently, dependent RAID mirrors are marked as corrupted. The consequences could be significant, and immediate replacement with a hot-spare and a mirror re-build might be needed. In these cases it is much better to have this equipment to hand (which requires only a minimal capital outlay), and to know that your server manufacturer's best support person will be available to come to site over the holiday period, should it be necessary.
In anticipation of the holidays, EFTlab has prepared the following check-list, divided into three main groups: Processing, Back-office and Hardware. The list is intended to provide valuable guidance for keeping processing platforms running smoothly, allowing you to enjoy the holidays with a clear and rested mind.
Processing
- Are monitoring alerts set properly and enabled so that any processing issues such as network disconnects, frequent connections and high transaction latency will be detected on time?
- Are all transaction database and application data loads sized to handle peak processing times (7:00 - 10:00, 11:00 - 14:00 and 16:00 - 19:00)?
- Are all routine database jobs such as maintenance and full backups disabled or scheduled outside of peak times?
- Are all database cleaners and management jobs completed, making data storage ready for higher loads?
- Did all database and transaction cleaners end successfully?
- Is at least 50% of hard drive space dedicated to the production database?
- Is the amount of memory allocated for the processing application stable from a long-term perspective or growing?
- Is the available hardware security module (HSM) powerful enough to handle the increased security load from the security protocol implemented last summer?
- Is the DR HSM available over the network and loaded with production keys?
- Are all security key-component holders available even over the holiday period?
Back-office
- Were all database cleaners and management jobs run beforehand, making data storage ready for higher loads?
- Are all resource hungry database jobs such as maintenance and full backups disabled or scheduled outside of peak times?
- Is the schedule for running extracts and reports too tight?
- Is the amount of memory allocated for the reporting application stable from a long-term perspective or growing?
- Is there at least 30% of space left on the reports/extract output hard-drive?
- Is there at least 30% of space left on the database hard-drive?
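The scheduling questions in the list above can be checked mechanically by comparing each job's run window with the peak periods (7:00 - 10:00, 11:00 - 14:00 and 16:00 - 19:00). The sketch below is a minimal illustration; the function name and the sample job times are assumptions, and jobs that run past midnight are not handled.

```python
from datetime import time

# Peak processing windows quoted in the checklist.
PEAK_WINDOWS = [
    (time(7, 0), time(10, 0)),
    (time(11, 0), time(14, 0)),
    (time(16, 0), time(19, 0)),
]


def overlaps_peak(start: time, end: time) -> bool:
    """True if a same-day job from `start` to `end` runs during a peak window."""
    if end <= start:
        raise ValueError("job must end after it starts on the same day")
    # Windows are treated as half-open, so a job ending at 7:00 does not overlap.
    return any(start < peak_end and peak_start < end
               for peak_start, peak_end in PEAK_WINDOWS)


# Hypothetical job windows.
print(overlaps_peak(time(1, 0), time(4, 0)))     # False: night-time backup
print(overlaps_peak(time(9, 30), time(10, 30)))  # True: runs into the morning peak
```

Running every maintenance, backup and cleaner job through a check like this gives a quick list of the ones that need rescheduling.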
Hardware and Networking
- Is there separate network bandwidth dedicated for downstream and upstream production processing?
- Are all backup tapes new and loaded?
- Will all the production software licenses be valid on the 1st of January (VMware, WinZip, AntiVirus, TrueCrypt, ConnectDirect...)?
- Are all security certificates valid and will they be available even on the 1st of January?
- Are additional hot-spares and RAM modules available locally and is the key to the safe accessible?
- Is there enough space on production servers for local backups?
- Does the VMware environment have enough HDD space?
- Have you considered allocating spare VMware resources that won't be needed over the holiday period to the production image?
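The license and certificate questions above come down to comparing expiry dates with the 1st of January. A minimal sketch follows, assuming a hand-maintained inventory of expiry dates; the function name and the dates are invented for illustration, and an expiry date is treated as the last valid day.

```python
from datetime import date
from typing import Dict, List


def lapsed_by(expiries: Dict[str, date], check_date: date) -> List[str]:
    """Names of items whose expiry date falls before `check_date`."""
    return sorted(name for name, expiry in expiries.items()
                  if expiry < check_date)


# Hypothetical inventory; real dates would come from your license records.
inventory = {
    "VMware license": date(2013, 6, 30),
    "WinZip license": date(2012, 12, 31),
    "TLS certificate": date(2013, 1, 1),
}
print(lapsed_by(inventory, date(2013, 1, 1)))  # ['WinZip license']
```

Anything the check reports can then be renewed before the freeze period begins.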
Many of the aforementioned steps don't involve any additional work for your teams and knowing the answers might give you an advantage when the time comes. We at EFTlab hope that our checklist will help you be well prepared for the holidays and serve as a reminder that we are here to help test your systems to ensure that they are up to the challenge of business. There is nothing more rewarding for us than seeing our solutions show that your platform is rock solid.