Shop Performance During Black Friday From the Administrator’s Point of View
This year’s Black Week is over. At Unity Group, we have another reason to be proud – this year our customers’ stores have again risen to the challenge, handling a total of over 67,000 orders during Black Friday!
When summing up promotional campaigns and profits, remember to account not only for the perspective of sellers, who had to both develop a competitive offer and skilfully conduct promotional campaigns, but also that of the administrators who managed the efficiency of the online store.
For the team of administrators at Unity Group managing servers for many B2C e-commerce platforms, intensive work began much earlier – we were preparing for increased shopping traffic to our customers’ sites weeks before Black Friday. Here are a few words about how we prepared for the sales.
Introduction: Setting expectations is crucial
We analysed last year’s Black Friday traffic and then asked teams and customers about their expectations for traffic in their online stores this year. Are any additional marketing campaigns being planned? Are there any promotional activities that may drive sudden peaks in traffic on the website (e.g. TV advertising, influencer recommendations)? How much traffic do they expect during Black Friday?
The answers to questions like these allowed us to define the scale of activities. From our experience we know that websites can experience traffic several times greater than on a typical day. To make sure we can handle it, there are several steps that need to be taken.
To meet the expected increase in data processing across all of our websites, we carried out intensive routine work, including:
- Verification of free disk space and preventive expansion wherever we expected it to be needed.
- Verification of free CPU and memory resources, with vertical scaling to maintain a reserve.
- Optimization of databases, indexes and queries, recovery of space occupied by deleted records (Database Vacuuming).
- Manually running additional servers for architectures that do not have automatic scaling.
- For autoscalable architectures – autoscaling testing, verification of indicators triggering autoscaling.
- Verification of low-level parameters such as IOPS for disk operations – are they sufficient? Should the disks be replaced with faster ones? Checking how processor cores are used – will the environments perform better with faster cores, or with more, slower ones?
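A check like the disk-space verification above is easy to automate. The sketch below is a minimal illustration using only the Python standard library; the 30% free-space threshold is an assumption for the example – real values depend on each platform’s growth profile.

```python
import shutil

# Hypothetical threshold: keep at least 30% of the disk free before Black Friday.
MIN_FREE_DISK_RATIO = 0.30

def disk_needs_expansion(path: str, min_free_ratio: float = MIN_FREE_DISK_RATIO) -> bool:
    """Return True if the filesystem holding `path` is below the free-space reserve."""
    usage = shutil.disk_usage(path)
    free_ratio = usage.free / usage.total
    return free_ratio < min_free_ratio

# Example: flag the root filesystem for preventive enlargement.
if disk_needs_expansion("/"):
    print("schedule preventive disk expansion")
```

In practice a script like this would run across the whole fleet and feed the same alerting channel as the rest of the monitoring, rather than printing to stdout.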
Initial database optimization was performed with tools that log queries lasting longer than a specified number of milliseconds (e.g. pgBadger, the MySQL slow query log). We did this to optimize the most complex queries on the most heavily loaded servers. In the second step, using tools that register all queries (e.g. pg_stat_statements), we checked the total time the server spent handling queries of each type. You can often find database queries with an execution time of hundredths of a second, but when such a query executes thousands of times per second, speeding it up by even a few milliseconds releases valuable processor cycles. Such optimization may involve both restructuring the query and adding appropriate indexes. Often the benefit is not visible in the execution time of the query itself, but in a reduced number of disk operations – leaving more resources for other queries.
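The second step boils down to ranking queries by total server time (calls × mean time) rather than per-call time. A small sketch of that reasoning, with invented query texts and numbers shaped like what pg_stat_statements exposes:

```python
# Illustrative records (query text, call count, mean execution time in ms).
# All values are made up for the example.
query_stats = [
    {"query": "SELECT * FROM cart WHERE user_id = $1", "calls": 50_000, "mean_ms": 4.0},
    {"query": "SELECT * FROM report_orders($1)",       "calls": 12,     "mean_ms": 900.0},
    {"query": "UPDATE stock SET qty = qty - $1",       "calls": 80_000, "mean_ms": 3.0},
]

def total_time_ranking(stats):
    """Rank query types by total server time, not by per-call duration."""
    return sorted(stats, key=lambda s: s["calls"] * s["mean_ms"], reverse=True)

for s in total_time_ranking(query_stats):
    print(f'{s["calls"] * s["mean_ms"]:>10.0f} ms total  {s["query"]}')
```

Note how the 900 ms report query, alarming in a slow-query log, contributes far less total time than the fast queries executed tens of thousands of times.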
An additional step in database optimization was to check for redundant, unused indexes. In intensively developed applications, such indexes are sometimes added in excess; sometimes the database engine changes the so-called query plan and switches to other indexes added for the sake of newer queries. Whether it is used or not, every index affects the performance of all write operations, which are an integral part of any order process.
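Finding removal candidates is a simple filter over index usage statistics. A minimal sketch, with rows shaped like PostgreSQL’s `pg_stat_user_indexes` view (the index names and numbers are invented):

```python
# Illustrative rows: index name, scans since the statistics were reset, size.
index_stats = [
    {"index": "orders_created_at_idx",    "idx_scan": 1_250_000, "size_mb": 310},
    {"index": "orders_legacy_status_idx", "idx_scan": 0,         "size_mb": 95},
    {"index": "cart_user_id_idx",         "idx_scan": 4_800_000, "size_mb": 120},
]

def unused_indexes(stats):
    """Candidates for removal: never scanned, yet still paid for on every write."""
    return [s["index"] for s in stats if s["idx_scan"] == 0]

print(unused_indexes(index_stats))  # -> ['orders_legacy_status_idx']
```

A zero scan count is only a starting point – statistics reset with the counter, and an index may exist for a rare but critical query – so each candidate still needs human review before being dropped.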
Selected sites – especially those facing their first Black Friday – were subjected to performance tests. Using appropriately constructed profiles in Gatling and JMeter, we simulated the behaviour of an ordinary user browsing the site, adding products to the basket or placing an order. We then increased the number of simultaneous simulated users to values reaching and exceeding the assumed traffic. All this was done to verify whether the environment could handle it, and then to determine where the performance limit lies and where the bottlenecks are.
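The ramp-up described above – start small, climb past the assumed peak – can be expressed as a simple schedule of concurrent-user counts. This is an illustrative sketch, not the actual Gatling or JMeter configuration; the figures are invented:

```python
def ramp_schedule(start_users: int, peak_users: int, steps: int) -> list[int]:
    """Linear ramp of concurrent simulated users, ending above the assumed peak."""
    step = (peak_users - start_users) / (steps - 1)
    return [round(start_users + i * step) for i in range(steps)]

# e.g. ramp from 50 users up to twice an assumed peak of 1,000 concurrent users
print(ramp_schedule(50, 2_000, 5))  # -> [50, 538, 1025, 1512, 2000]
```

In a real test each step would be held long enough for response times and error rates to stabilise, so that the level at which they degrade – the performance limit – can be read off cleanly.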
Knowledge of potential bottlenecks is very useful because it allows you to perform optimization, increase resources, prepare scaling or simply show what elements need careful observation.
These preparatory activities gave us confidence that we could ensure continuity of services during Black Friday in relation to the assumed traffic volumes, as well as knowledge of how to proceed if these assumptions are exceeded.
The websites, apart from standard monitoring by administrators on duty, were additionally inspected by the administrators supervising the projects. We analysed traffic on websites, comparing it with assumptions, and we checked how traffic translates into resource load.
Monitoring was carried out on many levels:
- Active detectors regularly examined the website front ends, e.g. by downloading the homepage and selected subpages and checking the server’s response time and the received content. In addition, we verified the performance of each component, from the web server, through databases and cache layers, down to the physical components inside the servers themselves. Any deviation from the norm resulted in immediate notification of the administrators on standby.
- Continuous passive monitoring showed key general metrics such as the number of requests per minute on websites, and detailed metrics such as loads on components, especially those that formed bottlenecks. The information provided by this monitoring made it possible to uncover situations that could lead to outages and take action before the failure occurred.
These actions translated into measurable effects:
- Only one store operated at reduced capacity for a short time, just after the launch of an advertising campaign that generated traffic significantly exceeding the expected volume, before the environment managed to scale. The situation was resolved immediately, and the website was preventively scaled up with a correspondingly larger reserve.
- All other services under the care of the Unity Group team of administrators made it through Black Friday with flying colours, without recording any failures despite record numbers of visitors and conversions.
- The stores of our customers serviced tens of thousands of orders in one day and were prepared for record numbers of visitors.
The Unity administration team takes the reliability and performance of our customers’ systems very seriously. We are aware of how important full capacity and performance are on key sales days, which is why we prepare in advance – and this translates into customer satisfaction.
This is confirmed in reports submitted by our customers at the end of the day crowning weeks of preparation:
“The system is robust. The number of orders is a new record, with maximum Unique Users at the same time. Thank you for the optimization!”