How we cut infrastructure change time by 80% for a mission-critical data platform
- Scope: Infrastructure Modernisation
- Technologies: Terraform, Azure Cloud
Mission-critical systems tend to be complex and brittle. Any significant change to the software or infrastructure must promise a high return on investment to justify the risks inherent in evolving a legacy IT estate.
Fortunately, that was the case when we migrated a data platform to an Infrastructure as Code solution for CRU Group.
CRU Group partnered with Exlabs to migrate three cloud environments for one of their critical software systems. The goal was to enable a self-serve model for infrastructure provisioning with better governance and change management.
Our outcomes in the project match those reported by other industry leaders. We cut the time required for infrastructure modifications by 80%, uncovered (and fixed) a few security misconfigurations, and scaled down overprovisioned components, further reducing cloud bills.
The client was using Azure public cloud infrastructure for one of its products. A complex stakeholder environment required hosting the project in a few distinct copies, ranging from pure development environments to production-ready services. Each copy was managed in a separate Azure subscription.
By design, the environments were meant to be nearly identical copies. This setup was chosen as a pragmatic way to increase agility, reduce maintenance costs and simplify security management.
Unfortunately, years of unstructured, manual changes made by different teams at different times had led to inconsistencies between the environments. This directly increased the complexity of managing the infrastructure.
New requirements kept adding pressure to deliver additional services, straining the limited time and resources at the client’s disposal.
Process and Security
The delivery process was broken down into three phases. The first step involved an evaluation of security and resiliency. It enabled us to monitor the infrastructure closely and ensure that no additional security misconfigurations were introduced during the migration.
The inspection revealed a set of networking shortcomings, which were promptly corrected to ensure data security. Additional security services were put in place to safeguard access to critical resources.
Our client, as a leading provider of analytics in the commodities industry, required the application to be available under high load and to remain stable in the event of service disruption. To provide a seamless user experience, we advised implementing a region failover scenario. With the help of chaos engineering best practices, we were able to test it thoroughly and confirm system responsiveness.
You can boost your system’s reliability and scalability with an active-active solution. By distributing workload across multiple instances, you’ll achieve high availability and resilience, ensuring uninterrupted service even during peak times or failures.
The second step focused on reducing configuration drift between environments. Our aim was to standardize the building blocks of the infrastructure and to parameterize those blocks to cater to the specific needs of each environment (e.g. optimising production environments for high availability and resilience).
By tailoring the infrastructure configuration to the specific requirements of each environment, you can avoid over-provisioning and ensure that you are only paying for the resources you actually need.
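The idea of one standard building block with per-environment parameters can be sketched in a few lines of Python. This is only an illustration of the principle, not the project's actual Terraform code: the block names, settings and values below (`BASE_BLOCK`, `OVERRIDES`, `size`, `instances` and so on) are hypothetical. Each environment is rendered from shared defaults plus a small set of overrides, so any difference between environments is explicit and easy to list.

```python
from copy import deepcopy

# Hypothetical shared defaults for the standard building blocks.
BASE_BLOCK = {
    "app_service": {"size": "small", "instances": 1, "zone_redundant": False},
    "database": {"tier": "basic", "geo_replica": False},
}

# Hypothetical per-environment overrides: only deviations from the base are listed.
OVERRIDES = {
    "dev": {},
    "production": {
        "app_service": {"size": "large", "instances": 3, "zone_redundant": True},
        "database": {"tier": "premium", "geo_replica": True},
    },
}


def deep_merge(base, override):
    """Return a new dict in which values from override replace those in base."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def render_environment(name):
    """Build the full configuration for one environment."""
    return deep_merge(BASE_BLOCK, OVERRIDES[name])


def drift(env_a, env_b, prefix=""):
    """Return dotted paths of settings whose values differ between two configs."""
    paths = []
    for key in sorted(set(env_a) | set(env_b)):
        a, b = env_a.get(key), env_b.get(key)
        path = f"{prefix}{key}"
        if isinstance(a, dict) and isinstance(b, dict):
            paths.extend(drift(a, b, prefix=f"{path}."))
        elif a != b:
            paths.append(path)
    return paths


if __name__ == "__main__":
    dev = render_environment("dev")
    prod = render_environment("production")
    for path in drift(dev, prod):
        print("differs:", path)
```

In Terraform itself the same pattern is usually expressed with reusable modules and per-environment variable files. The point of the sketch is that intended differences live in one place, while anything else that differs between environments is unplanned drift.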
With the infrastructure properly audited and configuration drift resolved, the third step was to migrate the rest of the infrastructure to Terraform configuration files. Every change to the underlying infrastructure can now be performed on a self-serve basis, with additional auditing and review steps.
The team can accurately predict the effect of their changes, and every infrastructure update is linked to, and documented alongside, the business requirement it addresses, bringing an additional level of accountability.
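As an illustration of how the effect of a change can be predicted before it is applied, the sketch below summarises a Terraform plan. It assumes the plan has been exported to JSON (for example with `terraform show -json` on a saved plan file) and reads the `resource_changes` list from that output. The resource addresses in `SAMPLE_PLAN`, the function names and the review rule are hypothetical and not taken from the project.

```python
import json
from collections import Counter

# Hypothetical, heavily trimmed plan export; real output contains many more fields.
SAMPLE_PLAN = json.dumps({
    "resource_changes": [
        {"address": "module.network.example_a", "change": {"actions": ["no-op"]}},
        {"address": "module.app.example_b", "change": {"actions": ["create"]}},
        {"address": "module.app.example_c", "change": {"actions": ["update"]}},
        {"address": "module.db.example_d", "change": {"actions": ["delete", "create"]}},
    ]
})


def summarize_plan(plan_json):
    """Count planned changes by kind, ignoring resources that stay untouched."""
    plan = json.loads(plan_json)
    summary = Counter()
    for resource in plan.get("resource_changes", []):
        actions = resource.get("change", {}).get("actions", [])
        if not actions or actions in (["no-op"], ["read"]):
            continue
        # A delete combined with a create means the resource is replaced.
        if set(actions) == {"create", "delete"}:
            summary["replace"] += 1
        else:
            summary["+".join(actions)] += 1
    return summary


def needs_extra_review(summary):
    """Hypothetical rule: destructive changes require an additional reviewer."""
    return summary["delete"] + summary["replace"] > 0


if __name__ == "__main__":
    result = summarize_plan(SAMPLE_PLAN)
    print(dict(result))
    print("extra review needed:", needs_extra_review(result))
```

A summary like this can be attached to the change request, so reviewers see what will be created, updated or replaced before anything is applied.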
Transforming the existing, mature architecture to IaC was a challenging task because of the number of resources deployed on Azure. However, it brought significant benefits in terms of cost savings and increased productivity. During the migration, we discovered that approximately 14% of the resources were unused, which allowed for further optimization and cost reduction.
The whole operation was successful. We achieved an 80% reduction in infrastructure change time, resulting in substantial cost savings for building and modifying environments.
These efforts have enabled us to achieve scalability and cost efficiency, providing a solid foundation for effective application management.
Additionally, we created documentation of the system that proved to be of great value to the existing team. It also enabled a more effective onboarding process with a better system understanding.
– That’s quite an extra bonus, don’t you think?
CRU is the leading provider of analysis, prices and consulting in the mining, metals and fertilizer markets. CRU offers unrivalled business intelligence on the global metals, mining and fertilizer industries through market analysis, price assessments, consultancy and events.