How Hootsuite Slashed Infrastructure Costs by 40%
Discover how Hootsuite streamlined its infrastructure from 2,000+ EC2 instances to fewer than 500 servers
Ever felt like your tech infrastructure was a sprawling city, growing faster than you could manage? That's precisely where Hootsuite found itself not too long ago.
Hootsuite was built on AWS and had over 2,000 Amazon EC2 instances, which offered secure and resizable compute capacity. However, managing this extensive Amazon EC2 estate efficiently required new tools.
They needed to minimize their AWS footprint to reduce the attack surface and gain more consistency and standardization.
So they shrank it to fewer than 500 servers, cut their Amazon EC2 costs by 40%, and boosted their system's reliability.
How? Well, it began with addressing the complexities of their microservices setup.
The microservices dilemma
Hootsuite's journey began with the early adoption of a microservices architecture. At first, it was great: teams could innovate faster, scale individual components, and deploy easily.
However, as Lei Guo, Hootsuite's senior director of platform and infrastructure, puts it: "A microservices architecture presents many benefits but also creates new challenges."
Hootsuite was grappling with:
Service sprawl: The number of services expanded rapidly, each requiring individual management and oversight.
Dependency confusion: As services multiplied, understanding and managing inter-service dependencies became increasingly complex.
Troubleshooting nightmares: Troubleshooting and maintaining the system became more challenging, with potential issues spread across numerous services.
These challenges exemplify the "complexity vs. scalability" trade-off. While microservices enable fine-grained scalability, they increase systemic complexity.
To address these issues without losing the benefits of microservices, Hootsuite turned to containerization. This approach promised to bring order to their complex infrastructure while preserving flexibility.
Strategic approach to simplification using containerization
Containerization directly addressed the key challenges Hootsuite faced with its microservices architecture:
Containers provided a standardized way to package and manage services, simplifying the oversight of numerous microservices.
By encapsulating each service with its dependencies, containers reduced inter-service dependency issues and made it easier to understand how services interacted.
Containerization improved consistency across environments, making issues easier to reproduce and isolate, which simplified troubleshooting.
Moreover, containerization aligns closely with the principle of Infrastructure as Code (IaC). It allowed the entire infrastructure to be defined, version-controlled, and automated through code.
For Hootsuite, adopting IaC meant:
Improved reproducibility: Infrastructure configurations could be easily replicated and tested.
Improved collaboration: Infrastructure changes could be reviewed and managed like any other code.
Increased automation: Deployment and scaling processes could be automated, reducing manual errors and improving efficiency.
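To make the IaC idea concrete, here is a minimal sketch in Python of what "infrastructure defined through code" can look like. It is not Hootsuite's tooling: the `ServiceSpec` class, the `render_deployment` function, and the service and image names are hypothetical, chosen only for illustration. The output follows the general shape of a Kubernetes `apps/v1` Deployment manifest.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceSpec:
    """Hypothetical description of one containerized microservice."""
    name: str
    image: str
    replicas: int = 2
    port: int = 8080


def render_deployment(spec: ServiceSpec) -> dict:
    """Build a Kubernetes Deployment manifest (apps/v1) as a plain dict.

    The same spec always yields the same manifest, which is what makes
    the configuration reproducible and reviewable like any other code.
    """
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": spec.name, "labels": {"app": spec.name}},
        "spec": {
            "replicas": spec.replicas,
            "selector": {"matchLabels": {"app": spec.name}},
            "template": {
                "metadata": {"labels": {"app": spec.name}},
                "spec": {
                    "containers": [
                        {
                            "name": spec.name,
                            "image": spec.image,
                            "ports": [{"containerPort": spec.port}],
                        }
                    ]
                },
            },
        },
    }


if __name__ == "__main__":
    # Placeholder service name and image, not real Hootsuite services.
    spec = ServiceSpec(
        name="example-service",
        image="registry.example.com/example-service:1.0.0",
    )
    print(json.dumps(render_deployment(spec), indent=2))
```

Because the manifest is generated from a small, typed description, a change such as raising `replicas` shows up as a one-line diff that can be reviewed and version-controlled before it reaches a cluster.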
Translating these theoretical benefits into practical improvements required careful planning and execution.
From proof of concept to full-scale implementation
Hootsuite's adoption of containerization wasn't a hasty overhaul. Instead, it was a methodical, seven-year journey of gradual transformation, starting in 2016.
They started small, setting up a self-managed Kubernetes environment on AWS and experimenting with containerizing a handful of services.
This cautious start allowed them to test the waters without disrupting their entire system.
By 2018, Hootsuite saw an opportunity to level up with Amazon EKS. They began migrating services from their self-managed setup to Amazon EKS, one step at a time.
Their step-by-step migration process included:
Adopting a multi-cluster strategy across multiple AWS Availability Zones to maintain high availability and ensure that the Hootsuite dashboard remained operational even during outages or service degradations.
Gradually migrating services from their self-managed Kubernetes setup to EKS, minimizing risks to critical workloads.
Implementing Karpenter, an open-source node autoscaler for Kubernetes, to optimize resource management within EKS.
Improving security measures using AWS services like AWS WAF, Amazon Inspector, Amazon GuardDuty, and AWS Security Hub.
Using advanced analytics with Amazon Redshift and real-time event streaming to improve customer insights.
Continuously optimizing their infrastructure even after the initial migration.
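One way to picture a gradual, risk-minimizing migration is to group services into waves and move the least critical ones first. The sketch below is an illustration under stated assumptions, not Hootsuite's actual process: the `Service` class, the 1-to-5 criticality scale, the `plan_waves` function, and the service names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Service:
    name: str
    criticality: int  # hypothetical scale: 1 = low risk, 5 = business critical


def plan_waves(services: Iterable[Service], wave_size: int) -> list[list[Service]]:
    """Split services into migration waves, least critical first.

    Ties are broken by name so the plan is deterministic.
    """
    if wave_size < 1:
        raise ValueError("wave_size must be at least 1")
    ordered = sorted(services, key=lambda s: (s.criticality, s.name))
    return [ordered[i:i + wave_size] for i in range(0, len(ordered), wave_size)]


if __name__ == "__main__":
    # Placeholder services for illustration only.
    catalog = [
        Service("dashboard-api", 5),
        Service("image-resizer", 1),
        Service("report-export", 2),
        Service("auth", 5),
        Service("link-shortener", 3),
    ]
    for number, wave in enumerate(plan_waves(catalog, wave_size=2), start=1):
        print(f"Wave {number}: {[s.name for s in wave]}")
```

Moving low-risk services first lets a team refine its process before the workloads that matter most are touched, which matches the intent of the steps above even if the real ordering criteria were different.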
Lei Guo explains the rationale: "To improve business agility and development productivity, you need a consistent environment to operate in. Without that consistency, people are going to spend more time trying to figure things out, which is going to incur overhead costs."
This gradual approach paid off. It allowed Hootsuite to:
Learn from each migration phase and refine their process
Minimize disruption to ongoing operations
Build expertise across their teams gradually
Identify and address potential issues before they became major problems
Collaboration was key throughout this process. Hootsuite worked closely with the Amazon EKS team, leveraging their expertise to optimize the new environment and consolidate their EC2 instances.
The results? By early 2022, Hootsuite had successfully migrated virtually all of its production services (over 700 different services) to Amazon EKS. They'd transformed their sprawling infrastructure into a streamlined, efficient system.
But the work didn't stop there. Hootsuite continues to fine-tune its setup, always looking for ways to optimize resource utilization and improve efficiency.
Rewards of consolidation
Hootsuite's strategic shift to containerization and managed services yielded impressive results:
Server reduction: The company reduced its Amazon EC2 footprint from over 2,000 instances to fewer than 500, which simplified management and reduced the overall complexity of its infrastructure.
Cost savings: The streamlined infrastructure translated directly to the bottom line. Hootsuite achieved a 40% reduction in Amazon EC2 costs.
Improved reliability: Hootsuite experienced a 57% reduction in their most severe incidents (SEV1 and SEV2), which previously caused system-wide outages or major functionality issues for customers.
Improved security: With a smaller, more manageable infrastructure footprint, Hootsuite strengthened its security posture. The company saw a 28% reduction in security incidents.
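For a sense of scale, the server figures can be turned into a percentage. The short Python sketch below uses the boundary values from the article (2,000 and 500), so the result is a lower bound on the real reduction. The `percent_reduction` helper is introduced here for illustration only.

```python
def percent_reduction(before: float, after: float) -> float:
    """Return the percentage drop from `before` to `after`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100.0


# Boundary figures from the article: "over 2,000" instances down to "fewer than 500".
# Using the bounds themselves gives a lower bound on the actual reduction.
instance_drop = percent_reduction(2000, 500)
print(f"EC2 instance count fell by at least {instance_drop:.0f}%")  # prints 75%
```

The instance count fell by at least 75% while EC2 costs fell by 40%. The two numbers need not match, since consolidated workloads may run on different instance sizes.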
The new containerized architecture also fundamentally altered how development teams worked. Lei Guo explained, "With Kubernetes running on Amazon EKS, we can bring all of our microservices into one environment and operate as one. This is a huge benefit to all of our teams and service owners."