Architecting Success in Public Cloud - The Critical Building Blocks
This blog post is based on my field experience at Amazon Web Services, on case studies, and on my own research (Taleb, Flyvbjerg, Blood and others).
“General principle: the solutions (on balance) need to be simpler than the problems.” - Nassim Nicholas Taleb
In the realm of public cloud adoption, we observe a fascinating pattern: organisations often rush to migrate workloads before establishing proper foundations, creating what we might call "cloud debt" - technical and operational burdens that compound over time.
Understanding and implementing the fundamental building blocks of cloud architecture isn't just good practice; it's the difference between thriving and merely surviving in the cloud.
Key Takeaways
A well-architected landing zone is not just an implementation detail but a strategic asset that enables scalable, secure, and efficient cloud operations. Organisations must invest time in designing and implementing this foundation before significant workload migration.
Security and governance must be built into the foundation, not added as an afterthought. This "shift-left" approach to security creates what we might call "security by design" - where protection is an inherent property of the system rather than an external control.
Infrastructure as Code transforms cloud management from a reactive to a proactive discipline. When properly implemented, it creates a self-documenting, version-controlled, and reproducible infrastructure that reduces risk and increases operational efficiency.
Global network architecture requires thinking beyond traditional connectivity patterns to create resilient, high-performing, and secure communication fabrics that can adapt to changing business needs.
Question vendor-promoted methodologies that prioritise standardisation over your unique business context.
Avoid technical debt by balancing short-term functionality with long-term sustainability.
The Foundation Challenge
The empirical data tells a compelling story - while cloud adoption continues to accelerate, research shows that a majority of organisations exceed their cloud budgets, and a significant number face unexpected operational challenges post-migration. The root cause often traces back to insufficient attention to foundational elements - what we call the "architectural prerequisites" of successful cloud adoption.
The key challenges typically manifest in several areas:
Unstructured Growth: Without a proper landing zone, cloud environments grow organically, creating governance and security challenges that become increasingly difficult to address.
Network Complexity: Global cloud networks, when not properly architected from the start, create performance bottlenecks and security vulnerabilities that can cripple operations.
Configuration Drift: Manual configuration leads to inconsistencies across environments, creating what we might call "cloud entropy" - a gradual descent into chaos.
Security Gaps: Reactive security measures, implemented after deployment, often leave critical vulnerabilities and compliance gaps.
But before going deeper into the discussion, let me first introduce you to a friend and his methodologies.
Hammurabi Risk Management
Hammurabi, the sixth king of Babylon (c. 1792-1750 BCE), created one of history's earliest legal codes. His approach to risk management was revolutionary - especially regarding information asymmetry between builders and occupants. His code stated:
"If a builder builds a house and it collapses, killing the owner, the builder shall be put to death." - Hammurabi Code
This harsh penalty addressed the agency problem - where builders (agents) had more knowledge about construction quality than clients (principals).
The casual reader may understand this as “an eye for an eye”, but the true message is much more sophisticated. It deals with the invisible transfer of risk, and removes the free option that the agent would otherwise hold at the client's expense (see Taleb for a much more in-depth discussion).
Now that we have been introduced to this agency problem and to the asymmetry of knowledge, we can discuss how to build a Public Cloud foundation with an enriched perspective.
The Building Blocks of Success
Let's examine each crucial building block, its role in creating a robust cloud foundation, and the typical agency problems to watch for when dealing with vendors and consultants. Next time you engage with them, ask these questions, and notice what they say and how they act. Then ask again midway through the project.
Landing Zone
The Foundation of Foundations: A landing zone is more than just an initial setup - it's the embodiment of your organisation's cloud operating model. The key is creating what we might call "structured flexibility" - enough standardisation to ensure governance while maintaining the agility cloud promises. Think of it as city planning for your cloud environment.
Aspects covered by this critical component:
Account structure and organisation
Foundational governance framework implementing the organisation’s operating model across teams
Identity and access management baseline
Security and compliance frameworks
Resource hierarchy and tagging strategies
Cost management foundations
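To make the tagging aspect concrete, here is a minimal Python sketch of a tag policy check that could run before a resource is admitted into the landing zone. The tag keys (owner, cost-centre, environment), the allowed values, and the Resource type are assumptions made for illustration; they are not taken from any provider's API or from a specific client engagement.

```python
from dataclasses import dataclass, field

# Hypothetical tag policy: keys and allowed values are illustrative only.
# A value of None means "any non-empty value is accepted".
REQUIRED_TAGS = {
    "owner": None,
    "cost-centre": None,
    "environment": {"dev", "test", "prod"},
}


@dataclass
class Resource:
    """Simplified stand-in for a cloud resource and its tags."""
    resource_id: str
    tags: dict = field(default_factory=dict)


def tag_violations(resource: Resource) -> list:
    """Return human-readable policy violations for one resource."""
    problems = []
    for key, allowed in REQUIRED_TAGS.items():
        value = resource.tags.get(key, "").strip()
        if not value:
            problems.append(f"{resource.resource_id}: missing tag '{key}'")
        elif allowed is not None and value not in allowed:
            problems.append(
                f"{resource.resource_id}: tag '{key}' has unexpected value '{value}'"
            )
    return problems
```

Keeping the policy as plain data rather than logic means that a new business unit with its own compliance tags can be accommodated by extending the policy, without reorganising the hierarchy.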
“Hammurabi Risk Management” questions to ask:
Describe a scenario where your Landing Zone had to accommodate an unforeseen regulatory change. What architectural decisions helped or hindered adaptation?
How would your Landing Zone design facilitate rapid workload isolation during a zero-day vulnerability without disrupting the entire environment?
Identify three design decisions that might become constraints if our traffic patterns or data sovereignty requirements change dramatically, and your mitigation strategies.
What percentage of your previous Landing Zone implementations required significant architectural revisions within the first year, what patterns emerged, and how have these lessons influenced your current proposal?
How would your tagging strategy and resource hierarchy accommodate a 500% growth spike in a business unit with unique compliance requirements without requiring fundamental reorganisation?
Security and Governance
The Protection Framework. Security in the cloud requires a shift from perimeter-based thinking to what we might call "omnipresent security".
Aspects covered by this critical component:
Zero-trust architecture
Continuous compliance monitoring
Automated security controls
Data protection frameworks
Identity- and tag-based security
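Tag-based security, the last item above, can be sketched as a deny-by-default comparison between the tags on a principal and the tags on a resource. The function name and the tag keys (project, environment) below are hypothetical; real providers express this through their own policy languages, so treat this as an illustration of the idea rather than an implementation of any particular service.

```python
def is_access_allowed(principal_tags, resource_tags,
                      match_keys=("project", "environment")):
    """Allow access only if every key in match_keys is present on both
    sides with the same value. Anything else is denied."""
    for key in match_keys:
        principal_value = principal_tags.get(key)
        resource_value = resource_tags.get(key)
        # A missing tag on either side is treated as a denial, so that
        # lost or corrupted metadata fails closed rather than open.
        if principal_value is None or resource_value is None:
            return False
        if principal_value != resource_value:
            return False
    return True
```

The design choice worth noting is that a missing tag denies access. When metadata is lost or corrupted, a fail-closed rule produces a visible outage instead of a silent exposure.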
“Hammurabi Risk Management” questions to ask:
Can you describe a case where a client's zero-trust implementation failed despite following your design? What specific improvements have you made to address these vulnerabilities?
How does your compliance monitoring differentiate between technical compliance and actual security? What prevents attackers from working around your automated controls?
What happens if security tags or identity metadata become corrupted in your system? How do you detect and recover from this type of compromise?
If sensitive data appeared in unexpected places like logs or caches, how would your protection framework detect this before it becomes a compliance issue?
Global Cloud Networks
The Digital Nervous System - Network architecture in the cloud requires thinking differently about connectivity.
Aspects covered by this critical component:
Global transit networks
Software-defined perimeters
Hybrid connectivity
Performance optimisation
Security segmentation
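One small but recurring obstacle to global transit networks and hybrid connectivity is overlapping address space between regions and on-premises sites. The Python sketch below checks an address plan for overlaps using the standard library's ipaddress module; the site names and CIDR ranges in the usage note are made up for illustration.

```python
import ipaddress
from itertools import combinations


def find_overlaps(cidr_plan):
    """Return pairs of site names whose CIDR blocks overlap.

    cidr_plan maps a site or region name to a CIDR string.
    """
    networks = {
        name: ipaddress.ip_network(cidr) for name, cidr in cidr_plan.items()
    }
    overlaps = []
    for (name_a, net_a), (name_b, net_b) in combinations(networks.items(), 2):
        if net_a.overlaps(net_b):
            overlaps.append((name_a, name_b))
    return overlaps
```

With a plan such as {"eu-west": "10.0.0.0/16", "us-east": "10.1.0.0/16", "on-prem": "10.0.128.0/20"}, the function reports the eu-west and on-prem pair, since the on-premises range sits inside the eu-west block. A check like this is cheap to run before any routing is configured.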
“Hammurabi Risk Management” questions to ask:
Describe a client scenario where your network design failed during an unexpected regional outage. What created bottlenecks or single points of failure, and how has your approach evolved?
How would your architecture adapt to a fundamental shift in attack vectors that bypassed traditional segmentation methods? Which components would need reconfiguration or replacement?
Identify non-obvious ways regional failures might cascade through your proposed architecture despite apparent isolation, and what design elements mitigate these risks.
What limitations in your hybrid connectivity design might become apparent if we needed to rapidly shift 80% of workloads to a previously unused region?
How would your network architecture adapt to support edge computing with thousands of low-latency endpoints, or AI/ML workloads with massive east-west traffic?
Infrastructure as Code (IaC)
The Automation Imperative. This approach creates what we might call "infrastructure certainty" - the ability to predictably and reliably deploy and manage cloud resources. IaC transforms infrastructure management from an art to a science.
Aspects covered by this critical component:
Version-controlled infrastructure
Repeatable deployments
Configuration consistency
Automated compliance checks
Disaster recovery capabilities
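Configuration consistency ultimately comes down to comparing what is declared in version control with what is actually running. The sketch below shows that comparison in Python over two flat dictionaries; the setting names in the usage note and the "<absent>" marker are assumptions for illustration, and real IaC tools perform this comparison against the provider's APIs in far more detail.

```python
ABSENT = "<absent>"  # marker for a setting that exists on one side only


def detect_drift(declared, observed):
    """Compare declared (version-controlled) settings with observed ones.

    Returns a dict mapping each drifted key to a (declared, observed) pair.
    """
    drift = {}
    for key in sorted(declared.keys() | observed.keys()):
        want = declared.get(key, ABSENT)
        have = observed.get(key, ABSENT)
        if want != have:
            drift[key] = (want, have)
    return drift
```

For example, if the declared state is {"encryption": "on", "public_access": "off"} and the observed state is {"encryption": "on", "public_access": "on", "debug_port": "8080"}, the result flags public_access as changed and debug_port as an undeclared addition. An empty result means the environment matches its definition.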
“Hammurabi Risk Management” questions to ask:
Could you share an example where your IaC approach didn't handle a cloud service update well for a client? How have you modified your templates to prevent similar issues for us?
What happens in your system when a deployment partially fails? Please walk us through your detection and recovery processes for these inconsistent states.
Tell us about a time when your compliance automation missed something important. How has your framework evolved to catch these unexpected issues?
For which scenarios would rolling back infrastructure changes be risky or impossible with your approach? How does your solution handle these situations?
How would your IaC implementation adapt if our company merged or restructured? Which specific design elements ensure your templates remain valuable during organisational change?
“Modularity is a clunky word for the elegant idea of big things made from small things. A block of Lego is a small thing, but by assembling more than nine thousand of them, you can build one of the biggest sets Lego makes, a scale model of the Colosseum in Rome. That’s modularity.” - Bent Flyvbjerg
Beware the vendor’s frameworks
A final word of caution concerns the use of “vendor recommended frameworks”. The agency problem discussed in this essay persists today in public cloud projects. When cloud service providers or consultants design complex (and fragile) architectures like Landing Zone Accelerator (LZA) or Serverless Transit Network Orchestrator (STNO), they act as agents with more technical knowledge than their clients. These complex solutions often:
Create unnecessary dependencies
Increase maintenance costs
Reduce organisational agility
Entrench specific vendor relationships
Introduce unwarranted fragilities which are only seen during operation
Just as Hammurabi's code forced builders to share risk with occupants, modern cloud governance requires mechanisms that align incentives between technical implementers and business stakeholders.
Simple, maintainable architectures with clear accountability help mitigate these agency problems, ensuring technical decisions serve business needs rather than provider interests.
“[The fragilista] defaults to thinking that what he doesn’t see is not there, or what he does not understand does not exist. At the core, he tends to mistake the unknown for the non-existent.” - Nassim Nicholas Taleb
I invite you to subscribe to my blog, and to read a few of my favourite case studies describing how some of my clients achieved success in their high-stakes technology projects, using the very same approach I just described.
Have a great day!
João