Simplifying Infrastructure - Why your Terraform CI/CD is fighting you (Part 1)
How Fat Tony would spot the tool vendor complication before it costs you six figures
Socrates is about knowledge. Not Fat Tony, who has no idea what it is. For Tony, the distinction in life isn’t True or False, but rather sucker or non-sucker. Things are always simpler with him.
— Nassim Nicholas Taleb
Fat Tony walks into the engineering office, sees three whiteboards covered in diagrams of Atlantis web-hooks, Terragrunt “include” hierarchies, and dependency graphs that look like a Jackson Pollock painting. He squints at the mess and says:
"Kid, you're making this harder than explaining credit default swaps to my barber. You got boxes to deploy? Deploy the boxes. But when your fancy tools break - and they will - make sure you can fix them without calling a consultant."
This four-part series describes a structured way of building Terraform CI/CD for complex shared infrastructure, keeping things as simple as possible:
Part 1 (current post) exposes why your current tooling is probably fighting against you;
Part 2 reveals how I use plain GitLab CI and GitHub Actions to do complex infrastructure deployments;
Part 3 looks into branching strategies, and explores my personal preference for GitLab Flow for complex infrastructure, while allowing developer teams to use whichever strategy they prefer;
Part 4 shows you the functional composed-repository pattern that scales from startup to enterprise without the complexity death spiral;
By the end of this series, you'll understand a simpler way to build infrastructure deployment pipelines that your team can actually maintain, scale without exponential complexity growth, and debug when things inevitably go sideways at 3 AM. More importantly, you'll see why choosing boring, reliable technology for your infrastructure foundation is the logical thing to do in a world obsessed with the latest framework du jour.
Main Insights
Tool complexity is a hidden tax: Every abstraction layer adds cognitive load and debugging overhead to your team;
Cognitive load kills velocity: Senior engineers debugging tool configurations aren't solving business problems;
Specialised tools create lock-in: The more specific the tool, the harder it becomes to escape when requirements change;
Simplicity scales better: Boring technology that everyone understands beats sophisticated tools that only experts can debug;
Robust beats optimised: Systems that are indifferent to stress, or get stronger under it, outperform systems optimised for perfect conditions;
The complexity trap: When tools become masters
Walk into any serious enterprise engineering team today, and you'll witness what Nassim Taleb calls "intellectualisation" - the tendency to make simple problems complex by applying sophisticated-sounding solutions. Teams are drowning in YAML configuration files for Atlantis, struggling with Terragrunt's DRY principles that turn their infrastructure into a monolith, and debugging dependency graphs that would make a systems theorist weep.
The cognitive load is crushing. Your senior engineers - the ones you're paying six figures to solve business problems - are instead debugging why Terragrunt can't figure out the dependency between your landing zone and your global network stack. Meanwhile, your actual business requirements sit in a backlog, accumulating technical debt like compound interest working against you.
And here's the kicker: these tools promise to solve complexity by adding more complexity. It's like curing a headache by hitting your thumb with a hammer - technically, you're no longer thinking about your head.
The hidden economics of tool adoption
Let's talk about what this costs in engineer-hours. A senior infrastructure engineer costs roughly $200-300 per hour when you factor in salary, benefits, and overhead. Every hour spent debugging Atlantis web-hook failures, mis-configurations, or Terragrunt dependency resolution (an area where Terraform itself has since caught up) is $200-300 not spent on business value.
I've seen teams spend weeks getting their Terragrunt configuration "just right," only to have it break when they need to add a new region or account. The same infrastructure deployed with native Terraform and simple CI/CD takes days, not weeks, and the mental model is simple enough that junior engineers can contribute meaningfully.
The tool vendors and communities don't mention this in their marketing materials, but the total cost of ownership for specialised infrastructure tools follows a brutal curve. Initial adoption seems cheap - maybe a few days of setup. But maintenance, debugging, and the inevitable customisations compound over time. What started as a time-saver becomes a time-sink that your team can't escape.
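To make the arithmetic concrete, here is a minimal back-of-the-envelope sketch. The hourly rates are the ones quoted above; the team size, the hours lost per week, and the 48-week working year are hypothetical numbers chosen purely for illustration, so plug in your own.

```python
# Back-of-the-envelope cost of engineer time spent debugging tooling.
# The hourly rates come from the text; every other number is hypothetical.

HOURLY_RATE_LOW = 200   # USD per hour, lower bound quoted in the text
HOURLY_RATE_HIGH = 300  # USD per hour, upper bound quoted in the text


def debugging_cost(hours_per_week: float, engineers: int, weeks: int) -> tuple[float, float]:
    """Return the (low, high) cost in USD of hours spent on tooling instead of business work."""
    total_hours = hours_per_week * engineers * weeks
    return total_hours * HOURLY_RATE_LOW, total_hours * HOURLY_RATE_HIGH


if __name__ == "__main__":
    # Hypothetical team: 3 engineers, each losing 4 hours a week, over 48 working weeks.
    low, high = debugging_cost(hours_per_week=4, engineers=3, weeks=48)
    print(f"Estimated yearly cost: ${low:,.0f} - ${high:,.0f}")
```

Under these assumptions, half a day per week per engineer is already a six-figure line item, before counting the work that did not get done in that time.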
The Atlantis seduction
Atlantis is particularly seductive because it promises to solve real problems: collaborative Terraform workflows, pull request-based deployments, and automated planning. These are legitimate infrastructure challenges that every team faces.
But here's the other side of the coin: Atlantis adds a single point of failure to your infrastructure deployment process. When Atlantis is down, your infrastructure deployments are down. When Atlantis web-hooks fail, your pull requests sit in limbo. When you need to customise Atlantis behaviour, be prepared to become an expert on the internals of the tool.
The web-hook based architecture sounds elegant until you're debugging why GitHub web-hook delivery failed, or why Atlantis didn't receive the payload, or why the web-hook arrived but Atlantis couldn't parse it. Suddenly, your infrastructure deployment debugging requires understanding three different systems: your Terraform code, your Git hosting provider's web-hook system, and Atlantis's internal management.
Compare this to a simple CI/CD pipeline. When a deployment fails, you look at the pipeline logs. When you need to customise behaviour, you modify your workflow configuration file. When you need to debug web-hook delivery, you don't - because there aren't any web-hooks.
The Terragrunt tangle
Terragrunt promises DRY (Don't Repeat Yourself) infrastructure code, and for simple use cases, it delivers. The ability to share configuration across environments is genuinely useful. But Terragrunt's approach to DRY creates a different kind of complexity: configuration that's spread across multiple files, with include relationships that become impossible to trace.
When something breaks in a Terragrunt setup, the debugging process requires understanding:
The root terragrunt.hcl configuration;
Any parent terragrunt.hcl files in the directory hierarchy;
The specific terragrunt.hcl file for the failing module;
The underlying Terraform module being called;
Any generate blocks that modify the Terraform configuration at runtime;
The dependency relationships between different Terragrunt configurations;
This is cognitive overhead that grows exponentially with the complexity of your infrastructure. What started as a way to reduce repetition becomes a web of inter-dependencies that only the original author can understand.
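As a rough illustration of the first three items in that list, the sketch below collects every terragrunt.hcl sitting between a failing module and the repository root. It is not how Terragrunt itself resolves includes; it only lists the files a person would have to open, and it says nothing about generate blocks or dependency relationships. The function name and the directory layout in the usage note are hypothetical.

```python
from pathlib import Path


def configs_to_read(module_dir: Path, repo_root: Path) -> list[Path]:
    """List every terragrunt.hcl from the failing module up to the repo root.

    This is a reading list for a human, not a model of Terragrunt's own
    include resolution.
    """
    module_dir = module_dir.resolve()
    repo_root = repo_root.resolve()
    found = []
    current = module_dir
    while True:
        candidate = current / "terragrunt.hcl"
        if candidate.is_file():
            found.append(candidate)
        # Stop at the repo root, or at the filesystem root as a safety net.
        if current == repo_root or current == current.parent:
            break
        current = current.parent
    return found
```

Called on a hypothetical layout such as prod/eu-west-1/network inside a repository, it returns the module's own file first and the root file last. Each of those files still has to be read together with the Terraform module it wraps.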
The obsolescence risk
Sustainable systems outlast fashionable systems. Terragrunt, Atlantis, and similar tools are built by small teams with specific opinions about how infrastructure should be managed. When those teams pivot, get acquired, or simply burn out, your infrastructure deployment process is held hostage.
Native Terraform with simple CI/CD pipelines (e.g. GitLab CI, GitHub Actions) is built on foundational technologies that aren't going anywhere. Terraform is HashiCorp's core product; alternatively, you can look into OpenTofu, which is backed by a massive community and enterprise customers who depend on its stability. GitLab CI/CD is fundamental to GitLab's business model, and GitHub Actions is fundamental to GitHub’s - it can be said that, between them, they are used by virtually every company out there.
When you build on these foundational technologies, you're betting on systems that have too much momentum and presence to disappear. When you build on specialised tools, you're betting on the continued interest and capability of much smaller teams, which contribute to a highly fragmented landscape.
What you're avoiding with a simpler approach:
The Atlantis Trap: No web-hook configurations to debug, no server to maintain, no custom Domain Specific Language (DSL) to learn. Your CI/CD is just CI/CD, using the same patterns your application teams already understand.
The Terragrunt Tangle: No DRY taken to the point of complexity, no generate blocks that modify your code at runtime, no “include” statements that require archaeological skills to trace.
The Multi-Tool Madness: No orchestration between different tools, no JSON configuration files for your YAML configuration files, no debugging across multiple abstraction layers.
The Knowledge Silo: No specialised knowledge that only one team member understands, no custom frameworks that can't be debugged by anyone who wasn't there during the initial setup.
Complex CI/CD tool-chains with specialised integrations become fragile - they break in ways you don't expect, at times you can't predict, with error messages that require deep knowledge of the tool's internals to debug.
When something breaks, it had better break in obvious ways with obvious solutions. Your GitHub / GitLab pipeline fails? Check the logs. Your Terraform “apply” fails? Look at the error message. Your dependency order is wrong? Fix the stage order in your pipeline configuration (a built-in feature in GitLab; GitHub requires workarounds). No custom DSLs to debug. No web-hook configurations to troubleshoot. No dependency resolution algorithms to understand. No server infrastructure to maintain.
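The mental model behind an explicit stage order is small enough to sketch in a few lines. This is only an illustration of the idea, not a replacement for your CI system: stages are a plain ordered list, run top to bottom, and the first failure stops everything after it. The stage names and the lambda steps are hypothetical stand-ins; in a real pipeline each step would be a Terraform plan or apply for one stack.

```python
from typing import Callable


def run_stages(stages: list[tuple[str, Callable[[], bool]]]) -> list[str]:
    """Run stages in the order written; stop at the first failure.

    Returns the names of the stages that were started, so the failing
    stage is always the last entry.
    """
    started = []
    for name, step in stages:
        started.append(name)
        if not step():
            print(f"stage '{name}' failed; later stages skipped")
            break
    return started


if __name__ == "__main__":
    # Hypothetical stacks; the second one simulates a failed apply.
    pipeline = [
        ("landing-zone", lambda: True),
        ("global-network", lambda: False),
        ("workloads", lambda: True),
    ]
    print(run_stages(pipeline))
```

If the order is wrong, the fix is to move a line in the list. There is no graph to resolve and nothing hidden to reason about.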
The path forward
The solution isn't to add more tools to solve the problems created by your existing tools. The solution is to step back and ask: what are we actually trying to accomplish, and what's the simplest way to accomplish it reliably?
Most infrastructure deployment requirements can be satisfied with:
Native Terraform / OpenTofu for infrastructure definition;
GitHub or GitLab CI/CD for orchestration and execution;
Simple IAM role architecture for security;
Functional repository organisation for maintainability;
This isn't about being anti-innovation. It's about being pro-reliability. Infrastructure is the foundation everything else depends on. Build it like you're building the foundation of a skyscraper - solid, predictable, and boring in the best possible way.
If this resonates with your infrastructure challenges - or if you're tired of debugging complex integrations of CI/CD tools instead of building actual infrastructure - the next part of this series will show you exactly how to escape the complexity trap.
Next up: Part 2 reveals how to easily deploy and maintain complex shared infrastructure using GitHub Actions and GitLab CI, and why (in my view) GitLab's workflow stages naturally handle complex dependencies better than GitHub's native workflow approach.
I invite you to subscribe to my blog, and to read a few of my favourite case-studies describing how some of my clients achieved success in their high-stakes technology projects, using the very same approach described.
Have a great day!
João


