Cloud Infrastructure

How Cloud Setups Break

AWS rarely breaks all at once.
It drifts.

One AWS account. Fast deployments. Console clicks. Then engineers join, environments multiply, and ownership blurs. Here’s the timeline most teams don’t realize they’re on:

Day 1 · Month 0–3

The honeymoon phase

One AWS account. Fast deployments. Console clicks feel harmless because the whole environment fits in one person’s head. Production, staging, and dev share resources — and nobody minds.

Month 6 · Year 1

The silent sprawl

More engineers join. New environments appear. Ownership blurs. IAM exceptions accumulate, spend creeps upward, and production starts depending on tribal knowledge. The bill climbs faster than usage.

Year 1+ · The wall

The complexity wall

Security findings pile up. Audits stall. Velocity collapses. Teams slow down because nobody trusts the platform anymore. You stop building cleanly and start patching around a foundation that was never properly laid.

“By Year 2, half my engineering team was firefighting infra instead of building product. That’s the real cost of letting drift run.”

VP Engineering · 60-engineer Series B SaaS

This wall is preventable.

The right foundation, set up correctly, prevents 90% of this drift. The rest gets caught by guardrails before it becomes a year-long cleanup project.

See how we build it

What Goes Wrong

Five problems that don’t fix themselves.

These are the specific failure modes we see most. Each one quietly compounds until it becomes the only thing your engineers can work on.

01 · The compounding problem

One mistake takes down everything.

Without account separation, a bug in dev can break production. A security breach in one workload affects every other workload. A misconfigured IAM policy exposes the whole org. The blast radius is the entire company — and most teams don’t realize this until the first incident.

Incident · 14h recovery

“A junior engineer ran a Terraform destroy in what they thought was staging. It was production. Recovery took 14 hours. That’s when we called Cloudico.”
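One cheap guardrail against this kind of incident is to check which account the active credentials belong to before any destructive command runs. The sketch below is illustrative only: the account IDs, the `ACCOUNTS` map, and `check_target` are hypothetical names, and in a real pipeline the actual account ID would come from the active credentials (for example boto3's `sts.get_caller_identity()["Account"]`).

```python
# Hypothetical guard: refuse to continue when the credentials in use
# do not belong to the environment the engineer intended to target.
ACCOUNTS = {
    "staging": "222222222222",     # placeholder account IDs
    "production": "111111111111",
}


class WrongAccountError(RuntimeError):
    """Raised when the active account is not the intended one."""


def check_target(intended_env: str, actual_account_id: str) -> None:
    """Raise unless the active credentials match the intended environment."""
    expected = ACCOUNTS[intended_env]
    if actual_account_id != expected:
        raise WrongAccountError(
            f"intended {intended_env} ({expected}) "
            f"but credentials are for account {actual_account_id}"
        )


# Passes silently: staging credentials, staging intent.
check_target("staging", "222222222222")
```

With separate accounts per environment, a check like this has something unambiguous to compare against; in a single shared account it cannot tell staging from production at all.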

02

You can’t see where the money goes.

All your costs are mixed together. You can’t track spend by team, project, or environment. The CFO asks “Why is the bill up 40%?” and the answer is “We don’t know yet.” Finding actual cost-saving opportunities takes weeks of forensic work, every time.
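To make "track spend by team, project, or environment" concrete, here is a minimal sketch of grouping billing line items by a tag. The line items and tag keys are made up for illustration; real data would come from a billing export, and the structure of that export is an assumption here.

```python
from collections import defaultdict

# Hypothetical line items; the tag keys "team" and "env" are assumptions.
line_items = [
    {"cost": 120.0, "tags": {"team": "payments", "env": "prod"}},
    {"cost": 30.0, "tags": {"team": "payments", "env": "staging"}},
    {"cost": 75.5, "tags": {"team": "search", "env": "prod"}},
    {"cost": 40.0, "tags": {}},  # untagged spend nobody can explain
]


def spend_by(items, key):
    """Sum cost per value of the given tag key; missing tags go to 'untagged'."""
    totals = defaultdict(float)
    for item in items:
        totals[item["tags"].get(key, "untagged")] += item["cost"]
    return dict(totals)


print(spend_by(line_items, "team"))
print(spend_by(line_items, "env"))
```

The "untagged" bucket is the useful part: it shows how much of the bill has no owner yet.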

03

You’ll hit AWS service limits.

Service quotas apply per account, so a single account hits them sooner. What worked at 10 resources breaks at 100. Suddenly you can’t spin up another RDS instance, or your Lambda concurrency caps out, or you reach the per-account S3 bucket quota. Some quotas can be raised on request and others can’t. The fix is multi-account, but only if you set it up right.
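A simple way to see a quota coming is to compare usage against the limit and alert on headroom. This is a sketch with illustrative numbers: the quota names and values below are placeholders, not a statement of actual AWS limits, and `quota_alerts` is a hypothetical helper.

```python
def quota_alerts(usage, quotas, threshold=0.8):
    """Return (name, utilization) pairs at or above threshold, highest first."""
    alerts = []
    for name, limit in quotas.items():
        ratio = usage.get(name, 0) / limit
        if ratio >= threshold:
            alerts.append((name, round(ratio, 2)))
    return sorted(alerts, key=lambda pair: pair[1], reverse=True)


# Placeholder figures for one account in one region.
usage = {"rds_instances": 36, "lambda_concurrency": 950, "s3_buckets": 40}
quotas = {"rds_instances": 40, "lambda_concurrency": 1000, "s3_buckets": 100}

print(quota_alerts(usage, quotas))
```

Splitting workloads across accounts gives each one its own set of quotas, which is why the same check tends to stay quiet after a multi-account migration.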

04

Compliance becomes a nightmare.

Different workloads need different security policies, but you’re stuck with one-size-fits-all. SOC 2 and HIPAA auditors ask for evidence you don’t have. You scramble for 3 months building paper trails retroactively. Every audit cycle gets harder, never easier.

05

IAM permissions are a tangled web.

Managing who can access what becomes incredibly complex. You either give too much access (risky) or too little (blocking your team’s work). New engineers wait days for the right permissions. Offboarding takes weeks. The principle of least privilege is theoretical, never enforced.
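One way to move least privilege from theory toward enforcement is to lint policies for wildcard grants before they merge. The sketch below assumes the standard IAM policy JSON shape (`Statement`, `Effect`, `Action`, `Resource`); the function name and the sample policy are hypothetical, and a real check would cover more cases (for example `NotAction`).

```python
def _as_list(value):
    """IAM allows a single string or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]


def overly_broad_statements(policy):
    """Return Sids of Allow statements that pair a wildcard action with Resource '*'."""
    findings = []
    for index, stmt in enumerate(_as_list(policy["Statement"])):
        if stmt.get("Effect") != "Allow":
            continue
        actions = _as_list(stmt.get("Action", []))
        resources = _as_list(stmt.get("Resource", []))
        wildcard_action = any(a == "*" or a.endswith(":*") for a in actions)
        if wildcard_action and "*" in resources:
            findings.append(stmt.get("Sid", f"statement-{index}"))
    return findings


sample_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "AdminEverything", "Effect": "Allow", "Action": "*", "Resource": "*"},
        {"Sid": "ReadOneBucket", "Effect": "Allow", "Action": ["s3:GetObject"],
         "Resource": "arn:aws:s3:::example-bucket/*"},
        {"Sid": "AllOfEc2", "Effect": "Allow", "Action": "ec2:*", "Resource": "*"},
    ],
}

print(overly_broad_statements(sample_policy))
```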

There Are Three Paths

Build it yourself, use Control Tower, or get it right.

Most teams reach this fork. Here’s the honest comparison — not a sales chart.

| | DIY in-house | AWS Control Tower | Cloudico |
|---|---|---|---|
| Time to production | 3–6 months | 1–2 months + setup | 4–8 weeks fixed |
| 100% Infrastructure as Code | Depends on team | ClickOps required | Terraform / CDK |
| CIS / SOC 2 compliance day 1 | Build yourself | Partial baselines | 100% CIS baked in |
| Senior engineer ownership | Internal hire(s) | No engineer included | One named senior |
| Knowledge transfer to your team | Tribal & partial | Docs only | Pair + walkthrough |
| Vendor lock-in | None | AWS-only patterns | You own all code |
| Cost | $60k+ in eng time | $500/mo + hidden cost | $18k+ fixed |
| Risk of misconfiguration | High (first time) | Medium | Low (proven patterns) |

What We Build

Six capability blocks.
All shipped.

Every engagement covers the same six areas. The depth varies based on your scope, but nothing on this list is optional.

The full stack

Everything you need to actually run production.

Six capability areas. Same baseline on every engagement. AWS Organizations at the top, Terraform modules at the bottom, GitHub Actions and observability wired through the middle. You’ll get a complete, working production environment your team can ship into from day one.

AWS Organizations · Terraform · EKS · GitHub Actions · Karpenter · Datadog

live · production

Multi-account architecture

AWS Organizations with isolated production, staging, dev, and security audit accounts. Each one limited in blast radius, fully separated from the others, with cross-account access via role assumption only.

AWS Organizations · SCPs · Cross-account IAM
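"Cross-account access via role assumption only" usually comes down to a trust policy on a role in the target account. The sketch below builds one as plain JSON. The account ID is a placeholder, the function name is hypothetical, and requiring MFA is an assumption added for illustration rather than something stated above.

```python
import json


def cross_account_trust_policy(trusted_account_id: str, require_mfa: bool = True) -> dict:
    """Trust policy letting principals in another account assume this role."""
    statement = {
        "Effect": "Allow",
        "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
        "Action": "sts:AssumeRole",
    }
    if require_mfa:
        # Assumption: only sessions authenticated with MFA may assume the role.
        statement["Condition"] = {"Bool": {"aws:MultiFactorAuthPresent": "true"}}
    return {"Version": "2012-10-17", "Statement": [statement]}


trust_policy = cross_account_trust_policy("111111111111")
print(json.dumps(trust_policy, indent=2))
```

Trusting the account root delegates the decision to that account's own IAM policies, so access still has to be granted explicitly on the calling side.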

Everything as Terraform

100% Infrastructure as Code. Modular Terraform with reusable modules per environment. Version-controlled in your repo. PR-reviewed. Tested in CI before plan/apply. No console clicks except for the bootstrap.

Terraform · Modules · GitOps

Kubernetes done right

EKS / GKE / AKS clusters with autoscaling configured for real traffic patterns. Cluster-autoscaler or Karpenter, proper requests/limits, network policies, secrets management, and ingress controllers tuned for production.

EKS / GKE / AKS · Karpenter · Network Policies

CI/CD with safe rollback

GitHub Actions or GitLab CI pipelines that build immutable artifacts, promote the same image across dev → staging → prod, and support one-click rollback. Branch protection, required reviews, and OIDC auth — no long-lived secrets anywhere.

GitHub Actions · OIDC · ArgoCD / Flux
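As an illustration of "OIDC auth, no long-lived secrets", the role a GitHub Actions workflow assumes typically carries a trust policy like the one sketched here. The account ID, repository name, and function name are placeholders; restricting the role to a single branch is an assumption for the example.

```python
import json


def github_oidc_trust_policy(account_id: str, repo: str, branch: str = "main") -> dict:
    """Trust policy for a role assumed by GitHub Actions through OIDC federation."""
    provider = "token.actions.githubusercontent.com"
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {
                "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{provider}"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    # Token audience must be AWS STS.
                    f"{provider}:aud": "sts.amazonaws.com",
                    # Only workflows on this repo and branch may assume the role.
                    f"{provider}:sub": f"repo:{repo}:ref:refs/heads/{branch}",
                }
            },
        }],
    }


oidc_policy = github_oidc_trust_policy("111111111111", "example-org/example-repo")
print(json.dumps(oidc_policy, indent=2))
```

The workflow exchanges a short-lived token for temporary credentials on each run, so there is no access key stored in repository secrets to leak or rotate.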

Security baked in

100% CIS-benchmark compliant from day one. GuardDuty, CloudTrail, Config, and Security Hub configured across all accounts. Encrypted by default everywhere. Least-privilege IAM. SOC 2 / HIPAA evidence trails set up so audits stop being painful.

CIS 100% · GuardDuty · Security Hub

Cost visibility & guardrails

Costs allocated by team, service, environment, and customer (where applicable). Budget alerts and anomaly detection on every account. Reserved Instances and Savings Plans where they pay back — no over-commitment. Drift detection so usage never silently runs away from you.

Cost Explorer · Anomaly Detection · RIs / Savings Plans
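"Where they pay back" can be reduced to simple arithmetic. The sketch below uses a deliberately simplified model, which is an assumption: you pay the discounted rate for every committed hour whether you use it or not, and the alternative is paying the on-demand rate only for hours actually used. The rates and discount are made-up numbers.

```python
def breakeven_utilization(discount: float) -> float:
    """Share of committed hours that must be used before a commitment beats on-demand."""
    return 1.0 - discount


def commitment_savings(on_demand_rate, discount, committed_hours, used_hours):
    """Positive when the commitment is cheaper than on-demand, negative when over-committed."""
    committed_cost = on_demand_rate * (1.0 - discount) * committed_hours
    on_demand_cost = on_demand_rate * used_hours
    return on_demand_cost - committed_cost


# Illustrative: $1.00/hour on-demand, 25% discount, 720 committed hours.
print(breakeven_utilization(0.25))
print(commitment_savings(1.0, 0.25, 720, 600))  # well used
print(commitment_savings(1.0, 0.25, 720, 500))  # over-committed
```

Under this model a 25% discount only pays back once more than 75% of the committed hours are used, which is the check behind "no over-commitment".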

Every Feature, Listed

No fine print. This is what ships.

Every Cloud Infrastructure engagement ships with the same baseline. Larger scopes add to this list — nothing on it is ever removed.

39 features · one engagement

Every feature lives in your repo, not ours.

Every checkbox below is a Terraform module, a GitHub Action, or a security policy we hand you on day one. No black boxes, no proprietary wrappers, no vendor lock-in.

39 features shipped · 100% as IaC · 0 proprietary tools

Security & Compliance

12 features · CIS-aligned

Centralized GuardDuty across every account, delegated admin model
Organization-wide CloudTrail with encrypted log archive
AWS Config recording continuous compliance state
Security Hub with CIS & AWS FSBP standards enabled
S3 public-access block enforced at account level
EBS encryption-by-default in every region
IAM password policy set per account and protected from changes via SCP
VPC default-security-group hardening
KMS key management with audit logging
Root user MFA enforced & root keys removed
Secrets Manager / Parameter Store for credentials
Audit log retention with lifecycle policies
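Several items in this list rely on Service Control Policies to stop member accounts from switching guardrails off. A minimal sketch of such a policy document, built as plain JSON, is shown here. The Sids and the choice of actions are illustrative assumptions, not the exact policies shipped in an engagement.

```python
import json


def audit_protection_scp() -> dict:
    """SCP sketch: deny tampering with CloudTrail and the account password policy."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ProtectCloudTrail",
                "Effect": "Deny",
                "Action": [
                    "cloudtrail:StopLogging",
                    "cloudtrail:DeleteTrail",
                    "cloudtrail:UpdateTrail",
                ],
                "Resource": "*",
            },
            {
                "Sid": "ProtectPasswordPolicy",
                "Effect": "Deny",
                "Action": [
                    "iam:UpdateAccountPasswordPolicy",
                    "iam:DeleteAccountPasswordPolicy",
                ],
                "Resource": "*",
            },
        ],
    }


scp = audit_protection_scp()
print(json.dumps(scp, indent=2))
```

A blanket deny like this also blocks your own automation in member accounts, so real policies usually add a condition that exempts a specific provisioning role.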

IaC & Deployment

10 features · Terraform-native

Modular Terraform with reusable per-env modules
Remote state with S3 + DynamoDB locking, encrypted at rest
Workspace-per-environment isolation strategy
GitHub Actions CI/CD with OIDC (no long-lived AWS keys)
Terraform plan + apply gated by required PR review
Pre-commit hooks for tfsec / Checkov / formatting
Drift detection via scheduled plan runs
Container builds with SBOM & vulnerability scanning
Image promotion across environments (same artifact)
One-click rollback path documented per service
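"Drift detection via scheduled plan runs" can be sketched around `terraform plan -detailed-exitcode`, which exits with 0 when there are no changes, 1 on error, and 2 when changes are present. The function names and the idea of running it per working directory are assumptions for illustration; scheduling and alerting are left out.

```python
import subprocess


def classify_plan_exit(code: int) -> str:
    """Map the exit code of `terraform plan -detailed-exitcode` to a status."""
    if code == 0:
        return "in-sync"   # live infrastructure matches the code
    if code == 2:
        return "drift"     # plan found differences
    return "error"         # the plan itself failed


def detect_drift(workdir: str) -> str:
    """Run a read-only plan in workdir and classify the result."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-lock=false"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return classify_plan_exit(result.returncode)


print(classify_plan_exit(2))
```

A scheduled CI job could call `detect_drift` for each environment and post to Slack whenever the result is not "in-sync".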

Networking & Kubernetes

8 features · production-tuned

Multi-AZ VPCs with private + public subnets
Transit Gateway for inter-account / region connectivity
VPC endpoints for AWS services (cost & latency wins)
EKS cluster with managed node groups + Karpenter
Network policies via Calico or Cilium
Cert-manager & external-DNS pre-configured
Ingress (ALB / NGINX) with WAF rules
Pod security standards enforced cluster-wide
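For the multi-AZ VPC layout, subnet ranges can be carved out of the VPC CIDR deterministically. This sketch uses Python's standard `ipaddress` module; the CIDR, the AZ names, and the /20 subnet size are illustrative assumptions.

```python
import ipaddress


def plan_subnets(vpc_cidr, azs, new_prefix=20):
    """Assign one public and one private subnet per AZ from consecutive blocks."""
    blocks = ipaddress.ip_network(vpc_cidr).subnets(new_prefix=new_prefix)
    plan = {}
    for tier in ("public", "private"):
        for az in azs:
            plan[(tier, az)] = str(next(blocks))
    return plan


subnet_plan = plan_subnets("10.0.0.0/16", ["us-east-1a", "us-east-1b", "us-east-1c"])
for (tier, az), cidr in subnet_plan.items():
    print(f"{tier:8} {az}  {cidr}")
```

Because the blocks come from a single generator, no two subnets can overlap, and the remaining address space stays free for later tiers.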

Cost & Operations

9 features · FinOps-ready

Cost allocation tags enforced at account creation
Per-team, per-env, per-customer cost breakdowns
AWS Budgets with email/Slack alerts
Cost anomaly detection on every account
Reserved Instance & Savings Plan recommendations
S3 lifecycle policies (Standard → IA → Glacier)
Centralized logging with retention & alerting
Backup & restore drills documented and tested
Runbook library for top 10 incident scenarios
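The S3 lifecycle item (Standard → IA → Glacier) maps to a small configuration document. The sketch below builds it in the shape accepted by boto3's `put_bucket_lifecycle_configuration`. The day counts, the rule ID, and the expiration step are assumptions for illustration; the list above only names the storage tiers.

```python
def log_lifecycle_rules(ia_after_days=30, glacier_after_days=90, expire_after_days=365):
    """Lifecycle configuration: Standard -> Standard-IA -> Glacier, then expire."""
    if not ia_after_days < glacier_after_days < expire_after_days:
        raise ValueError("expected IA < Glacier < expiration, in days")
    return {
        "Rules": [{
            "ID": "tier-down-logs",          # hypothetical rule name
            "Status": "Enabled",
            "Filter": {"Prefix": ""},        # apply to every object in the bucket
            "Transitions": [
                {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                {"Days": glacier_after_days, "StorageClass": "GLACIER"},
            ],
            # Assumption: objects are deleted after the retention period.
            "Expiration": {"Days": expire_after_days},
        }]
    }


lifecycle = log_lifecycle_rules()
print(lifecycle["Rules"][0]["Transitions"])
```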

The Architecture

What it looks like in your AWS Organization.

A simplified view of the multi-account structure we deploy. The exact OUs, regions, and accounts get tuned to your stage — but the core shape is consistent across every engagement.

cloudico-org · reference architecture

CIS 100% · Multi-region · 100% IaC

Management OU

root-management: SCPs · Org-level config (AWS Organizations, SSO)
log-archive: Immutable CloudTrail logs (S3, Glacier)
audit / security: Security Hub aggregator (GuardDuty, Config)

Production OU

prod · us-east-1: Primary region (EKS, RDS, ALB)
prod · eu-west-1: DR / failover region (EKS, RDS replica)
shared-services: DNS · CI runners · registry (Route53, ECR)

Dev / Staging OU

staging: Pre-prod mirror (EKS, Test data)
development: Engineer sandboxes (Per-engineer IAM)
sandbox: Throwaway experiments (Auto-clean)

CI/CD: GitHub Actions · OIDC
Observability: Prometheus · Grafana
Secrets: AWS Secrets Manager
FinOps: Budgets · Anomaly

What You Walk Away With

Numbers your CFO and CTO both care about.

Averaged across our last 14 Cloud Infrastructure engagements. Your numbers will vary — but the shape is consistent.

CIS Foundation Benchmark pass rate · From day one
Median kickoff to production · Across 14 engagements
Uptime achieved over 12 months · Post-handover average
Reduction in monthly cloud spend · vs prior ClickOps setup

How It Runs

Three steps. Clear deliverables at each one.

The full 6-phase process from the Services overview, condensed into the 3 visible milestones you’ll experience. No mystery, no scope creep.

Kickoff

1

Week 0 · ≤ 2 hours

Kickoff & requirements

A 90-minute working session with your team. We learn your stack, compliance posture, growth plans, and which AWS accounts already exist. We confirm scope and pricing in writing before any paid work begins.

You’ll have

Written requirements doc
Confirmed scope & fixed price
Mutual NDA in place
AWS Organization access mapped

Build

2

Weeks 1–6 · build

Deploy & configure

We deploy the multi-account foundation, migrate existing workloads without downtime, and harden security alongside your team. Weekly written demos and a Slack channel for real-time questions. Your dev team keeps shipping the whole time.

You’ll have

Live multi-account org
Production workloads migrated
Terraform code in your repo
CI/CD pipelines running

Handover

3

Weeks 7–8 · handover

Handover & knowledge transfer

Live walkthrough of every account, control plane, and runbook. We pair-program with your engineers until they can ship changes confidently on their own. Then 30 days of post-handover Slack access — no extra cost — while you settle in.

You’ll have

Complete runbook library
Recorded walkthroughs
30-day Slack support
Compliance evidence pack

After We Ship

Two paths forward. You pick.

Once the foundation is live, you choose how much of us you want around. Both paths give you 100% code ownership — no lock-in either way.

Your team owns it

Self-managed handover

Take full ownership from day one. Your team runs everything. We’re done after the 30-day post-handover window unless you call us back.

Complete Terraform codebase in your GitHub
Full runbook library + recorded walkthroughs
30 days of free Slack-channel support
No ongoing fees, no vendor lock-in
Re-engage anytime on a project basis

Investment after delivery

$0/month · you own it

Start with handover

Senior engineer embedded

Recommended

Embedded SRE retainer

A senior engineer stays embedded. We keep the foundation evolving as you grow: new accounts, security drift, cost optimization, on-call backup. Month-to-month, cancel anytime.

Everything in self-managed handover
Senior engineer on your Slack & GitHub
New accounts / regions provisioned for you
Monthly written security & cost review
On-call backup for incidents
Cancel any month, no notice required

Investment after delivery

From $4.5k/month

Discuss a retainer

Featured Engagement

One client. From console-clicks to multi-account in 5 weeks.

Cloud Infrastructure · AWS · Terraform · EKS · 5 weeks delivered

From a single AWS account to a SOC 2-ready multi-account org

B2B SaaS · ~80 engineers · Series B

“We’d been on AWS for four years and never moved past one account. Cloudico migrated us to a proper multi-account org in five weeks. The first SOC 2 audit after that was the easiest one we’d ever done.”

Marcus Thompson

CTO · B2B SaaS, Series B

The team had grown from 12 to 80 engineers on a single AWS account with hand-built CloudFormation. IAM had drifted, costs were unallocated, and the upcoming SOC 2 audit was a ticking clock. We migrated workloads into a 9-account org structure with zero downtime, ported infrastructure to Terraform, and set up the evidence trail their auditor needed. Their team owns every line of code we wrote.

Stack & tools shipped

AWS Organizations · Terraform · EKS · Karpenter · GitHub Actions · OIDC · Security Hub · GuardDuty · CloudTrail · Datadog

5 weeks

Kickoff to production multi-account org

100%

CIS Foundation Benchmark on first audit

−32%

AWS spend reduction with no perf impact

0

Customer-impacting incidents during migration

Before The Call

The seven questions CTOs ask us most.

Direct answers to the questions that come up before every Cloud Infrastructure discovery call. Different from the FAQ on the services overview — these are specific to this engagement.

Ask us directly

Will this disrupt our existing AWS workloads?

No. We attach the new multi-account structure to your existing AWS Organization (or create one if you don’t have it) and migrate accounts into the new structure without downtime. Workloads keep running. Engineers keep shipping. There’s no freeze window, no maintenance pause, no production cutover — guardrails get rolled out gradually around your live services.

How is this different from AWS Control Tower?

Control Tower is a great starting point but still requires significant console work to reach the security and automation level you actually need. Our setup ships 100% as Terraform (no ClickOps), enforces CIS-level compliance from day one, and includes a named senior engineer who builds it with your team. Same end-state, but you get there in 6 weeks instead of 6 months, and you own every line of code.

Do we need to pause feature development while you work?

No. Your engineers keep shipping their normal roadmap the entire time. The new infrastructure is built in parallel and your existing workloads migrate into it gradually. Most teams describe the experience as “surprisingly quiet” — there’s a Slack channel where we report progress weekly, but your dev team’s day-to-day doesn’t change.

What if we need changes after delivery?

The entire Terraform codebase lives in your GitHub repository. Your team can modify OU structures, Service Control Policies, IAM roles, and account configurations through normal pull requests. The architecture scales to dozens of accounts. If you want our help, the Embedded SRE retainer covers exactly this kind of ongoing change work. Otherwise, any AWS-fluent engineer can extend it.

What if something breaks after handover?

30 days of post-handover Slack support is included by default: questions, edge cases, integration help. After that, you’re on the retainer or you’re not. Either way, the Landing Zone uses standard AWS services and standard Terraform patterns. There’s nothing proprietary in our build that could lock you in. Any AWS engineer familiar with Terraform can troubleshoot and resolve issues.

Do you do GCP and Azure, or just AWS?

All three. AWS is our deepest expertise — about 70% of engagements. GCP comes second (especially for AI/ML workloads using Vertex AI or GKE). Azure third, mostly for teams already tied to Microsoft’s ecosystem. Whichever you’re on, the engineering patterns (multi-project / multi-subscription, IaC, GitOps, least-privilege identity) translate cleanly across clouds.

What does pricing actually look like?

Cloud Infrastructure engagements start at $18k fixed-scope, with a typical range of $24–48k depending on number of accounts, regions, existing-workload complexity, and compliance scope (SOC 2 / HIPAA add to the scope). Pricing is confirmed in writing after the discovery call and is never variable or hourly. If we miss the timeline on our side, you don’t pay for the overrun. An optional embedded retainer starts at $4.5k/month after delivery, cancel anytime.

Ready when you are

Production-grade AWS in 4–8 weeks.

Book a 30-minute discovery call. Senior engineer on the call. We’ll map your stack, surface the right scope, and confirm pricing in writing before any paid work starts.

Book Discovery Call · Compare all services

30-min consult · Mutual NDA available · Written scope & price · No obligation

Production AWS in 4 weeks, not 4 quarters.
