GitOps: Infrastructure as Code Done Right

Kicked Team | January 20, 2026 | 5 min read

"We store our Terraform in Git" is not GitOps. It's version control. GitOps is an operational model where:

  1. The entire system is described declaratively
  2. The desired state lives in Git
  3. Changes are made through pull requests
  4. A controller continuously reconciles actual state with desired state

That last point is what separates GitOps from "configs in a repo."

Why GitOps?

We manage infrastructure for dozens of clients. Before GitOps, deployments looked like this:

Developer → SSH into server → Run commands → Hope it works → Forget what they changed

With GitOps:

Developer → Open PR → Review → Merge → Automated reconciliation → Drift detection

The benefits compound:

  • Audit trail — Every change is a Git commit with author, timestamp, and review
  • Rollback — git revert is your "undo" button
  • Reproducibility — Spin up an identical environment from the same repo
  • Drift detection — The controller alerts when reality diverges from Git
  • Self-service — Developers can make infrastructure changes via PRs without SSH access

The Two Patterns: Push vs. Pull

Push-based (Traditional CI/CD)

Git Push → CI Pipeline → kubectl apply / terraform apply → Cluster

The pipeline has credentials to modify infrastructure. This works but has drawbacks:

  • CI system needs broad access to production
  • No continuous reconciliation — drift goes undetected
  • Pipeline failures can leave state partially applied

Pull-based (GitOps)

Git Push → Controller (in-cluster) → Detects diff → Reconciles → Cluster matches Git

The controller runs inside the cluster and pulls desired state from Git. This is the GitOps model.

Our GitOps Stack

ArgoCD for Kubernetes

ArgoCD is our primary GitOps controller for Kubernetes workloads. It watches Git repos and keeps the cluster in sync with the manifests they contain.

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: production-api
  namespace: argocd
spec:
  project: production
  source:
    repoURL: https://github.com/kicked-ro/infrastructure
    targetRevision: main
    path: apps/production/api
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true        # Delete resources removed from Git
      selfHeal: true     # Revert manual changes
    syncOptions:
      - CreateNamespace=true

Key configuration choices:

  • selfHeal: true — If someone manually edits a resource, ArgoCD reverts it. Git is the source of truth, always.
  • prune: true — Resources deleted from Git are deleted from the cluster. No zombie resources.
  • Automated sync — Once a change is merged to main, ArgoCD picks it up on its next poll of the repo (or immediately via a webhook) and deploys it. No manual button clicks.
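
The Application above references a project named production, which is not shown. As a rough sketch only, such an AppProject might restrict which repo may deploy into which namespace. The description and the exact allow-lists here are assumptions, not our actual configuration:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: production
  namespace: argocd
spec:
  description: Production workloads (illustrative)
  # Only manifests from this repo may be deployed through the project.
  sourceRepos:
    - https://github.com/kicked-ro/infrastructure
  # Applications in this project may only target the production namespace
  # of the local cluster.
  destinations:
    - server: https://kubernetes.default.svc
      namespace: production
  # Permit Namespace objects; other cluster-scoped kinds stay denied.
  clusterResourceWhitelist:
    - group: ""
      kind: Namespace
```

Scoping the project this way means a mistaken or malicious Application manifest cannot point production credentials at a different repo or namespace.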

Terraform + Atlantis for Cloud Resources

For cloud infrastructure (VMs, DNS, networking), we use Terraform with Atlantis for a PR-based workflow:

Developer opens PR with Terraform change
  → Atlantis runs `terraform plan`
  → Plan output posted as PR comment
  → Reviewer approves
  → Atlantis runs `terraform apply`
  → State updated, PR merged

No one runs terraform apply locally. Ever.
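
The workflow above could be expressed in a repo-level atlantis.yaml along these lines. This is a sketch, assuming one Atlantis project per directory under terraform/ (matching the repository structure shown later); the project names are illustrative, and apply_requirements in a repo-level file only takes effect if the Atlantis server config allows that override:

```yaml
version: 3
projects:
  - name: networking              # illustrative project name
    dir: terraform/networking
    autoplan:
      enabled: true
      # Re-run `terraform plan` when these files change in a PR.
      when_modified: ["*.tf", "*.tfvars"]
    # Block `atlantis apply` until the PR is approved and mergeable.
    apply_requirements: [approved, mergeable]
  - name: dns
    dir: terraform/dns
    autoplan:
      enabled: true
      when_modified: ["*.tf", "*.tfvars"]
    apply_requirements: [approved, mergeable]
```

With the approved requirement in place, the "Reviewer approves" step is enforced by Atlantis rather than by convention.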

Renovate for Dependency Updates

Keeping base images, Helm chart versions, and Terraform providers up to date is a full-time job. Renovate automates it:

{
  "$schema": "https://docs.renovatebot.com/renovate-schema.json",
  "extends": ["config:recommended"],
  "kubernetes": {
    "fileMatch": ["apps/.+\\.yaml$"]
  },
  "customManagers": [
    {
      "customType": "regex",
      "fileMatch": ["apps/.+\\.yaml$"],
      "matchStrings": ["image: (?<depName>.*?):(?<currentValue>.*?)\\s"],
      "datasourceTemplate": "docker"
    }
  ]
}

Renovate opens a PR for every update, and ArgoCD deploys it once the PR is merged. The only manual step left in the update path is the review.

Repository Structure

We've converged on this structure after much iteration:

infrastructure/
├── apps/
│   ├── production/
│   │   ├── api/
│   │   │   ├── deployment.yaml
│   │   │   ├── service.yaml
│   │   │   └── kustomization.yaml
│   │   ├── web/
│   │   └── workers/
│   └── staging/
│       └── ... (mirrors production)
├── platform/
│   ├── cert-manager/
│   ├── ingress-nginx/
│   ├── monitoring/
│   └── external-secrets/
├── terraform/
│   ├── networking/
│   ├── dns/
│   └── compute/
└── renovate.json

Apps are the workloads. Platform is the shared infrastructure. Terraform is the cloud-layer resources. Each directory under apps/ and platform/ is an ArgoCD Application; the directories under terraform/ are applied through Atlantis.
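
One way to get an Application per directory without writing each one by hand is an ApplicationSet with a Git directory generator. The following is a sketch, assuming the apps/<environment>/<app> layout shown above; the set name and the mapping of environment to project and namespace are assumptions (a staging project, for instance, would have to exist):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: apps                      # illustrative name
  namespace: argocd
spec:
  generators:
    - git:
        repoURL: https://github.com/kicked-ro/infrastructure
        revision: main
        directories:
          # Matches apps/production/api, apps/staging/web, and so on.
          - path: apps/*/*
  template:
    metadata:
      # path[1] is the environment, path.basename the app directory,
      # e.g. "production-api".
      name: '{{path[1]}}-{{path.basename}}'
    spec:
      project: '{{path[1]}}'
      source:
        repoURL: https://github.com/kicked-ro/infrastructure
        targetRevision: main
        path: '{{path}}'
      destination:
        server: https://kubernetes.default.svc
        namespace: '{{path[1]}}'
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
```

Adding a new workload then comes down to adding a directory in a PR; the generator creates the matching Application on its own.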

Secrets in GitOps

The one thing you can't put in Git: secrets. Our approach:

  1. Sealed Secrets — Encrypt secrets with a cluster-specific key. The encrypted version lives in Git. Only the cluster can decrypt.
  2. External Secrets Operator — Reference secrets in Vault/AWS SSM. Git stores the reference, not the value.

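We prefer External Secrets Operator for production because a value rotated in the backend is picked up automatically on the operator's next refresh, with no new commit and no re-encryption step.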
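
To illustrate "Git stores the reference, not the value", here is a minimal ExternalSecret sketch. The store name, secret names, and the remote key path are all hypothetical, and the apiVersion depends on the installed operator release:

```yaml
# v1 in recent External Secrets Operator releases; older installs serve v1beta1.
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: api-database              # hypothetical name
  namespace: production
spec:
  # How often the operator re-reads the backend; rotated values
  # reach the cluster within this interval.
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: vault                   # hypothetical store, configured separately
  target:
    # Name of the Kubernetes Secret the operator creates and keeps updated.
    name: api-database
  data:
    - secretKey: DATABASE_PASSWORD
      remoteRef:
        key: production/api/database   # hypothetical path in the backend
        property: password
```

Nothing in this file is sensitive, so it can be reviewed and merged like any other manifest.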

Lessons Learned

  1. Start with one app — Don't try to GitOps your entire infrastructure in week one
  2. Enforce selfHeal — If people can bypass Git, they will, and your source of truth becomes a lie
  3. PR reviews are mandatory — Even for the senior engineer. Especially for the senior engineer
  4. Test in staging first — ArgoCD ApplicationSets generate an Application per environment from one template, so staging and production stay structurally identical
  5. Monitor sync status — An app stuck in "OutOfSync" is a ticking time bomb
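
For the last point, one possible alert is sketched below. It assumes the Prometheus Operator is installed and is scraping ArgoCD's metrics; the rule name, namespace, threshold, and severity are illustrative choices:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: argocd-sync-status        # illustrative name
  namespace: monitoring
spec:
  groups:
    - name: argocd
      rules:
        - alert: ArgoCDAppOutOfSync
          # argocd_app_info is exported per Application with a sync_status label.
          expr: argocd_app_info{sync_status="OutOfSync"} == 1
          # Ignore the short OutOfSync window that every normal deploy passes through.
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: "ArgoCD application {{ $labels.name }} has been OutOfSync for 15 minutes"
```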

The Result

Since adopting GitOps across our infrastructure:

  • Zero unauthorized changes to production
  • 4-minute average deployment time (merge to running)
  • 100% audit trail for every infrastructure change
  • One-click disaster recovery (point ArgoCD at the repo, done)

GitOps isn't just a deployment strategy — it's an operational philosophy. If you're still SSH-ing into servers to make changes, let's talk.