[MM’s] Terraform Notes — Kafka as Code with Terraform and Confluent Cloud
Mastering Confluent Cloud resource management with HCL.

Confluent Cloud Kafka is often introduced to teams as a managed service — no servers, no worries. But once you have multiple teams, services and environments, the reality changes fast.
Manual setup through the Confluent Cloud UI leads to:
- Slightly different topic configs between environments
- Overprivileged service accounts created “just to make it work”
- No reliable audit trail of who changed what and why
At that point, Kafka infrastructure becomes a liability rather than an enabler.
⚡ TL;DR (Quick Recap)
- Kafka infrastructure should be versioned, reviewed and reproducible
- Terraform provides a clean abstraction over Confluent Cloud APIs
- Modular design enables safe reuse across teams and environments
Why Kafka as Code Matters Now
Modern Kafka usage goes far beyond a handful of topics. Today’s platforms often include:
- Multiple Kafka clusters across regions
- Fine-grained ACLs per service and consumer group
- Strict separation between dev, staging and production
Without Infrastructure as Code, keeping this consistent is nearly impossible. Terraform turns Kafka into a declarative system: the desired state is written once, reviewed in Git and enforced automatically.
Designing a Modular Kafka Repository
Instead of a single, monolithic Terraform configuration, the repository is split into focused modules. Each module solves one problem well and does so explicitly, avoiding hidden defaults.
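One layout that maps cleanly onto this split — directory names are illustrative, matching the modules described below:

modules/
  cluster/           # Confluent environment + Kafka cluster
  service-identity/  # service accounts
  topic/             # topics and their ACLs
environments/
  dev/
  staging/
  prod/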
Cluster Module (Best-Practice Hardened)
Clusters are environment-scoped resources and should always be created alongside their Confluent environment. Provider versions are pinned to avoid breaking changes.
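The module consumes a single object variable. A sketch of the shape it assumes — the declaration below is inferred from the resource arguments, and the example values in comments are illustrative:

variable "environment_name" {
  type        = string
  description = "Display name of the Confluent environment"
}

variable "cluster_config" {
  type = object({
    name           = string
    type           = string           # "BASIC", "STANDARD" or "DEDICATED"
    availability   = string           # e.g. "SINGLE_ZONE" or "MULTI_ZONE"
    cloud_provider = string           # e.g. "AWS"
    region         = string           # e.g. "eu-central-1"
    cku            = optional(number) # capacity units, DEDICATED only (optional() needs Terraform >= 1.3)
  })
}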
terraform {
  required_providers {
    confluent = {
      source  = "confluentinc/confluent"
      version = "~> 2.57"
    }
  }
}

resource "confluent_environment" "this" {
  display_name = var.environment_name
}

resource "confluent_kafka_cluster" "this" {
  display_name = var.cluster_config.name
  availability = var.cluster_config.availability
  cloud        = var.cluster_config.cloud_provider
  region       = var.cluster_config.region

  # Exactly one of these blocks is rendered, driven by cluster_config.type
  dynamic "basic" {
    for_each = var.cluster_config.type == "BASIC" ? [1] : []
    content {}
  }

  dynamic "standard" {
    for_each = var.cluster_config.type == "STANDARD" ? [1] : []
    content {}
  }

  dynamic "dedicated" {
    for_each = var.cluster_config.type == "DEDICATED" ? [1] : []
    content {
      cku = var.cluster_config.cku
    }
  }

  environment {
    id = confluent_environment.this.id
  }
}

Service Identity Module (Least Privilege by Default)
Service accounts represent applications — never humans. They are cheap to create and should be scoped tightly.
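The module again takes one object; a sketch of the assumed variable (the example values in comments are invented):

variable "identity" {
  type = object({
    team        = string # owning team, e.g. "checkout"
    name        = string # application name, e.g. "payment-processor"
    description = string # free text; an empty string falls back to a generated description
  })
}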
locals {
  # "<team>-<name>" keeps ownership visible directly in the Confluent Cloud UI
  display_name = "${var.identity.team}-${var.identity.name}"
  description  = var.identity.description != "" ? var.identity.description : "Service account for ${var.identity.team}/${var.identity.name}"
}

resource "confluent_service_account" "this" {
  display_name = local.display_name
  description  = local.description
}

Topic Module (Explicit, Protected, Reviewable)
Kafka topics are part of your public contract. Defaults are dangerous and deletions should be guarded.
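A sketch of the variables the resource below assumes — note that Confluent topic config values are strings, even for numeric settings:

variable "team" { type = string }
variable "name" { type = string }

variable "topic_config" {
  type = object({
    partitions          = number
    cleanup_policy      = string # "delete" or "compact"
    retention_ms        = string # e.g. "604800000" (7 days)
    min_insync_replicas = string # e.g. "2"
  })
}

variable "cluster" {
  type = object({
    id            = string
    rest_endpoint = string
    api_key       = string
    api_secret    = string
  })
  sensitive = true # keeps the API secret out of plan output
}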
locals {
  # "<team>.<name>" namespaces topics by owning team
  topic_name = "${var.team}.${var.name}"
}

resource "confluent_kafka_topic" "this" {
  kafka_cluster {
    id = var.cluster.id
  }

  topic_name       = local.topic_name
  partitions_count = var.topic_config.partitions

  config = {
    "cleanup.policy"      = var.topic_config.cleanup_policy
    "retention.ms"        = var.topic_config.retention_ms
    "min.insync.replicas" = var.topic_config.min_insync_replicas
  }

  rest_endpoint = var.cluster.rest_endpoint
  credentials {
    key    = var.cluster.api_key
    secret = var.cluster.api_secret
  }

  lifecycle {
    # Guard against accidental data loss: topic deletion is irreversible
    prevent_destroy = true
  }
}

Topic ACLs (Never Implicit)
Topics without ACLs are configuration drift waiting to happen. Permissions must be explicit and reviewable.
resource "confluent_kafka_acl" "topic_read" {
kafka_cluster {
id = var.cluster.id
}
resource_type = "TOPIC"
resource_name = confluent_kafka_topic.this.topic_name
pattern_type = "LITERAL"
principal = "User:${var.reader_service_account_id}"
operation = "READ"
permission = "ALLOW"
rest_endpoint = var.cluster.rest_endpoint
credentials {
key = var.cluster.api_key
secret = var.cluster.api_secret
}
}Environment Isolation Without Duplication
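A READ ACL on the topic alone is not enough for a typical consumer: Kafka also checks READ on the consumer group. A companion sketch — var.reader_consumer_group is an illustrative variable, not part of the module above:

resource "confluent_kafka_acl" "group_read" {
  kafka_cluster {
    id = var.cluster.id
  }

  resource_type = "GROUP"
  resource_name = var.reader_consumer_group # consumer group id, e.g. "checkout.payment-processor"
  pattern_type  = "LITERAL"
  principal     = "User:${var.reader_service_account_id}"
  operation     = "READ"
  permission    = "ALLOW"

  rest_endpoint = var.cluster.rest_endpoint
  credentials {
    key    = var.cluster.api_key
    secret = var.cluster.api_secret
  }
}

Environment Isolation Without Duplication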
The environments/ directory separates what you deploy from where you deploy it.
- dev/ uses smaller, cost-effective clusters
- prod/ enables high availability and stricter limits
The same modules are reused everywhere; only variables change. This guarantees parity while allowing intentional differences.
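In practice that means two small variable files; the values below are illustrative:

# environments/dev/terraform.tfvars
environment_name = "dev"
cluster_config = {
  name           = "dev-core"
  type           = "BASIC"
  availability   = "SINGLE_ZONE"
  cloud_provider = "AWS"
  region         = "eu-central-1"
}

# environments/prod/terraform.tfvars
environment_name = "prod"
cluster_config = {
  name           = "prod-core"
  type           = "DEDICATED"
  availability   = "MULTI_ZONE"
  cloud_provider = "AWS"
  region         = "eu-central-1"
  cku            = 2
}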
The Deployment Workflow
Once the modules are in place, Kafka provisioning becomes boring — in the best possible way.
terraform init
terraform plan -out=dev.tfplan
terraform apply dev.tfplan
No hidden steps. No UI clicks. The plan shows exactly what will change before anything happens.
In practice, these commands are wrapped in CI pipelines, ensuring that Kafka changes follow the same deployment standards as application code.
What This Solves in Real Teams
Adopting Kafka as Code immediately improves day-to-day operations:
- Consistency: dev, staging and prod are structurally identical
- Security: Permissions are explicit and reviewable
- Speed: New services get Kafka resources in minutes
- Confidence: Rollbacks are possible because changes are tracked
New engineers can understand the Kafka setup by reading Terraform.
What Confluent Docs Don’t Tell You
The Confluent Terraform Provider is powerful — but there are several realities you only discover after running Kafka in production.
ACLs are not optional. Creating topics without explicit ACLs works — until a new service appears and permissions are patched manually. If ACLs are not in Terraform, they will drift.
Defaults are dangerous. Broker defaults differ between cluster types and evolve over time. If retention or cleanup policies matter, declare them explicitly or expect surprises.
Terraform deletions are risky. A terraform destroy on a Kafka topic is irreversible. prevent_destroy is not paranoia — it’s operational hygiene.
Workspaces often don’t map to Kafka reality. Kafka environments are not interchangeable. Separate state files beat clever abstractions every time.
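One way to keep environments in genuinely separate state, sketched here assuming an S3 backend (any remote backend works the same way; bucket and key names are illustrative):

# environments/dev/backend.tf
terraform {
  backend "s3" {
    bucket = "acme-terraform-state"
    key    = "kafka/dev/terraform.tfstate"
    region = "eu-central-1"
  }
}

# environments/prod/backend.tf is identical except for the key:
#   key = "kafka/prod/terraform.tfstate"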
Lessons Learned
- Never commit API keys — inject them via CI, Terraform Cloud or Vault (see the sketch after this list)
- Always use remote state for Kafka infrastructure
- Keep modules opinionated and small
- Start simple, evolve only when real constraints appear
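On the first point: Terraform reads any environment variable prefixed with TF_VAR_, so credentials can be injected at run time instead of committed. A minimal sketch:

variable "confluent_cloud_api_key" {
  type      = string
  sensitive = true # redacted from plan output
}

variable "confluent_cloud_api_secret" {
  type      = string
  sensitive = true
}

provider "confluent" {
  cloud_api_key    = var.confluent_cloud_api_key
  cloud_api_secret = var.confluent_cloud_api_secret
}

In CI, export TF_VAR_confluent_cloud_api_key and TF_VAR_confluent_cloud_api_secret from the pipeline's secret store before running terraform plan.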
Final Takeaways
Kafka as Code is not about Terraform — it’s about control.
By modeling Kafka infrastructure declaratively, you replace manual effort with repeatable systems. Confluent Cloud handles the operational heavy lifting; Terraform ensures that your intent is clearly expressed and consistently applied.
Start with one environment. One cluster. One topic.
You can find all the code on GitHub.
Originally posted on marconak-matej.medium.com.