[MM’s] Terraform Notes — Kafka as Code with Terraform and Confluent Cloud
Mastering Confluent Cloud resource management with HCL.

Confluent Cloud Kafka is often introduced to teams as a managed service — no servers, no worries. But once you have multiple teams, services and environments, the reality changes fast.
Manual setup through the Confluent Cloud UI leads to:
- Slightly different topic configs between environments
- Overprivileged service accounts created “just to make it work”
- No reliable audit trail of who changed what and why
At that point, Kafka infrastructure becomes a liability rather than an enabler.
⚡ TL;DR (Quick Recap)
- Kafka infrastructure should be versioned, reviewed and reproducible
- Terraform provides a clean abstraction over Confluent Cloud APIs
- Modular design enables safe reuse across teams and environments
Why Kafka as Code Matters Now
Modern Kafka usage goes far beyond a handful of topics. Today’s platforms often include:
- Multiple Kafka clusters across regions
- Fine-grained ACLs per service and consumer group
- Strict separation between dev, staging and production
Without Infrastructure as Code, keeping this consistent is nearly impossible. Terraform turns Kafka into a declarative system: the desired state is written once, reviewed in Git and enforced automatically.
Designing a Modular Kafka Repository
Instead of a single, monolithic Terraform configuration, the repository is split into focused modules. Each module solves one problem well and does so explicitly, avoiding hidden defaults.
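One layout that maps cleanly onto this split — directory names are illustrative, matching the modules described below:

modules/
  cluster/           # Confluent environment + Kafka cluster
  service-identity/  # service accounts
  topic/             # topics and their ACLs
environments/
  dev/
  staging/
  prod/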
Cluster Module (Best-Practice Hardened)
Clusters are environment-scoped resources and should always be created alongside their Confluent environment. Provider versions are pinned to avoid breaking changes.
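The module consumes a single object variable. A sketch of the shape it assumes — the declaration below is inferred from the resource arguments, and the example values in comments are illustrative:

variable "environment_name" {
  type        = string
  description = "Display name of the Confluent environment"
}

variable "cluster_config" {
  type = object({
    name           = string
    type           = string           # "BASIC", "STANDARD" or "DEDICATED"
    availability   = string           # e.g. "SINGLE_ZONE" or "MULTI_ZONE"
    cloud_provider = string           # e.g. "AWS"
    region         = string           # e.g. "eu-central-1"
    cku            = optional(number) # capacity units, DEDICATED only (optional() needs Terraform >= 1.3)
  })
}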
terraform {
  required_providers {
    confluent = {
      source  = "confluentinc/confluent"
      version = "~> 2.57"
    }
  }
}

resource "confluent_environment" "this" {
  display_name = var.environment_name
}

resource "confluent_kafka_cluster" "this" {
  display_name = var.cluster_config.name
  availability = var.cluster_config.availability
  cloud        = var.cluster_config.cloud_provider
  region       = var.cluster_config.region

  # Exactly one of these blocks is rendered, driven by cluster_config.type
  dynamic "basic" {
    for_each = var.cluster_config.type == "BASIC" ? [1] : []
    content {}
  }

  dynamic "standard" {
    for_each = var.cluster_config.type == "STANDARD" ? [1] : []
    content {}
  }

  dynamic "dedicated" {
    for_each = var.cluster_config.type == "DEDICATED" ? [1] : []
    content {
      cku = var.cluster_config.cku
    }
  }

  environment {
    id = confluent_environment.this.id
  }
}

Service Identity Module (Least Privilege by Default)
Service accounts represent applications — never humans. They are cheap to create and should be scoped tightly.
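The module again takes one object; a sketch of the assumed variable (the example values in comments are invented):

variable "identity" {
  type = object({
    team        = string # owning team, e.g. "checkout"
    name        = string # application name, e.g. "payment-processor"
    description = string # free text; an empty string falls back to a generated description
  })
}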
locals {
  # "<team>-<name>" keeps ownership visible directly in the Confluent Cloud UI
  display_name = "${var.identity.team}-${var.identity.name}"
  description  = var.identity.description != "" ? var.identity.description : "Service account for ${var.identity.team}/${var.identity.name}"
}

resource "confluent_service_account" "this" {
  display_name = local.display_name
  description  = local.description
}

Topic Module (Explicit, Protected, Reviewable)
Kafka topics are part of your public contract. Defaults are dangerous and deletions should be guarded.
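A sketch of the variables the resource below assumes — note that Confluent topic config values are strings, even for numeric settings:

variable "team" { type = string }
variable "name" { type = string }

variable "topic_config" {
  type = object({
    partitions          = number
    cleanup_policy      = string # "delete" or "compact"
    retention_ms        = string # e.g. "604800000" (7 days)
    min_insync_replicas = string # e.g. "2"
  })
}

variable "cluster" {
  type = object({
    id            = string
    rest_endpoint = string
    api_key       = string
    api_secret    = string
  })
  sensitive = true # keeps the API secret out of plan output
}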
locals {
  # "<team>.<name>" namespaces topics by owning team
  topic_name = "${var.team}.${var.name}"
}

resource "confluent_kafka_topic" "this" {
  kafka_cluster {
    id = var.cluster.id
  }

  topic_name       = local.topic_name
  partitions_count = var.topic_config.partitions

  config = {
    "cleanup.policy"      = var.topic_config.cleanup_policy
    "retention.ms"        = var.topic_config.retention_ms
    "min.insync.replicas" = var.topic_config.min_insync_replicas
  }

  rest_endpoint = var.cluster.rest_endpoint
  credentials {
    key    = var.cluster.api_key
    secret = var.cluster.api_secret
  }

  lifecycle {
    # Guard against accidental data loss: topic deletion is irreversible
    prevent_destroy = true
  }
}

Topic ACLs (Never Implicit)
Topics without ACLs are configuration drift waiting to happen. Permissions must be explicit and reviewable.
resource "confluent_kafka_acl" "topic_read" {
kafka_cluster {
id = var.cluster.id
}
resource_type = "TOPIC"
resource_name = confluent_kafka_topic.this.topic_name
pattern_type = "LITERAL"
principal = "User:${var.reader_service_account_id}"
operation = "READ"
permission = "ALLOW"
rest_endpoint = var.cluster.rest_endpoint
credentials {
key = var.cluster.api_key
secret = var.cluster.api_secret
}
}Environment Isolation Without Duplication
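A READ ACL on the topic alone is not enough for a typical consumer: Kafka also checks READ on the consumer group. A companion sketch — var.reader_consumer_group is an illustrative variable, not part of the module above:

resource "confluent_kafka_acl" "group_read" {
  kafka_cluster {
    id = var.cluster.id
  }

  resource_type = "GROUP"
  resource_name = var.reader_consumer_group # consumer group id, e.g. "checkout.payment-processor"
  pattern_type  = "LITERAL"
  principal     = "User:${var.reader_service_account_id}"
  operation     = "READ"
  permission    = "ALLOW"

  rest_endpoint = var.cluster.rest_endpoint
  credentials {
    key    = var.cluster.api_key
    secret = var.cluster.api_secret
  }
}

Environment Isolation Without Duplication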
The environments/ directory separates what you deploy from where you deploy it.
- dev/ uses smaller, cost-effective clusters
- prod/ enables high availability and stricter limits
The same modules are reused everywhere; only variables change. This guarantees parity while allowing intentional differences.
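In practice that means two small variable files; the values below are illustrative:

# environments/dev/terraform.tfvars
environment_name = "dev"
cluster_config = {
  name           = "dev-core"
  type           = "BASIC"
  availability   = "SINGLE_ZONE"
  cloud_provider = "AWS"
  region         = "eu-central-1"
}

# environments/prod/terraform.tfvars
environment_name = "prod"
cluster_config = {
  name           = "prod-core"
  type           = "DEDICATED"
  availability   = "MULTI_ZONE"
  cloud_provider = "AWS"
  region         = "eu-central-1"
  cku            = 2
}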
The Deployment Workflow
Once the modules are in place, Kafka provisioning becomes boring — in the best possible way.
terraform init
terraform plan -out=dev.tfplan
terraform apply dev.tfplan
No hidden steps. No UI clicks. The plan shows exactly what will change before anything happens.
In practice, these commands are wrapped in CI pipelines, ensuring that Kafka changes follow the same deployment standards as application code.
What This Solves in Real Teams
Adopting Kafka as Code immediately improves day-to-day operations:
- Consistency: dev, staging and prod are structurally identical
- Security: Permissions are explicit and reviewable
- Speed: New services get Kafka resources in minutes
- Confidence: Rollbacks are possible because changes are tracked
New engineers can understand the Kafka setup by reading Terraform.
What Confluent Docs Don’t Tell You
The Confluent Terraform Provider is powerful — but there are several realities you only discover after running Kafka in production.
ACLs are not optional. Creating topics without explicit ACLs works — until a new service appears and permissions are patched manually. If ACLs are not in Terraform, they will drift.
Defaults are dangerous. Broker defaults differ between cluster types and evolve over time. If retention or cleanup policies matter, declare them explicitly or expect surprises.
Terraform deletions are risky. A terraform destroy on a Kafka topic is irreversible. prevent_destroy is not paranoia — it’s operational hygiene.
Workspaces often don’t map to Kafka reality. Kafka environments are not interchangeable. Separate state files beat clever abstractions every time.
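One way to keep environments in genuinely separate state, sketched here assuming an S3 backend (any remote backend works the same way; bucket and key names are illustrative):

# environments/dev/backend.tf
terraform {
  backend "s3" {
    bucket = "acme-terraform-state"
    key    = "kafka/dev/terraform.tfstate"
    region = "eu-central-1"
  }
}

# environments/prod/backend.tf is identical except for the key:
#   key = "kafka/prod/terraform.tfstate"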
Lessons Learned
- Never commit API keys — inject them via CI, Terraform Cloud or Vault (see the sketch after this list)
- Always use remote state for Kafka infrastructure
- Keep modules opinionated and small
- Start simple, evolve only when real constraints appear
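On the first point: Terraform reads any environment variable prefixed with TF_VAR_, so credentials can be injected at run time instead of committed. A minimal sketch:

variable "confluent_cloud_api_key" {
  type      = string
  sensitive = true # redacted from plan output
}

variable "confluent_cloud_api_secret" {
  type      = string
  sensitive = true
}

provider "confluent" {
  cloud_api_key    = var.confluent_cloud_api_key
  cloud_api_secret = var.confluent_cloud_api_secret
}

In CI, export TF_VAR_confluent_cloud_api_key and TF_VAR_confluent_cloud_api_secret from the pipeline's secret store before running terraform plan.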
Final Takeaways
Kafka as Code is not about Terraform — it’s about control.
By modeling Kafka infrastructure declaratively, you replace manual effort with repeatable systems. Confluent Cloud handles the operational heavy lifting; Terraform ensures that your intent is clearly expressed and consistently applied.
Start with one environment. One cluster. One topic.
You can find all the code on GitHub.
Originally posted on marconak-matej.medium.com.