Delivery Guarantees¶

Turbine offers two delivery modes per subscription:

At-least-once — the default. Simple, zero overhead, may produce duplicates on a crash.
Exactly-once — opt-in. Outputs are never duplicated, even across crashes. Wraps each batch in a Kafka transaction.

This page covers the contract of each mode and how to choose. After a hard crash, local state can briefly lag the output under exactly-once — see Crash recovery. For bad messages (poison pills), handler exceptions, and compute errors, see Error handling. For pure reference (every parameter and its default), see Configuration.

At a glance¶

	At-least-once	Exactly-once
Output duplicates on crash	Possible (the in-flight batch may re-emit)	Never
Overhead per batch	None	~10–20 % on commit time, often within noise on end-to-end throughput
Requires `Turbine(app_id=...)`	No	Yes
Requires `output`	No (sinks allowed)	Yes (no output ⇒ nothing to commit transactionally)
Default	✅	Opt-in

At-least-once (default)¶

You get this without doing anything special:

from turbine import KafkaBroker, RecordBatch, Turbine

app = Turbine(brokers="localhost:9092")
kafka = KafkaBroker(bootstrap="localhost:9092")

@app.subscribe(kafka.topic("events"), output=kafka.topic("enriched"))
def enrich(batch: RecordBatch) -> RecordBatch:
    # `enriched` stands in for your own transformation of the batch.
    return enriched(batch)

app.run()

How it works: each batch is processed, the output messages are produced fire-and-forget, the state changes are persisted, and only then is the input offset committed. If the worker crashes mid-batch, the batch is replayed from the last committed offset on the next start — which means the same output messages may be emitted twice.
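The ordering above can be traced with a toy in-memory model. This is a sketch of the produce, persist, commit sequence only; `run_worker`, the batch size of two and the uppercase "transform" are illustrative assumptions, not Turbine internals.

```python
# Toy in-memory model of the at-least-once loop; not Turbine's actual code.
def run_worker(log, state, crash_before_commit=False):
    """Process one batch of up to two records from the committed offset."""
    start = state["committed_offset"]
    batch = log[start:start + 2]
    # 1. Produce the outputs (fire-and-forget).
    state["output"].extend(record.upper() for record in batch)
    # 2. State changes would be persisted here.
    if crash_before_commit:
        return  # simulated crash: the input offset is never committed
    # 3. Only now commit the input offset.
    state["committed_offset"] = start + len(batch)


log = ["a", "b", "c", "d"]
state = {"committed_offset": 0, "output": []}

run_worker(log, state, crash_before_commit=True)  # first attempt dies
run_worker(log, state)                            # restart replays the batch
run_worker(log, state)                            # next batch

print(state["output"])
```

The first batch appears twice in `state["output"]` because the crash happened after the produce step but before the offset commit.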

Use this when your downstream is idempotent. Upserts on a primary key, dedup keys carried on the message, append-only stores where a few duplicate rows are tolerable — all of these absorb at-least-once gracefully.
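As a minimal illustration of why an upsert absorbs duplicates, the sketch below replays one record into a store keyed by primary key. The `order_id` field and the dict-backed table are hypothetical stand-ins for a real downstream.

```python
# Hypothetical downstream store keyed by a primary key.
def upsert(table: dict, row: dict) -> None:
    """Insert the row, or overwrite the one already stored under its key."""
    table[row["order_id"]] = row


delivered = [
    {"order_id": 1, "total": 30},
    {"order_id": 2, "total": 12},
    {"order_id": 1, "total": 30},  # duplicate from a replayed batch
]

table = {}
for row in delivered:
    upsert(table, row)

append_only = list(delivered)  # a plain append keeps the duplicate row

print(len(table), len(append_only))
```

The keyed table ends up with two rows while the append-only list holds three, which is the trade-off to weigh before relying on at-least-once.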

Exactly-once¶

Enable per subscription:

from turbine import KafkaBroker, RecordBatch, Turbine

app = Turbine(
    brokers="kafka:9092",
    app_id="orders-enrichment-prod",   # ← required as soon as any subscribe is EOS
)
kafka = KafkaBroker(bootstrap="kafka:9092")

@app.subscribe(
    kafka.topic("input-events"),
    output=kafka.topic("enriched-events"),
    processing_guarantee="exactly_once",
)
def enrich(batch: RecordBatch) -> RecordBatch:
    return enriched(batch)

app.run()

When paired with a downstream consumer that uses isolation.level=read_committed:

Every output message is visible exactly once downstream.
Input offsets advance atomically with the output writes — no acknowledgement of a message whose output didn't commit.
A crash between batches doesn't produce duplicates: the next instance fences the previous transactional id at startup.
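On the consumer side, a configuration along these lines opts into `read_committed`. The property names are standard Kafka client settings in librdkafka style; the choice of client library, the group id and the helper function are assumptions made for illustration.

```python
def read_committed_consumer_config(bootstrap: str, group_id: str) -> dict:
    """Build a librdkafka-style consumer config that hides uncommitted output."""
    return {
        "bootstrap.servers": bootstrap,
        "group.id": group_id,
        # Aborted transactional writes are filtered out and writes from
        # still-open transactions are held back until they commit.
        "isolation.level": "read_committed",
        "auto.offset.reset": "earliest",
    }


config = read_committed_consumer_config("kafka:9092", "enriched-events-reader")

# With the confluent-kafka package installed, the dict could be used as:
#   consumer = confluent_kafka.Consumer(config)
#   consumer.subscribe(["enriched-events"])
print(config["isolation.level"])
```

Setting `isolation.level` explicitly avoids relying on client defaults, which differ between client libraries.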

The guarantee is "outputs never twice", not "state is always perfectly fresh". On a hard crash, your local state (windowed accumulators, per-key values) may briefly lag the output topic; see Crash recovery for the full shape and the knobs that tune it.

Why `app_id` is mandatory¶

Kafka's exactly-once semantics (EOS) rely on each producer using a stable identifier, the transactional id, that the broker tracks across restarts. When a new instance starts and re-registers under the same id, the broker fences the previous one: it aborts any transaction the old instance had open and refuses further writes from it. That fencing is what makes "no duplicates across restarts" actually hold.

Turbine derives one transactional id per (topic, partition) from app_id. If Turbine picked the value itself (e.g., a random UUID at process start), every restart would land on a fresh id and an old, still-running instance — split-brain network partition, kill -9 before cleanup, a previous pod Kubernetes hasn't terminated yet — would not be fenced. Both instances could commit to the same output topic in parallel, which is the exact duplicate-write scenario EOS exists to prevent.
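The fencing behaviour can be sketched with a toy broker that tracks an epoch per transactional id. The id format in `transactional_id` is a hypothetical example (the real derivation is internal to Turbine), and `ToyBroker` models only the epoch check, not a real Kafka broker.

```python
class FencedError(Exception):
    """Raised when a producer with a stale epoch tries to write."""


class ToyBroker:
    """Tracks the latest epoch per transactional id."""

    def __init__(self):
        self.epochs = {}
        self.log = []

    def register(self, txn_id):
        # Re-registering the same id bumps the epoch and fences older holders.
        self.epochs[txn_id] = self.epochs.get(txn_id, 0) + 1
        return self.epochs[txn_id]

    def write(self, txn_id, epoch, message):
        if epoch < self.epochs[txn_id]:
            raise FencedError(f"{txn_id}: epoch {epoch} has been fenced")
        self.log.append(message)


def transactional_id(app_id, topic, partition):
    # Hypothetical format, for illustration only.
    return f"{app_id}:{topic}:{partition}"


broker = ToyBroker()
txn_id = transactional_id("orders-enrichment-prod", "input-events", 0)

old_epoch = broker.register(txn_id)  # original instance
new_epoch = broker.register(txn_id)  # restarted instance, same stable id

broker.write(txn_id, new_epoch, "from-new")
try:
    broker.write(txn_id, old_epoch, "from-zombie")
    zombie_fenced = False
except FencedError:
    zombie_fenced = True

print(zombie_fenced, broker.log)
```

Had the restarted instance registered under a fresh random id, the old instance's epoch would never have been superseded and both writes would have landed in the log.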

app_id must be:

Stable across restarts of the same logical deployment — that's what fences the previous instance.
Unique per logical pipeline — two apps sharing an app_id will fence each other and neither will make progress.

"orders-enrichment-prod" or "alerting-staging" are good values — they name the deployment, not the host or pod.

Mixing modes in one app¶

A single Turbine app can host both at-least-once and exactly-once subscriptions. Setting processing_guarantee="exactly_once" on one subscription doesn't affect the others; they keep the cheaper non-transactional path. The only constraint is one-way: as soon as any subscription is exactly-once, Turbine(app_id=...) becomes mandatory.

app = Turbine(brokers="kafka:9092", app_id="my-pipeline-prod")
kafka = KafkaBroker(bootstrap="kafka:9092")

@app.subscribe(kafka.topic("billing"), output=kafka.topic("ledger"), processing_guarantee="exactly_once")
def post_ledger(batch): ...

@app.subscribe(kafka.topic("metrics"), output=kafka.topic("enriched-metrics"))  # at-least-once, no overhead
def annotate(batch): ...
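The two exactly-once preconditions (an `app_id` on the app, an `output` on the subscription) can be expressed as a small standalone check. This is a sketch of the rules only: the `Subscription` dataclass, the `validate` function and the `"at_least_once"` value name are hypothetical and do not reflect how Turbine implements or reports these checks.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Subscription:
    source: str
    output: Optional[str] = None
    # "at_least_once" is an assumed name for the default mode.
    processing_guarantee: str = "at_least_once"


def validate(app_id: Optional[str], subscriptions: List[Subscription]) -> List[str]:
    """Return one message per violated exactly-once precondition."""
    errors = []
    for sub in subscriptions:
        if sub.processing_guarantee != "exactly_once":
            continue  # at-least-once subscriptions carry no extra requirements
        if sub.output is None:
            errors.append(f"{sub.source}: exactly_once requires an output")
        if app_id is None:
            errors.append(f"{sub.source}: exactly_once requires app_id")
    return errors


subs = [
    Subscription("billing", output="ledger", processing_guarantee="exactly_once"),
    Subscription("metrics", output="enriched-metrics"),
]

print(validate("my-pipeline-prod", subs))
print(validate(None, subs))
```

Only the `billing` subscription triggers a requirement; `metrics` stays valid with or without an `app_id`.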

Idempotency contract¶

The exactly-once guarantee covers the Kafka output topic only. If your handler also does external side effects — HTTP calls, writes to a database other than Kafka, log shipping, metric pushes — those can fire more than once when the framework reprocesses or recovers, including during a replay.

Make those effects idempotent (a dedup key, an upsert, an If-None-Match header, etc.), or accept that they may run multiple times. This is the same contract every streaming framework offers: the framework can only make its own write paths transactional.
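One common way to build a dedup key is to derive it from the input record's coordinates, so a replayed record always produces the same key. The sketch below assumes the handler can see each record's topic, partition and offset, which this page does not confirm; `ToyNotifier` is a stand-in for an external service that honours such keys.

```python
import hashlib


def idempotency_key(topic: str, partition: int, offset: int) -> str:
    """Deterministic key: the same input record always maps to the same key."""
    raw = f"{topic}:{partition}:{offset}".encode()
    return hashlib.sha256(raw).hexdigest()


class ToyNotifier:
    """Stand-in for an external service that deduplicates on a key."""

    def __init__(self):
        self.seen = set()
        self.sent = []

    def send(self, key: str, payload: str) -> bool:
        if key in self.seen:
            return False  # duplicate request: no second side effect
        self.seen.add(key)
        self.sent.append(payload)
        return True


notifier = ToyNotifier()
key = idempotency_key("input-events", 3, 1042)

first = notifier.send(key, "order 17 shipped")
replayed = notifier.send(key, "order 17 shipped")  # same record after a replay

print(first, replayed, len(notifier.sent))
```

The replayed call is rejected by the key check, so the side effect runs once even though the handler ran twice.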

If your handler is pure (Arrow transforms + state ops + return a batch for Turbine to produce), you have nothing to do — every output path Turbine controls is already covered.

Cost¶

EOS adds one Kafka transaction round-trip per batch plus a durable state flush, so the per-batch commit phase gets noticeably more expensive — on the order of +10–20 % on commit time alone. On end-to-end throughput the impact is much smaller because the commit phase is only a fraction of the batch loop on realistic workloads; expect single-digit percent overhead at typical batch sizes (a few thousand records or more), and often within run-to-run noise.

Rule of thumb: at moderate-to-large batches, EOS is essentially free. Very small batches (a few hundred records) or near-zero handler latency may surface a higher overhead ratio because the fixed per-batch cost amortises over fewer messages — increase batch_size / batch_timeout_ms if you observe this.
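The arithmetic behind the rule of thumb is simple enough to write down. In the sketch below, the 15 % commit share and the 5 ms fixed cost are made-up inputs chosen for illustration, not measurements.

```python
def end_to_end_overhead(commit_fraction: float, commit_overhead: float) -> float:
    """Relative slowdown of the whole batch loop.

    commit_fraction: share of batch time spent in the commit phase (0..1).
    commit_overhead: relative increase of the commit phase under EOS (0..1).
    """
    return commit_fraction * commit_overhead


def fixed_cost_per_record_ms(fixed_cost_ms: float, batch_size: int) -> float:
    """Fixed per-batch cost spread over the records in the batch."""
    return fixed_cost_ms / batch_size


# Assumed: commit is 15 % of the loop and EOS makes it 20 % slower.
overall = end_to_end_overhead(0.15, 0.20)

# Assumed: 5 ms of fixed transactional cost per batch.
small_batch = fixed_cost_per_record_ms(5.0, 200)
large_batch = fixed_cost_per_record_ms(5.0, 5000)

print(f"{overall:.1%} end-to-end")
print(f"{small_batch:.4f} ms/record vs {large_batch:.4f} ms/record")
```

Under these assumed inputs a 20 % commit-phase increase becomes a 3 % end-to-end increase, and the per-record share of the fixed cost is 25 times larger for the 200-record batch than for the 5000-record one.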

The bench numbers behind these statements live in docs/internal/exactly_once.md; they are workload-specific and not a contract.

What gets committed atomically¶

Each batch is wrapped in a Kafka transaction that covers both the output messages and the input offsets. The transaction commits as one atomic unit, then the local state is durably persisted. On any error inside the batch, the transaction is aborted: no output is visible to read_committed, the input offset is not advanced, and the batch is retried from the last committed offset on the next run.
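The all-or-nothing behaviour can be modelled in a few lines. This toy buffers outputs and publishes them together with the new offset, or publishes neither; it illustrates the contract only and is not how Turbine or Kafka implement transactions.

```python
def process_batch(batch, start_offset, handler, committed_output, committed_offsets):
    """Run the handler over the batch; publish outputs and offset atomically."""
    pending = []
    try:
        for record in batch:
            pending.append(handler(record))
    except Exception:
        return False  # abort: nothing published, offset unchanged
    # Commit: outputs and the input offset become visible together.
    committed_output.extend(pending)
    committed_offsets["input"] = start_offset + len(batch)
    return True


calls = {"n": 0}


def flaky_upper(record):
    """Fails once, on the second call, to simulate a transient error."""
    calls["n"] += 1
    if calls["n"] == 2:
        raise RuntimeError("transient failure")
    return record.upper()


output, offsets = [], {"input": 0}
batch = ["a", "b", "c"]

first = process_batch(batch, 0, flaky_upper, output, offsets)
after_abort = (list(output), offsets["input"])
second = process_batch(batch, offsets["input"], flaky_upper, output, offsets)

print(first, after_abort, second, output, offsets)
```

After the aborted attempt no output is visible and the offset is still 0, so the retry starts from the same position and emits each output once.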

Limitations¶

Single Kafka cluster. EOS across multiple Kafka clusters (e.g. a join over two clusters) is out of scope. The same goes for non-Kafka sinks: the transaction is Kafka-only.
Sink subscribes can't use EOS. A subscribe with no output has no output to make transactional. The framework rejects this at decoration time.
State durability is RocksDB + object-store snapshots, not a Kafka changelog. This is why the state gap exists: a changelog backend folds state into the same transaction and removes the gap entirely. A Kafka-changelog state backend is on the roadmap as an optional alternative for workloads that need zero gap by construction; until it lands, on_crash_recovery="replay" is the user-facing way to get the same outcome at the cost of replay time.

Choosing between the two¶

Workload	Recommended mode
Idempotent downstream (upserts, dedup keys, append-with-PK)	At-least-once — simpler, free.
Analytics aggregation read by `read_committed` consumers	Exactly-once. Duplicates would skew counts.
Stateful enrichment writing to a non-idempotent sink	Exactly-once.
Mission-critical state (financial, security) where staleness is unacceptable	Exactly-once with `on_crash_recovery="replay"`.
Sink (no `output`)	At-least-once (only option).

The internal design document — producer construction details, recovery flow, the changelog-backend plan — lives at docs/internal/exactly_once.md.