Skip to content

Kafka & Partitioning

This page covers the Kafka partitioning constraints Turbine imposes for stateful workloads, and the role of partition_key for sub-partition parallelism.

Topic Partitioning

For any stateful subscription (uses state.put, windows, or partition_key), the input Kafka topic's partition count must be a divisor of 960. The 28 valid counts are:

Range Valid partition counts
Small (dev, single-broker) 1, 2, 3, 4, 5, 6, 8, 10, 12
Medium (3-6 broker clusters) 15, 16, 20, 24, 30, 32
Large (high-resource deployments) 40, 48, 60, 64, 80, 96, 120
Very large (1 partition per Turbine worker thread) 160, 192, 240, 320, 480, 960

Common Kafka choices that are not valid: 7, 9, 11, 13, 17, 100, 128, 256, 512. The boot banner warns when a subscribed topic uses an invalid count and suggests the closest valid alternatives; turbine-cli rescale rejects invalid --new-partitions values with the same suggestion.

Why this constraint exists. Turbine shards state into a fixed number of internal groups (960) so that the topic can be repartitioned without moving state around. Each Kafka partition owns a contiguous range of those groups, and rows produced with Kafka's standard partitioner land on the worker that already holds their state — but only when the partition count divides the group count evenly. Picking a non-divisor count works at steady state but cannot be rescaled cleanly later.

Stateless subscriptions (no state parameter, no windows, no partition_key) are unaffected — any partition count works.

partition_key & Sub-Partition Parallelism

TODO

Dedicated explainer needed:

  • What partition_key is and why you'd use it (sharding inside a single Kafka partition for higher per-partition throughput on free-threaded Python).
  • The parallelism=N knob and how records are dispatched across shards.
  • The windowing constraint (every window must be keyed by the same column as the subscription) — already enforced at construction time.
  • Trade-offs vs. simply increasing the Kafka topic's partition count (sub-partition parallelism is intra-process; multi-partition is cluster-wide).
  • Migration path between the two when scaling up.