Kafka & Partitioning¶
This page covers the Kafka partitioning constraints Turbine imposes for stateful workloads, and the role of partition_key for sub-partition parallelism.
Topic Partitioning¶
For any stateful subscription (uses state.put, windows, or partition_key), the input Kafka topic's partition count must be a divisor of 960. The 28 valid counts are:
| Range | Valid partition counts |
|---|---|
| Small (dev, single-broker) | 1, 2, 3, 4, 5, 6, 8, 10, 12 |
| Medium (3-6 broker clusters) | 15, 16, 20, 24, 30, 32 |
| Large (high-resource deployments) | 40, 48, 60, 64, 80, 96, 120 |
| Very large (1 partition per Turbine worker thread) | 160, 192, 240, 320, 480, 960 |
Common Kafka choices that are not valid: 7, 9, 11, 13, 17, 100, 128, 256, 512. The boot banner warns when a subscribed topic uses an invalid count and suggests the closest valid alternatives; turbine-cli rescale rejects invalid --new-partitions values with the same suggestion.
Why this constraint exists. Turbine shards state into a fixed number of internal groups (960) so that the topic can be repartitioned without moving state around. Each Kafka partition owns a contiguous range of those groups, and rows produced with Kafka's standard partitioner land on the worker that already holds their state — but only when the partition count divides the group count evenly. Picking a non-divisor count works at steady state but cannot be rescaled cleanly later.
Stateless subscriptions (no state parameter, no windows, no partition_key) are unaffected — any partition count works.
partition_key & Sub-Partition Parallelism¶
TODO
Dedicated explainer needed:
- What
partition_keyis and why you'd use it (sharding inside a single Kafka partition for higher per-partition throughput on free-threaded Python). - The
parallelism=Nknob and how records are dispatched across shards. - The windowing constraint (every window must be keyed by the same column as the subscription) — already enforced at construction time.
- Trade-offs vs. simply increasing the Kafka topic's partition count (sub-partition parallelism is intra-process; multi-partition is cluster-wide).
- Migration path between the two when scaling up.