What If We Could Rebuild Kafka From Scratch?
Over the last few days I spent some time digging into the recently announced KIP-1150 ("Diskless Kafka") as well as AutoMQ’s Kafka fork, which tightly integrates Apache Kafka with object storage such as S3. Following the example set by WarpStream, these projects aim to substantially improve the experience of using Kafka in cloud environments, providing better elasticity, drastically reducing cost, and paving the way towards native lakehouse integration.
This got me thinking: if we were to start all over and develop a durable, cloud-native event log from scratch (Kafka.next, if you will), which traits and characteristics would be desirable for it to have? Separation of storage and compute as well as object store support would be table stakes, but what else should be there? Having used Kafka for many years for building event-driven applications as well as for running real-time ETL and change data capture pipelines, here’s my personal wishlist:
- Do away with partitions: topic partitions were crucial for scaling purposes when data was stored on node-local disks, but they are not required when storing data on effectively infinitely large object storage in the cloud. While partitions also provide ordering guarantees, this never struck me as overly useful from a client perspective. You either want to have global ordering of all messages on a given topic, or (more commonly) ordering of all messages with the same key. In contrast, defined ordering of otherwise unrelated messages whose key happens to yield the same partition after hashing isn’t that valuable, so there’s not much point in exposing partitions as a concept to users.
- Key-centric access: instead of partition-based access, efficient access and replay of all the messages with one and the same key would be desirable. Rather than coarse-grained scanning of all the records on a given topic or partition, let’s have millions of entity-level streams! Not only would this provide access to exactly the subset of data you need, it would also let you increase and decrease the number of consumers dynamically based on demand, without hitting the limits of a pre-defined partition count. Key-level streams (with guaranteed ordering) would be a perfect foundation for Event Sourcing architectures as well as actor-based and agentic systems. In addition, this approach largely solves the problem of head-of-line blocking found in partition-based systems with cumulative acknowledgements: if a consumer can’t process a particular message, this will only block other messages with the same key (which oftentimes is exactly what you’d want), while all other messages are not affected. Rather than coarse-grained partitions, individual message keys become the failure domain. (A rough sketch of what a per-key consumer API could look like follows after this list.)
- Topic hierarchies: available in systems like Solace, topic hierarchies promote parts of the message payload into structured, path-like topic identifiers, allowing clients to subscribe to arbitrary subsets of all the available streams based on patterns in an efficient way, without requiring brokers to deserialize and parse entire messages. (A small matching example follows after this list.)
- Means of concurrency control: as it stands, using Kafka as a system of record can be problematic, as you can’t prevent writing messages which are based on an outdated view of the stored data. Concurrency control, for instance via optimistic locking of message keys, would help to detect and fence off concurrent conflicting writes. That way, when a message gets acknowledged successfully, it is guaranteed to have been produced based on the latest state of that key, avoiding lost updates. (See the sketch after this list.)
- Broker-side schema support: Kafka treats messages as opaque byte arrays with arbitrary content, requiring out-of-band propagation of message schemas to consumers. This can be especially problematic when erroneous (or malicious) producers send non-conformant data. Also, without additional tooling, the current architecture prevents Kafka data from being written to open table formats such as Apache Iceberg. For all these reasons, Kafka is used with a schema registry most of the time, but making schema support a first-class concept would allow for better user ergonomics (for instance, Kafka could expose AsyncAPI-compatible metadata out of the box) and also open the door for storing data in different ways, such as in a columnar representation.
- Extensibility and pluggability: a common trait of many successful open-source projects like Postgres or Kubernetes is their extensibility. Users and integrators can customize the behavior of the system by providing implementations of well-defined extension points and plug-in contracts, rather than by modifying the system’s core itself (following the open-closed principle). This would enable, for instance, custom broker-side message filters and transformations (addressing many scenarios currently requiring a protocol-aware proxy such as Kroxylicious), storage formats (e.g. columnar), and more. It should be possible to implement functionality such as rate limiting, topic encryption, or backing a topic via an Iceberg table solely via extensions to the system. (One possible shape of such an extension point is sketched after this list.)
- Synchronous commit callbacks: end-to-end Kafka pipelines are eventually consistent. When producing a record to a topic and then using that record for materializing some derived data view in some downstream data store, there’s no way for the producer to know when it will be able to "see" that downstream update. For certain use cases it would be helpful to be able to guarantee that derived data views have been updated by the time a produce request gets acknowledged, allowing Kafka to act as a log for a true database with strong read-your-own-writes semantics.
- Snapshotting: currently, Kafka supports topic compaction, which will only retain the last record for a given key. This works well if records contain the full state of the entity they represent (a customer, purchase order, etc.). It doesn’t work, though, for partial or delta events, which describe changes to an entity and which need to be applied one after another to fully restore the state of the entity. Assuming there was support for efficient key-based message replay (see above), such a replay would take longer and longer as the number of records for a key increases. Built-in snapshot support could allow for "logical compaction": passing all events for a key to some event handler which condenses them into a snapshot. This would then serve as the foundation for subsequent update events, while all previous records for that key could be removed during compaction. (A sketch of such a snapshot handler follows after this list.)
- Multi-tenancy: any modern data system should be built with multi-tenancy in mind from the ground up. Spinning up a new customer-specific environment should be a very cheap operation, happening instantaneously; the workloads of individual tenants should be strictly isolated, not interfering with each other with regard to access control and security, resource utilization, metering, etc.
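To make a few of these items a bit more concrete, here are some rough sketches in Java. None of this is an existing API; all class, interface, and method names below are made up purely for illustration. First, key-centric access: a hypothetical per-key consumer API, with a trivial in-memory stand-in for the broker, showing what replaying the stream of a single entity could look like.

```java
// Hypothetical per-key consumer API; the names (KeyStreams, KeyStream, Event) and
// the in-memory stand-in for the broker are illustrative only and do not
// correspond to any existing Kafka client API.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class KeyStreamExample {

    // One record within a key-level stream, ordered by a per-key sequence number.
    record Event(String key, long sequence, String payload) {}

    // Handle on the stream of all events sharing a single key.
    interface KeyStream {
        List<Event> readFrom(long fromSequence);
        void ack(long sequence); // acknowledging one key never blocks other keys
    }

    // In-memory stand-in for a broker serving millions of entity-level streams.
    static class KeyStreams {
        private final Map<String, List<Event>> eventsByKey = new HashMap<>();

        void append(String key, String payload) {
            List<Event> events = eventsByKey.computeIfAbsent(key, k -> new ArrayList<>());
            events.add(new Event(key, events.size(), payload));
        }

        KeyStream stream(String key) {
            List<Event> events = eventsByKey.getOrDefault(key, List.of());
            return new KeyStream() {
                public List<Event> readFrom(long fromSequence) {
                    return events.subList((int) fromSequence, events.size());
                }
                public void ack(long sequence) { /* per-key consumer position, elided */ }
            };
        }
    }

    public static void main(String[] args) {
        KeyStreams streams = new KeyStreams();
        streams.append("order-4711", "created");
        streams.append("order-4712", "created");
        streams.append("order-4711", "paid");
        // Replay the full history of one entity only, in guaranteed per-key order.
        streams.stream("order-4711").readFrom(0).forEach(System.out::println);
    }
}
```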
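Next, topic hierarchies: a tiny matcher illustrating hierarchical, pattern-based subscriptions, assuming a Solace-like wildcard syntax in which * matches exactly one level and > matches all remaining levels. Both the syntax and the semantics shown here are assumptions for the sake of the example, not a specification.

```java
// Minimal matcher for hierarchical topics, assuming a Solace-like wildcard syntax:
// '*' matches exactly one level, '>' matches all remaining levels.
public class TopicHierarchyExample {

    static boolean matches(String pattern, String topic) {
        String[] patternLevels = pattern.split("/");
        String[] topicLevels = topic.split("/");
        for (int i = 0; i < patternLevels.length; i++) {
            if (patternLevels[i].equals(">")) {
                return true;                  // everything below this level matches
            }
            if (i >= topicLevels.length) {
                return false;                 // pattern is longer than the topic
            }
            if (!patternLevels[i].equals("*") && !patternLevels[i].equals(topicLevels[i])) {
                return false;                 // literal level must match exactly
            }
        }
        return patternLevels.length == topicLevels.length;
    }

    public static void main(String[] args) {
        // Payload attributes (region, event type) promoted into the topic path.
        System.out.println(matches("orders/*/created", "orders/emea/created"));  // true
        System.out.println(matches("orders/>", "orders/apac/cancelled"));        // true
        System.out.println(matches("orders/*/created", "orders/emea/updated"));  // false
    }
}
```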
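For the concurrency control item, a sketch of optimistic locking on message keys: the producer states which per-key sequence its write is based on, and the append is rejected if the key has moved on in the meantime. Broker-side state is again simulated with a plain in-memory map.

```java
// Sketch of optimistic locking on message keys: a write carries the per-key
// sequence number it was based on and is rejected if the key has moved on.
// The class and method names are hypothetical; a map stands in for broker state.
import java.util.HashMap;
import java.util.Map;

public class OptimisticAppendExample {

    static class ConflictException extends RuntimeException {
        ConflictException(String message) { super(message); }
    }

    // Stand-in for broker-side state: the latest sequence number per key.
    private final Map<String, Long> latestSequencePerKey = new HashMap<>();

    // The append succeeds only if the caller has seen the current head of the key's stream.
    synchronized long append(String key, long expectedSequence, byte[] payload) {
        long current = latestSequencePerKey.getOrDefault(key, -1L);
        if (current != expectedSequence) {
            throw new ConflictException(
                "key " + key + " is at sequence " + current + ", but the write expected " + expectedSequence);
        }
        long next = current + 1;
        latestSequencePerKey.put(key, next);
        return next; // a successful acknowledgement implies the write saw the latest state of the key
    }

    public static void main(String[] args) {
        OptimisticAppendExample log = new OptimisticAppendExample();
        System.out.println(log.append("customer-42", -1, new byte[0])); // 0: first write for the key
        System.out.println(log.append("customer-42", 0, new byte[0]));  // 1: based on the latest state
        log.append("customer-42", 0, new byte[0]);                      // throws: based on a stale read
    }
}
```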
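For extensibility and pluggability, one possible shape of a broker-side extension point for filtering (or transforming) records on the produce path; the interceptor contract shown here is an assumption, loosely inspired by the kinds of use cases a protocol-aware proxy covers today.

```java
// One possible extension-point contract for broker-side record filtering and
// transformation on the produce path; the interface and its semantics are
// assumptions made for illustration, not part of any existing broker API.
import java.util.Optional;

public class ExtensionPointExample {

    record Message(String topic, String key, byte[] value) {}

    // A plug-in sees each record on its way into a topic and may drop or rewrite it.
    interface ProduceInterceptor {
        Optional<Message> onProduce(Message message);
    }

    // Example extension: drop records above a size limit, as a simple guard rail.
    static class MaxSizeFilter implements ProduceInterceptor {
        private final int maxBytes;

        MaxSizeFilter(int maxBytes) { this.maxBytes = maxBytes; }

        @Override
        public Optional<Message> onProduce(Message message) {
            return message.value().length <= maxBytes ? Optional.of(message) : Optional.empty();
        }
    }

    public static void main(String[] args) {
        ProduceInterceptor filter = new MaxSizeFilter(1024);
        System.out.println(filter.onProduce(new Message("payments", "p-1", new byte[128])).isPresent());  // true
        System.out.println(filter.onProduce(new Message("payments", "p-2", new byte[4096])).isPresent()); // false
    }
}
```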
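And finally, snapshotting: a sketch of a user-provided "logical compaction" hook which condenses the delta events of a key into a snapshot. The payload encoding (an 8-byte amount per event) is an arbitrary assumption for the sake of the example.

```java
// Sketch of a "logical compaction" hook: a user-provided handler condenses all
// delta events of a key into a snapshot, which then becomes the new baseline.
// The names and the payload encoding (an 8-byte amount) are illustrative assumptions.
import java.nio.ByteBuffer;
import java.util.List;

public class SnapshottingExample {

    record Event(String key, long sequence, byte[] payload) {}
    record Snapshot(String key, long upToSequence, byte[] state) {}

    // Extension point the broker would invoke when compacting a key's stream.
    interface SnapshotHandler {
        Snapshot condense(String key, Snapshot previous, List<Event> eventsSincePrevious);
    }

    // Example: an account balance derived from a stream of credit/debit delta events.
    static class BalanceSnapshotHandler implements SnapshotHandler {
        @Override
        public Snapshot condense(String key, Snapshot previous, List<Event> events) {
            long balance = previous == null ? 0L : ByteBuffer.wrap(previous.state()).getLong();
            long upTo = previous == null ? -1L : previous.upToSequence();
            for (Event event : events) {
                balance += ByteBuffer.wrap(event.payload()).getLong();
                upTo = event.sequence();
            }
            return new Snapshot(key, upTo, amount(balance));
        }
    }

    static byte[] amount(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    public static void main(String[] args) {
        List<Event> deltas = List.of(
            new Event("account-7", 0, amount(100)),
            new Event("account-7", 1, amount(-30)));
        Snapshot snapshot = new BalanceSnapshotHandler().condense("account-7", null, deltas);
        // The snapshot replaces the two delta events as the baseline for the key.
        System.out.println(ByteBuffer.wrap(snapshot.state()).getLong()); // 70
    }
}
```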
Some of these features are already supported in other systems: for instance, high-cardinality streams in S2, optimistic locking in Waltz, or multi-tenancy in Apache Pulsar. But others are not, and I am not aware of a single system, let alone an open-source one, which combines all of these traits.
Now, this describes my personal wishlist (which is to say, this post should in no way be understood as speaking for my employer, Confluent, in any official capacity) for what a Kafka.next could be and the semantics it could provide, driven by the use cases and applications I’ve seen people wanting to employ Kafka for. But I am sure everyone who has worked with Kafka or comparable platforms for some time will have their own thoughts around this, and I’d love to learn about yours in the comments!
Finally, an important question of course is how such a system would actually be architected. While I’ll have to leave the answer to that for another time, it’s safe to say that building that system on top of a log-structured merge (LSM) tree would be a likely choice.
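Just to hint at why an LSM tree is such a natural fit for the key-centric access described above: if records are stored under a composite (key, sequence) identifier in a sorted structure, replaying a single key boils down to a range scan. In the toy example below, a plain sorted map stands in for the sorted key space an LSM tree maintains; this is merely an illustration of the idea, not a storage engine design.

```java
// Minimal illustration (not an actual storage engine): a sorted map stands in
// for the sorted key space an LSM tree maintains. Records are stored under a
// composite (key, sequence) identifier, so replaying one key is a range scan.
import java.nio.charset.StandardCharsets;
import java.util.NavigableMap;
import java.util.TreeMap;

public class SortedKeySpaceExample {

    private final NavigableMap<String, byte[]> store = new TreeMap<>();

    void append(String key, long sequence, byte[] payload) {
        // Zero-padded sequence keeps lexicographic order aligned with numeric order.
        store.put(key + "#" + String.format("%019d", sequence), payload);
    }

    // Replay all records of a single key, in sequence order, via one range scan.
    NavigableMap<String, byte[]> replay(String key) {
        return store.subMap(key + "#", true, key + "#\uffff", false);
    }

    public static void main(String[] args) {
        SortedKeySpaceExample log = new SortedKeySpaceExample();
        log.append("order-4711", 0, "created".getBytes(StandardCharsets.UTF_8));
        log.append("order-4712", 0, "created".getBytes(StandardCharsets.UTF_8));
        log.append("order-4711", 1, "paid".getBytes(StandardCharsets.UTF_8));
        log.replay("order-4711").forEach((k, v) ->
            System.out.println(k + " -> " + new String(v, StandardCharsets.UTF_8)));
    }
}
```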