May 14, 2025
"Streaming vs. Batch" Is a Wrong Dichotomy, and I Think It's Confusing
Oftentimes, "Streaming vs. Batch" is discussed as if it’s one or the other, but to me this framing doesn’t make much sense.
Read More...
Apr 24, 2025
What If We Could Rebuild Kafka From Scratch?
Over the last few days, I spent some time digging into the recently announced KIP-1150 ("Diskless Kafka"), as well as AutoMQ’s Kafka fork, which tightly integrates Apache Kafka with object storage such as S3. Following the example set by WarpStream, these projects aim to substantially improve the experience of using Kafka in cloud environments, providing better elasticity, drastically reducing cost, and paving the way towards native lakehouse integration.
This got me thinking: if we were to start all over and develop a durable, cloud-native event log from scratch (Kafka.next, if you will), which traits and characteristics would be desirable for it to have? Separation of storage and compute and support for object storage would be table stakes, but what else should be there? Having used Kafka for many years, both for building event-driven applications and for running real-time ETL and change data capture pipelines, here’s my personal wishlist:
Read More...
Apr 16, 2025
A Deep Dive Into Ingesting Debezium Events From Kafka With Flink SQL
Over the years, I’ve spoken quite a bit about the use cases for processing Debezium data change events with Apache Flink, such as metadata enrichment, building denormalized data views, and creating data contracts for your CDC streams. One detail I haven’t covered in depth so far is how to actually ingest Debezium change events from a Kafka topic into Flink, in particular via Flink SQL. Several connectors and data formats exist for this, which can make things somewhat confusing at first. So let’s dive into the different options and the considerations around them!
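To give a first idea of what this looks like, here’s a minimal sketch of one of those options, using the regular Kafka connector together with the debezium-json format via Flink’s Table API; the topic name, schema, and connection settings are made up for illustration:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class DebeziumIngestionExample {

    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Expose a Kafka topic containing Debezium change event envelopes
        // as a changelog table, using the 'debezium-json' format
        tEnv.executeSql("""
            CREATE TABLE customers (
              id BIGINT,
              first_name STRING,
              last_name STRING,
              email STRING
            ) WITH (
              'connector' = 'kafka',
              'topic' = 'dbserver1.inventory.customers',
              'properties.bootstrap.servers' = 'localhost:9092',
              'properties.group.id' = 'flink-customers',
              'scan.startup.mode' = 'earliest-offset',
              'format' = 'debezium-json'
            )""");

        // Downstream queries see inserts, updates, and deletes from the
        // source database reflected on this changelog table
        tEnv.executeSql("SELECT * FROM customers").print();
    }
}
```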
Read More...
Apr 7, 2025
Building a Native Binary for Apache Kafka on macOS
When I wrote about ahead-of-time class loading and linking in Java 24 recently, I also published the start-up time for Apache Kafka as a native binary for comparison. This was done via Docker, as there’s no pre-built native binary of Kafka available for the operating system I’m running on, macOS. But there is a native Kafka container image, so this is what I chose for the sake of convenience.
Now, running in a container adds a little bit of overhead, of course, so it wasn’t a surprise when Thomas Würthinger, lead of the GraalVM project at Oracle, brought up the question of what the value would be when running Kafka natively on macOS. Needless to say, I couldn’t let a nice nerd snipe like this pass, so I set out to learn how to build a native Kafka binary on macOS using GraalVM.
Read More...
Mar 27, 2025
Let's Take a Look at... JEP 483: Ahead-of-Time Class Loading & Linking!
Java 24 was released last week, and what a meaty release it is: more than twenty JDK Enhancement Proposals (JEPs) have been shipped, including highlights such as compact object headers (JEP 450; I hope to spend some time diving into that one soon), a new class-file API (JEP 484), and more flexible constructor bodies (JEP 492, third preview). One other JEP which might fly a bit under the radar is JEP 483 ("Ahead-of-Time Class Loading & Linking"). It promises to reduce the start-up time of Java applications without requiring any modifications to the application itself; what’s not to like about that? Let’s take a closer look!
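The gist of the JEP is a three-step workflow: a training run records which classes the application loads and links, that configuration is assembled into an AOT cache, and subsequent runs start from the cache. Roughly like this, with a placeholder application jar and main class:

```
# Training run: record the classes loaded and linked by the application
java -XX:AOTMode=record -XX:AOTConfiguration=app.aotconf -cp app.jar com.example.App

# Create the AOT cache from the recorded configuration
java -XX:AOTMode=create -XX:AOTConfiguration=app.aotconf -XX:AOTCache=app.aot -cp app.jar

# Production run: start up using the cache
java -XX:AOTCache=app.aot -cp app.jar com.example.App
```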
Read More...
Mar 18, 2025
The Synchrony Budget
When building a system of distributed services, one concept I find very valuable to keep in mind is what I call the synchrony budget: as much as possible, a service should minimize the number of synchronous requests it makes to other services.
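To make that a bit more concrete, here is a purely hypothetical sketch (the pricing service, event shape, and class names are all made up): instead of calling another service synchronously for every order, a service could maintain a local view of the data it needs, updated asynchronously from change events, and spend none of its synchrony budget on the hot path:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical example: an order service that needs product prices
public class PricingView {

    private final Map<String, Double> prices = new ConcurrentHashMap<>();

    // Spending synchrony budget: every order placement blocks on a remote call
    double fetchPriceSynchronously(String productId) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(
                URI.create("http://pricing-service/prices/" + productId)).build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return Double.parseDouble(response.body());
    }

    // Saving synchrony budget: price change events (e.g. consumed from a
    // Kafka topic) update a local view asynchronously, ahead of time...
    void onPriceChanged(String productId, double newPrice) {
        prices.put(productId, newPrice);
    }

    // ...so order placement only reads local state, with no remote call
    double priceFor(String productId) {
        return prices.getOrDefault(productId, 0.0);
    }
}
```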
Read More...
Mar 5, 2025
Let's Take a Look at... KIP-932: Queues for Kafka!
That guy above? Yep, that’s me, whenever someone says "Kafka queue". Because that’s not what Apache Kafka is. At its core, Kafka is a distributed, durable event log. Producers write events to a topic, which is organized into partitions distributed amongst the brokers of a Kafka cluster. Consumers, organized in groups, divide the partitions they process amongst themselves, so that each partition of a topic is read by exactly one consumer in the group.
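In code, that classic consumer group model looks roughly like the following sketch with the standard consumer API (topic and group names are placeholders):

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class OrderConsumer {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // All consumers sharing this group id divide the topic's partitions
        // amongst themselves; each partition is read by exactly one of them
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "order-processors");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d, offset=%d, value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
            }
        }
    }
}
```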
Read More...
Jan 28, 2025
Get Running with Apache Flink on Kubernetes, part 2 of 2
This post originally appeared on the Decodable blog. All rights reserved.
Welcome back to this two-part blog post series about running Apache Flink on Kubernetes, using the Flink Kubernetes operator. In part one, we discussed installation and setup of the operator, different deployment types, how to deploy Flink jobs using custom Kubernetes resources, and how to create container images for your own Flink jobs. In this part, we’ll focus on aspects such as fault tolerance and high availability of your Flink jobs running on Kubernetes, savepoint management, observability, and more. You can find the complete source code for all the examples shown in this series in the Decodable examples repository on GitHub.
Read More...
Jan 21, 2025
Get Running with Apache Flink on Kubernetes, part 1 of 2
This post originally appeared on the Decodable blog. All rights reserved.
Kubernetes is a widely used deployment platform for Apache Flink. While Flink has had native support for Kubernetes for quite a while, it is the operator pattern in particular which makes deploying Flink jobs onto Kubernetes clusters a compelling option: you define jobs in a declarative resource, and a control loop running in a component called a Kubernetes operator takes care of provisioning and maintaining (e.g. scaling, updating) all the required resources. Automation is the keyword here, significantly reducing the manual effort required for running Flink jobs in production.
Read More...
Dec 3, 2024
Failover Replication Slots with Postgres 17
This post originally appeared on the Decodable blog. All rights reserved.
Postgres read replicas are commonly used not only to distribute query load amongst multiple nodes, but also to ensure high availability (HA) of the database. If the primary node of a Postgres cluster fails, a read replica can be promoted to be the new primary, processing write (and read) requests from then on.
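The new piece in Postgres 17 which this post is about is the failover option for logical replication slots, allowing a slot to be synchronized to a standby so that it survives such a promotion. As a rough sketch (connection details and slot name are made up, and the standby additionally needs slot synchronization enabled via sync_replication_slots), such a slot can be created like this from Java:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class FailoverSlotSetup {

    public static void main(String[] args) throws Exception {
        // Connect to the primary and create a logical replication slot with
        // the failover option introduced in Postgres 17, so that the slot is
        // synchronized to standbys and remains usable after a promotion
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/inventory", "postgres", "secret");
             Statement stmt = conn.createStatement()) {

            // pg_create_logical_replication_slot(slot_name, plugin, temporary, twophase, failover)
            stmt.execute(
                "SELECT pg_create_logical_replication_slot('cdc_slot', 'pgoutput', false, false, true)");
        }
    }
}
```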
Read More...