May 14, 2025
"Streaming vs. Batch" Is a Wrong Dichotomy, and I Think It's Confusing
Oftentimes, "Streaming vs. Batch" is discussed as if it’s one or the other, but to me this framing doesn’t make much sense.
Read More...
Apr 24, 2025
What If We Could Rebuild Kafka From Scratch?
Over the last few days, I spent some time digging into the recently announced KIP-1150 ("Diskless Kafka"), as well as AutoMQ’s Kafka fork, which tightly integrates Apache Kafka with object storage such as S3. Following the example set by WarpStream, these projects aim to substantially improve the experience of using Kafka in cloud environments, providing better elasticity, drastically reducing cost, and paving the way towards native lakehouse integration.
This got me thinking: if we were to start all over and develop a durable, cloud-native event log from scratch (Kafka.next, if you will), which traits and characteristics would be desirable for it to have? Separation of storage and compute and support for object storage would be table stakes, but what else should be there? Having used Kafka for many years, both for building event-driven applications and for running real-time ETL and change data capture pipelines, here’s my personal wishlist:
Read More...
Apr 16, 2025
A Deep Dive Into Ingesting Debezium Events From Kafka With Flink SQL
Over the years, I’ve spoken quite a bit about the use cases for processing Debezium data change events with Apache Flink, such as metadata enrichment, building denormalized data views, and creating data contracts for your CDC streams. One detail I haven’t covered in depth so far is how to actually ingest Debezium change events from a Kafka topic into Flink, in particular via Flink SQL. Several connectors and data formats exist for this, which can make things somewhat confusing at first. So let’s dive into the different options and the considerations around them!
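To give a first idea of what this looks like, here’s a minimal sketch of one of those options, using the regular Kafka connector together with the debezium-json format via Flink’s Table API; the topic name, schema, and connection settings are made up for illustration:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class DebeziumIngestionExample {

    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Expose a Kafka topic containing Debezium change event envelopes
        // as a changelog table, using the 'debezium-json' format
        tEnv.executeSql("""
            CREATE TABLE customers (
              id BIGINT,
              first_name STRING,
              last_name STRING,
              email STRING
            ) WITH (
              'connector' = 'kafka',
              'topic' = 'dbserver1.inventory.customers',
              'properties.bootstrap.servers' = 'localhost:9092',
              'properties.group.id' = 'flink-customers',
              'scan.startup.mode' = 'earliest-offset',
              'format' = 'debezium-json'
            )""");

        // Downstream queries see inserts, updates, and deletes from the
        // source database reflected on this changelog table
        tEnv.executeSql("SELECT * FROM customers").print();
    }
}
```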
Read More...
Apr 7, 2025
Building a Native Binary for Apache Kafka on macOS
When I wrote about ahead-of-time class loading and linking in Java 24 recently, I also published the start-up time for Apache Kafka as a native binary for comparison. This was done via Docker, as there’s no pre-built native binary of Kafka available for the operating system I’m running on, macOS. But there is a native Kafka container image, so this is what I chose for the sake of convenience.
Now, running in a container adds a little bit of overhead, of course, so it wasn’t a surprise when Thomas Würthinger, lead of the GraalVM project at Oracle, brought up the question of what the value would be when running Kafka natively on macOS. Needless to say, I couldn’t let a nice nerd snipe like this pass, so I set out to learn how to build a native Kafka binary on macOS using GraalVM.
Read More...
Mar 27, 2025
Let's Take a Look at... JEP 483: Ahead-of-Time Class Loading & Linking!
Java 24 was released last week, and what a meaty release it is: more than twenty JDK Enhancement Proposals (JEPs) have been shipped, including highlights such as compact object headers (JEP 450; I hope to spend some time diving into that one soon), a new class-file API (JEP 484), and more flexible constructor bodies (JEP 492, third preview). One other JEP which might fly a bit under the radar is JEP 483 ("Ahead-of-Time Class Loading & Linking"). It promises to reduce the start-up time of Java applications without requiring any modifications to the application itself; what’s not to like about that? Let’s take a closer look!
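The gist of the JEP is a three-step workflow: a training run records which classes the application loads and links, that configuration is assembled into an AOT cache, and subsequent runs start from the cache. Roughly like this, with a placeholder application jar and main class:

```
# Training run: record the classes loaded and linked by the application
java -XX:AOTMode=record -XX:AOTConfiguration=app.aotconf -cp app.jar com.example.App

# Create the AOT cache from the recorded configuration
java -XX:AOTMode=create -XX:AOTConfiguration=app.aotconf -XX:AOTCache=app.aot -cp app.jar

# Production run: start up using the cache
java -XX:AOTCache=app.aot -cp app.jar com.example.App
```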
Read More...
Mar 18, 2025
The Synchrony Budget
When building a system of distributed services, one concept I find very valuable to keep in mind is what I call the synchrony budget: as much as possible, a service should minimize the number of synchronous requests it makes to other services.
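To make that a bit more concrete, here is a purely hypothetical sketch (the pricing service, event shape, and class names are all made up): instead of calling another service synchronously for every order, a service could maintain a local view of the data it needs, updated asynchronously from change events, and spend none of its synchrony budget on the hot path:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical example: an order service that needs product prices
public class PricingView {

    private final Map<String, Double> prices = new ConcurrentHashMap<>();

    // Spending synchrony budget: every order placement blocks on a remote call
    double fetchPriceSynchronously(String productId) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(
                URI.create("http://pricing-service/prices/" + productId)).build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return Double.parseDouble(response.body());
    }

    // Saving synchrony budget: price change events (e.g. consumed from a
    // Kafka topic) update a local view asynchronously, ahead of time...
    void onPriceChanged(String productId, double newPrice) {
        prices.put(productId, newPrice);
    }

    // ...so order placement only reads local state, with no remote call
    double priceFor(String productId) {
        return prices.getOrDefault(productId, 0.0);
    }
}
```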
Read More...
Mar 5, 2025
Let's Take a Look at... KIP-932: Queues for Kafka!
That guy above? Yep, that’s me, whenever someone says "Kafka queue". Because that’s not what Apache Kafka is. At its core, Kafka is a distributed, durable event log. Producers write events to a topic, which is organized into partitions distributed amongst the brokers of a Kafka cluster. Consumers, organized in groups, divide the partitions they process amongst themselves, so that each partition of a topic is read by exactly one consumer in the group.
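In code, that classic consumer group model looks roughly like the following sketch with the standard consumer API (topic and group names are placeholders):

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class OrderConsumer {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // All consumers sharing this group id divide the topic's partitions
        // amongst themselves; each partition is read by exactly one of them
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "order-processors");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d, offset=%d, value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
            }
        }
    }
}
```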
Read More...
Jan 28, 2025
Get Running with Apache Flink on Kubernetes, part 2 of 2
This post originally appeared on the Decodable blog. All rights reserved.
Welcome back to this two-part blog post series about running Apache Flink on Kubernetes, using the Flink Kubernetes operator. In part one, we discussed installation and setup of the operator, different deployment types, how to deploy Flink jobs using custom Kubernetes resources, and how to create container images for your own Flink jobs. In this part, we’ll focus on aspects such as fault tolerance and high availability of your Flink jobs running on Kubernetes, savepoint management, observability, and more. You can find the complete source code for all the examples shown in this series in the Decodable examples repository on GitHub.
Read More...
Jan 21, 2025
Get Running with Apache Flink on Kubernetes, part 1 of 2
This post originally appeared on the Decodable blog. All rights reserved.
Kubernetes is a widely used deployment platform for Apache Flink. While Flink has had native support for Kubernetes for quite a while, it is the operator pattern in particular which makes deploying Flink jobs onto Kubernetes clusters a compelling option: you define jobs in a declarative resource, and a control loop running in a component called a Kubernetes operator takes care of provisioning and maintaining (e.g. scaling, updating) all the required resources. Automation is the keyword here, significantly reducing the manual effort required for running Flink jobs in production.
Read More...
Dec 3, 2024
Failover Replication Slots with Postgres 17
This post originally appeared on the Decodable blog. All rights reserved.
Postgres read replicas are commonly used not only to distribute query load amongst multiple nodes, but also to ensure high availability (HA) of the database. If the primary node of a Postgres cluster fails, a read replica can be promoted to be the new primary, processing write (and read) requests from then on.
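The new piece in Postgres 17 which this post is about is the failover option for logical replication slots, allowing a slot to be synchronized to a standby so that it survives such a promotion. As a rough sketch (connection details and slot name are made up, and the standby additionally needs slot synchronization enabled via sync_replication_slots), such a slot can be created like this from Java:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class FailoverSlotSetup {

    public static void main(String[] args) throws Exception {
        // Connect to the primary and create a logical replication slot with
        // the failover option introduced in Postgres 17, so that the slot is
        // synchronized to standbys and remains usable after a promotion
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/inventory", "postgres", "secret");
             Statement stmt = conn.createStatement()) {

            // pg_create_logical_replication_slot(slot_name, plugin, temporary, twophase, failover)
            stmt.execute(
                "SELECT pg_create_logical_replication_slot('cdc_slot', 'pgoutput', false, false, true)");
        }
    }
}
```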
Read More...