Gunnar Morling

Gunnar Morling

Random Musings on All Things Software Engineering

Gunnar Morling

Gunnar Morling

Random Musings on All Things Software Engineering

Recent posts

Oct 6, 2024

How I Am Setting Up VMs On Hetzner Cloud

Whenever I’ve need a Linux box for some testing or experimentation, or projects like the One Billion Row Challenge a few months back, my go-to solution is Hetzner Online, a data center operator here in Europe. Their prices for VMs are unbeatable, starting with 3,92 €/month for two shared vCPUs (either x64 or AArch64), four GB of RAM, and 20 TB of network traffic (these are prices for their German data centers, they vary between regions). four dedicated cores with 16 GB, e.g. for running a small web server, will cost you 28.55 €/month. Getting a box with similar specs on AWS would set you back a multiple of that, with the (outbound) network cost being the largest chunk. So it’s not a big surprise that more and more people realize the advantages of this offering, most notably Ruby on Rails creator David Heinemeier Hansson, who has been singing the praise for Hetzner’s dedicated servers, but also their VM instances, quite a bit on Twitter lately.

Read More...

Aug 26, 2024

Leader Election With S3 Conditional Writes

Update Aug 30: This article is discussed on Hacker News and lobste.rs. In distributed systems, for instance when scaling out some workload to multiple compute nodes, it is a common requirement to select a leader for performing a given task: only one of the nodes should process the records from a Kafka topic partition, write to a file system, call a remote API, etc. Otherwise, multiple workers may end up doing the same task twice, overwriting each other’s data, and worse.

Read More...

Jul 6, 2024

Shell Spell: Extracting and Propagating Multiple Values With jq

In my day job at Decodable, I am currently working with Terraform to provision some cloud infrastructure for an upcoming hands-on lab. Part of this set-up is a Postgres database on Amazon RDS, which I am creating using the Terraform AWS modules. Now, once my database was up and running, I wanted to extract two dynamically generated values from Terraform: the random password created for the root user, and the database host URL. On my way down the rabbit hole for finding a CLI command for doing this efficiently, I learned a few interesting shell details which I’d like to share.

Read More...

Mar 18, 2024

A Zipping Gatherer

The other day, I was looking for means of zipping two Java streams: connecting them element by element—​essentially a join based on stream offset position—​and emitting an output stream with the results. Unfortunately, there is no zip() method offered by the Java Streams API itself. While it was considered for inclusion in early preview versions, the method was removed before the API went GA with Java 8 and you have to resort to 3rd party libraries such as Google Guava if you need this functionality.

Read More...

Feb 20, 2024

Last Updated Columns With Postgres

In many applications it’s a requirement to keep track of when a record was created and updated the last time. Often, this is implemented by having columns such as created_at and updated_at within each table. To make things as simple as possible for application developers, the database itself should take care of maintaining these values automatically when a record gets inserted or updated.

Read More...

Feb 10, 2024

Filtering Process Output With tee

Recently I ran into a situation where it was necessary to capture the output of a Java process on the stdout stream, and at the same time a filtered subset of the output in a log file. The former, so that the output gets picked up by the Kubernetes logging infrastructure. The letter for further processing on our end: we were looking to detect when the JVM stops due to an OutOfMemoryError, passing on that information to some error classifier.

Read More...

Feb 4, 2024

1BRC—The Results Are In!

Oh what a wild ride the last few weeks have been. The One Billion Row Challenge (1BRC for short), something I had expected to be interesting to a dozen folks or so at best, has gone kinda viral, with hundreds of people competing and engaging. In Java, as intended, but also beyond: folks implemented the challenge in languages such as Go, Rust, C/C++, C#, Fortran, or Erlang, as well databases (Postgres, Oracle, Snowflake, etc.), and tools like awk. It’s really incredible how far people have pushed the limits here. Pull request by pull request, the execution times for solving the problem layed out in the challenge — aggregating random temperature values from a file with 1,000,000,000 rows — improved by two orders of magnitudes in comparison to the initial baseline implementation. Today I am happy to share the final results, as the challenge closed for new entries after exactly one month on Jan 31 and all submissions have been reviewed.

Read More...

Jan 1, 2024

The One Billion Row Challenge

Update Jan 4: Wow, this thing really took off! 1BRC is discussed at a couple of places on the internet, including Hacker News, lobste.rs, and Reddit. For folks to show-case non-Java solutions, there is a "Show & Tell" now, check that one out for 1BRC implementations in Rust, Go, C++, and others. Some interesting related write-ups include 1BRC in SQL with DuckDB by Robin Moffatt and 1 billion rows challenge in PostgreSQL and ClickHouse by Francesco Tisiot. Thanks a lot for all the submissions, this is going way beyond what I’d have expected! I am behind a bit with evalutions due to the sheer amount of entries, I will work through them bit by bit. I have also made a few clarifications to the rules of the challenge; please make sure to read them before submitting any entries. Let’s kick off 2024 true coder style—​I’m excited to announce the One Billion Row Challenge (1BRC), running from Jan 1 until Jan 31. Your mission, should you decide to accept it, is deceptively simple: write a Java program for retrieving temperature measurement values from a text file and calculating the min, mean, and max temperature per weather station. There’s just one caveat: the file has 1,000,000,000 rows!

Read More...

Dec 17, 2023

Tracking Java Native Memory With JDK Flight Recorder

Update Dec 18: This post is discussed on Hacker News 🍊 As regular readers of this blog will now, JDK Flight Recorder (JFR) is one of my favorite tools of the Java platform. This low-overhead event recording engine built into the JVM is invaluable for observing the runtime characteristics of Java applications and identifying any potential performance issues. JFR continues to become better and better with every new release, with one recent addition being support for native memory tracking (NMT).

Read More...

Nov 14, 2023

Can Debezium Lose Events?

This question came up on the Data Engineering sub-reddit the other day: Can Debezium lose any events? I.e. can there be a situation where a record in a database get inserted, updated, or deleted, but Debezium fails to capture that event from the transaction log and propagate it to downstream consumers?

Read More...