Gunnar Morling

Jul 22, 2026

A Fast Path for Fixed-Length Lists in Parquet

In its current form Apache Parquet isn’t a great fit for storing fixed-length lists, such as coordinates, RGB(A) colors, or—an increasingly common case—vector embeddings driving search and retrieval workloads. A 768-dimensional embedding is just a list of floats that always has the same length, yet Parquet’s Dremel machinery encodes it as if that length could vary from row to row, spelling out and reconstructing each vector’s structure on read. That costs roughly 3× more than a purely flat columnar representation of the same data (see apache/arrow#34510).
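To make the overhead concrete, here is a minimal sketch in plain Java. It is not Hardwood or parquet-java code; the class and method names are made up for illustration. It assumes a list column whose rows all hold `dim` non-null floats, and it models only repetition levels (0 for the first element of a row, 1 for each following element), leaving definition levels and the actual level encoding aside. The point is that for fixed-length data the levels carry no information the stride does not already give you.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FixedLengthListLevels {

    /** Repetition levels for `rows` lists of exactly `dim` elements each. */
    public static int[] repetitionLevels(int rows, int dim) {
        int[] levels = new int[rows * dim];
        for (int i = 0; i < levels.length; i++) {
            // 0 starts a new row, 1 continues the current list
            levels[i] = (i % dim == 0) ? 0 : 1;
        }
        return levels;
    }

    /** Dremel-style assembly: row boundaries are discovered by scanning the levels. */
    public static List<float[]> assemble(float[] values, int[] repLevels) {
        List<float[]> rows = new ArrayList<>();
        int start = 0;
        for (int i = 1; i <= values.length; i++) {
            if (i == values.length || repLevels[i] == 0) {
                rows.add(Arrays.copyOfRange(values, start, i));
                start = i;
            }
        }
        return rows;
    }

    /** Flat layout: with a known length, a row is just a slice at a fixed stride. */
    public static float[] flatRow(float[] values, int dim, int row) {
        return Arrays.copyOfRange(values, row * dim, (row + 1) * dim);
    }

    public static void main(String[] args) {
        float[] values = {1f, 2f, 3f, 4f, 5f, 6f};   // two 3-dimensional vectors
        int[] levels = repetitionLevels(2, 3);
        System.out.println(Arrays.toString(levels));
        System.out.println(Arrays.toString(assemble(values, levels).get(1)));
        System.out.println(Arrays.toString(flatRow(values, 3, 1)));
    }
}
```

Both access paths return the same second vector; the flat one gets there without reading a single level.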

Jun 25, 2026

Hardwood 1.0: A Fast, Lightweight Apache Parquet Reader for the JVM

Hardwood is a new Parquet library for the JVM, written from scratch to do one thing well: read (and soon, write) Apache Parquet files fast, with no mandatory dependencies. It is performance-focused and multi-threaded at its core, fanning page decoding out across all your CPU cores by default. Today, Hardwood reaches 1.0. After five preview releases since the start of the year (Alpha1, Beta1, Beta2, CR1, CR2), we now consider Hardwood ready for production, and its public API will evolve with a strong focus on backwards compatibility going forward. Hardwood targets Java 21 or newer, is open-source (Apache License 2.0), and is available from Maven Central.

May 31, 2026

Improved Column Reader API, First Cut of Geospatial Support: Hardwood 1.0.0.CR1 Is Available

I am happy to announce the release of Hardwood 1.0.0.CR1! This first candidate release of Hardwood 1.0 brings a substantially improved API for columnar access to Apache Parquet files, initial support for Parquet’s GEOMETRY/GEOGRAPHY column types, and many other improvements to the core library as well as the Hardwood CLI.

Apr 29, 2026

VARIANT Support, Interactive Parquet File TUI: Hardwood 1.0.0.Beta2 Is Out

I am happy to announce the release of Hardwood 1.0.0.Beta2! The latest version of this new parser for Apache Parquet comes with support for VARIANT columns, an interactive text-based UI (TUI) for examining and analysing the structure of Parquet files, significantly improved performance, more efficient reading of files from object storage, and much more.

Apr 2, 2026

Hardwood Reaches Beta: S3, Predicate Push-Down, CLI, and More

I am pleased to announce the release of Hardwood 1.0.0.Beta1! Hardwood is a new parser for Apache Parquet, optimized for minimal dependencies and great performance. Since the project’s initial release just a few weeks back, a small yet very active community has come together and evolved Hardwood significantly. Today, we are shipping an S3 backend for parsing files directly from object storage, predicate push-down for both local and remote files, Avro bindings, a CLI for inspecting Parquet files, and much more. We’re also excited to launch a website for the project, hardwood.dev, which contains the documentation and API reference. Let’s dig in.

Feb 26, 2026

Hardwood: A New Parser for Apache Parquet

Today, it’s my great pleasure to announce the first public release of Hardwood, a new parser for the Apache Parquet file format, optimized for minimal dependencies and great performance. Hardwood is open-source (Apache License 2.0) and supports Java 21 or newer. You can grab it from Maven Central and start parsing your Parquet files with ease and efficiency.

Dec 7, 2025

You Gotta Push If You Wanna Pull

Historically, data management systems have been built around the notion of pull queries: users query data which, for instance, is stored in tables in an RDBMS, Parquet files in a data lake, or a full-text index in Elasticsearch. When a user issues a query, the engine will produce the result set at that point in time by churning through the data set and finding all matching records (oftentimes sped up by utilizing indexes).

Nov 25, 2025

On Idempotency Keys

In distributed systems, there’s a common understanding that it is not possible to guarantee exactly-once delivery of messages. What is possible though is exactly-once processing. By adding a unique idempotency key to each message, you can enable consumers to recognize and ignore duplicate messages, i.e. messages which they have received and successfully processed before.
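As a rough illustration of that idea, the following sketch shows a consumer that remembers the keys of messages it has processed and skips redeliveries. The `Message` record and the `IdempotentConsumer` class are hypothetical, and the in-memory set is only a stand-in: a real consumer would have to persist the keys, ideally in the same transaction as the side effects of processing.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class IdempotentConsumer {

    /** Hypothetical message shape: a unique idempotency key plus a payload. */
    public record Message(String idempotencyKey, String payload) {}

    // Keys of messages already processed (in-memory stand-in for durable storage)
    private final Set<String> processedKeys = new HashSet<>();

    // The observable effect of processing, kept in a list for the example
    private final List<String> applied = new ArrayList<>();

    /** Returns true if the message was processed, false if it was a duplicate. */
    public boolean handle(Message message) {
        if (processedKeys.contains(message.idempotencyKey())) {
            return false; // seen before: ignore the redelivery
        }
        applied.add(message.payload());
        // Record the key only after processing succeeded
        processedKeys.add(message.idempotencyKey());
        return true;
    }

    public List<String> applied() {
        return List.copyOf(applied);
    }

    public static void main(String[] args) {
        IdempotentConsumer consumer = new IdempotentConsumer();
        Message m = new Message("order-42", "ship order 42");
        System.out.println(consumer.handle(m)); // first delivery
        System.out.println(consumer.handle(m)); // redelivery of the same message
        System.out.println(consumer.applied());
    }
}
```

Delivery is still at-least-once here; it is the key check that makes the processing effectively exactly-once.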

Nov 20, 2025

Building a Durable Execution Engine With SQLite

Lately, there has been a lot of excitement around Durable Execution (DE) engines. The basic idea of DE is to take (potentially long-running) multi-step workflows, such as processing a purchase order or a user sign-up, and make their individual steps persistent. If a flow gets interrupted while running, for instance due to a machine failure, the DE engine can resume it from the last successfully executed step and drive it to completion.
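The core mechanism can be sketched in a few lines. This is an illustrative toy, not the engine from the post: the `DurableFlowSketch` class and its `step` method are invented names, and a `HashMap` stands in for the persistent step log that a real engine would keep in a database such as SQLite. Each step's result is recorded once it completes, so a re-run replays recorded results instead of executing those steps again.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class DurableFlowSketch {

    // In-memory stand-in for a persistent log of completed steps
    private final Map<String, String> stepLog = new HashMap<>();

    /** Runs the step unless a result for it was recorded by an earlier attempt. */
    public String step(String flowId, String stepName, Supplier<String> action) {
        String key = flowId + "/" + stepName;
        String recorded = stepLog.get(key);
        if (recorded != null) {
            return recorded; // replay: skip the side effect
        }
        String result = action.get();
        stepLog.put(key, result); // "persist" only after the step succeeded
        return result;
    }

    public static void main(String[] args) {
        DurableFlowSketch engine = new DurableFlowSketch();
        AtomicInteger reserveCalls = new AtomicInteger();
        AtomicBoolean crashOnce = new AtomicBoolean(true);

        Runnable flow = () -> {
            String reservation = engine.step("order-1", "reserve", () -> {
                reserveCalls.incrementAndGet();
                return "reserved";
            });
            engine.step("order-1", "charge", () -> {
                if (crashOnce.getAndSet(false)) {
                    throw new IllegalStateException("simulated crash");
                }
                return "charged after " + reservation;
            });
        };

        try {
            flow.run(); // first attempt fails in the second step
        } catch (IllegalStateException e) {
            System.out.println("interrupted: " + e.getMessage());
        }
        flow.run(); // resume: "reserve" is replayed from the log
        System.out.println("reserve executed " + reserveCalls.get() + " time(s)");
    }
}
```

The second run completes the flow while the first step's action runs only once, which is the property a durable execution engine provides across process restarts.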

Nov 3, 2025

"You Don't Need Kafka, Just Use Postgres" Considered Harmful

Looking to make it to the front page of HackerNews? Then writing a post arguing that "Postgres is enough", or why "you don’t need Kafka at your scale" is a pretty failsafe way of achieving exactly that. No matter how often it has been discussed before, this topic is always doing well. And sure, what’s not to love about that? I mean, it has it all: Postgres, everybody’s most favorite RDBMS—check! Keeping things lean and easy—sure, count me in! A somewhat spicy take—bring it on!

Random Musings on All Things Software Engineering
