Filtering Process Output With tee

Posted at Feb 10, 2024

Recently I ran into a situation where it was necessary to capture the output of a Java process on the stdout stream, and at the same time a filtered subset of the output in a log file. The former, so that the output gets picked up by the Kubernetes logging infrastructure. The letter for further processing on our end: we were looking to detect when the JVM stops due to an OutOfMemoryError, passing on that information to some error classifier.

Simply redirecting the standard output stream of the process to a file wouldn’t satisfy the first requirement. Instead, the tee command offers a solution: it reads from stdin and writes everything to stdout as well as a file:

1
2
$ java -jar my-app.jar -XX:+ExitOnOutOfMemoryError \
  | tee my-app.log (1)

1	Pipe `stdout` to `tee`, which writes it to both `stdout` and a log file

This kinda gives us what we want, but we lack control over the size of that log file. As is, it can grow indefinitely, eventually causing the application’s pod to run out of disk space. For the case at hand, we’re just interested in specific lines anyways. So ideally the content written to the log file would be filtered accordingly, while exposing the complete output to the Kubernetes log collector via stdout.

To accommodate that requirement, process substitution can be used. In a nutshell, it provides a bridge between the standard input and output streams and files:

>(commands) substitutes a file a process writes to with another process which receives the written content on stdin
<(commands) substitutes a file a process reads from with another process which provides the content on stdout

Note that there must be no space between >/< and the left parenthesis. I.e. this is no redirection. The former variant is exactly what we need: instead of directly writing all the process output to the log file, we use grep to filter any written content, based on the string we’re looking for:

1
2
$ java -jar my-app.jar -XX:+ExitOnOutOfMemoryError \
  | tee >(grep 'OutOfMemoryError' > my-app.log) (1)

1	Represent the `stdin` of `grep` as a file for `tee` to write to

That way, the complete stdout of our process gets exposed to Kubernetes' logging infrastructure, while only the filtered output get written to our log file:

1
2
$ cat my-app.log
Terminating due to java.lang.OutOfMemoryError: Java heap space

To get a better intuition of what process substitution does under the hood, let’s create a simple Java program which reads from a file specified as a program argument:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;

public void main(String... args) throws Exception {
  var fileName = args[0];
  System.out.println("File: " + fileName); (1)

  String content = Files.readString( (2)
      Paths.get(fileName), Charset.defaultCharset()
  );
  System.out.println("Content: " + content);
}

1	Print the passed file name
2	Print the content of the file

Here’s the program’s output when using process substitution for exposing the stdout of echo:

1
2
3
$ java --enable-preview --source 21 read_file.java <(echo "hello")
File: /dev/fd/11
Content: hello

/dev/fd is a special directory which contains a file descriptor for each file opened by a process. So what is /dev/fd/11 then? Most implementations of process substitution represent stdin/stdout via anonymous pipes. If we take a look at the list of files opened by the process, we can see that this is the case here too:

1
2
3
4
$ lsof -p 99657
COMMAND   PID   USER   FD   TYPE             DEVICE  SIZE/OFF                NODE NAME
...
java    99657 gunnar   11   PIPE 0xc2e0b19eaf172929     16384

FD 11 is a pipe created through process substitution, and the standard Java file I/O APIs can be used to read its contents via that descriptor.

Gunnar Morling

Random Musings on All Things Software Engineering

Gunnar Morling

Random Musings on All Things Software Engineering

Filtering Process Output With tee