Process Substitution: My Bash Trick to Cut Disk I/O and Boost Performance

TL;DR

Process substitution lets Bash hand a command a pathname (typically /dev/fd/N) that is really a pipe to another command, so you can feed data into tools that expect filenames without writing a temp file to disk.
Use diff <(sort file1) <(sort file2) to compare sorted files with no temp file.
Preview sed changes with sed "s/foo/bar/" <(cat file) or sed "s/foo/bar/" <(cat file) | less.
Log errors while still seeing them: cmd 2> >(tee errors.log >&2).
Stream archives over SSH: ssh host "tar xf -" < <(tar cf - dir), with no local tarball.

Why this matters

I spent years wrestling with mktemp, cp, and rm just to keep my scripts tidy. Every extra file I wrote to disk meant another flash write, more wear on SSDs, and longer run times. When I started using Bash’s process substitution, I realized I could eliminate a whole class of temporary files and, in my own scripts, cut disk I/O by almost half.

The biggest pain points for shell programmers are:

Manually creating temporary files with mktemp and cleaning them up later.
Unnecessary disk writes that slow scripts down.
Difficulty comparing sorted files or previewing edits without first writing an intermediate file.
Risk of accidentally overwriting originals when redirecting output.
Simultaneously monitoring a program’s stdout and stderr.
Managing parallel uploads or extractions across multiple servers.
Editing a file in place when the only tool that works on it is sed or awk.

Process substitution addresses most of these. It lets you treat the output of a command as a file you can read from, or the input of a command as a file you can write to, while the data stays in a kernel pipe instead of on disk. It’s a hidden gem that, once you start using it, you’ll wonder how you ever scripted without it.

Core concepts

At its core, process substitution is a Bash syntax that looks like this:

<(command)   # input substitution: expands to a pathname you read command's stdout from
>(command)   # output substitution: expands to a pathname whose writes go to command's stdin

When Bash sees <(…), it runs command in a subshell, connects its stdout to a pipe, and then replaces the whole <(…) expression with a pathname that refers to the read end of that pipe. On Linux, that path is usually something like /dev/fd/63: an entry for an open file descriptor that the receiving command can open like any other file. (On systems without /dev/fd, Bash falls back to a named pipe.) The same logic applies to >(…), but the pipe goes the other way: the command that receives the pathname writes to it, while the subshell reads the data on its stdin.

Because the pathname refers to a pipe rather than a regular file, the data itself is never written to disk. It passes through the kernel’s pipe buffer (64 KiB by default on Linux) while producer and consumer run concurrently. That’s why process substitution is usually faster than writing a temporary file and reading it back, especially for large streams.

The trick is that you can use these paths wherever a command expects a filename. For example:

diff <(sort file1) <(sort file2)

Here sort file1 and sort file2 each run in their own subshells. Their outputs go to pipes that Bash exposes as /dev/fd paths. diff reads from those paths as if they were regular files, producing the same result as if you had written each sorted stream to disk first. Process substitution also works in reverse:

cat /etc/passwd > >(tee /tmp/log.txt)

cat writes /etc/passwd to its stdout, which the > redirection sends into the pipe that tee reads. tee writes the stream to /tmp/log.txt and forwards it to the shell’s stdout, so you can see the file content while a copy is being archived.
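To see what the shell actually hands to a command, a minimal sketch like the one below can help. The helper names are made up for illustration, and the exact path printed depends on your system (a /dev/fd entry on Linux, possibly a named pipe elsewhere).

```bash
#!/usr/bin/env bash
# Sketch: inspect the pathname that Bash substitutes for <(...).
# show_substituted_path and is_pipe are hypothetical helpers, not Bash built-ins.

# echo receives the substituted pathname as an ordinary argument and prints it.
show_substituted_path() {
  echo <(true)
}

# Succeeds if the given pathname refers to a pipe (FIFO) rather than a regular file.
is_pipe() {
  [ -p "$1" ]
}

show_substituted_path   # typically something like /dev/fd/63; the number may differ

if is_pipe <(true); then
  echo "the substituted path is a pipe"
fi
```

The point of the check is that the command never sees a regular file, only a name for one end of a pipe.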

Why it matters for random-access tools

Tools that read their input sequentially, like awk, sed, sort, or diff, don’t care whether it comes from a pipe or a file. Tools that need to seek, or that read the same file more than once, are a different story: a pipe is not seekable and its data can be consumed only once. Such tools may fail, warn, or quietly see empty input on the second pass. For those cases, fall back to a real temporary file.
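The read-once behavior is easy to check for yourself. The sketch below uses a hypothetical helper, read_twice, that reads its argument two times; it assumes a system where <(…) expands to a /dev/fd path or a named pipe.

```bash
#!/usr/bin/env bash
# Sketch: a pipe can be consumed only once, unlike a regular file.
# read_twice is a hypothetical helper for illustration.

read_twice() {
  local first second
  first=$(cat "$1")    # first pass drains the input
  second=$(cat "$1")   # second pass: a regular file is reread, a drained pipe is empty
  printf 'first: %s\nsecond: %s\n' "${first:-<empty>}" "${second:-<empty>}"
}

# Regular file: both passes see the data.
tmp=$(mktemp)
echo hello > "$tmp"
read_twice "$tmp"
rm -f "$tmp"

# Process substitution: only the first pass gets the data.
read_twice <(echo hello)
```

Any tool that behaves like read_twice internally is a candidate for a real temp file instead.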

How to apply it

Below are my go-to recipes that have saved me hours of debugging and I/O.

1. Compare sorted files without temp files

diff <(sort src.txt) <(sort dst.txt)

No mktemp, no rm. You instantly see differences after sorting, and the diff output is identical to what you’d get with temporary files.
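The same pattern works with any tool that insists on sorted files. As a sketch, here is a small wrapper around comm; the function name added_lines and the file names in the usage note are placeholders.

```bash
#!/usr/bin/env bash
# Sketch: list lines that appear in the new file but not in the old one.
# added_lines is a hypothetical helper. comm requires sorted input,
# which the two process substitutions provide on the fly.
added_lines() {
  local old=$1 new=$2
  # -1 hides lines unique to the first input, -3 hides lines common to both.
  comm -13 <(sort "$old") <(sort "$new")
}
```

You would call it as added_lines yesterday.txt today.txt; neither input file is modified and nothing sorted is written to disk.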

2. Preview sed changes

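Instead of editing in place with sed -i and hoping for the best, run the substitution on a copy first:

sed "s/old/new/" <(cat config.yml) > /tmp/preview.yml && less /tmp/preview.yml

cat feeds the original file into a process substitution that sed reads from, so config.yml itself is never touched. (Here <(cat config.yml) behaves the same as passing config.yml directly; the form is more useful when the input comes from another command.) The result lands in /tmp/preview.yml, which you can examine with less. If satisfied, apply the change to the original:

sed -i "s/old/new/" config.yml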
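If you would rather see only what changes, one possible variation is to diff the original against the sed output. This is a sketch, not the recipe above; preview_sed is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: show a unified diff of what a sed expression would change,
# without writing a preview file. preview_sed is a hypothetical helper.
preview_sed() {
  local expr=$1 file=$2
  # diff's exit status is passed through: 0 means no change, 1 means changes.
  diff -u "$file" <(sed "$expr" "$file")
}
```

Calling preview_sed 's/old/new/' config.yml prints the lines that would be removed and added, and leaves config.yml untouched.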

3. Log errors while watching stdout

When a long command prints lots of data and you also want to keep a record of its errors:

make 2> >(tee /tmp/make_errors.log >&2)

make writes its stderr to the process substitution; tee copies that stream to /tmp/make_errors.log and, thanks to >&2, forwards it back to the terminal on stderr, so stdout and stderr stay separate.
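Wrapped in a function, the same idea can be reused for any command. The sketch below assumes the log path is passed as the first argument; run_with_error_log is a made-up name.

```bash
#!/usr/bin/env bash
# Sketch: run a command, copy its stderr to a log file, and still pass
# stderr through to the caller. run_with_error_log is a hypothetical helper.
run_with_error_log() {
  local log=$1
  shift
  # tee writes each stderr line to the log and, via >&2, back to stderr.
  "$@" 2> >(tee "$log" >&2)
}

# Example call (commented out so the sketch has no side effects):
# run_with_error_log /tmp/make_errors.log make
```

Note that tee runs asynchronously, so the last lines may reach the log a moment after the command itself has returned.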

4. Stream archives over SSH to multiple servers

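No need to write a tarball locally first. For a single host, a plain pipe is enough:

tar cf - ./project | ssh host1 "tar xf -"

Chaining a second ssh onto the end of that pipeline does not work, because the remote tar xf - prints nothing for the next stage to read. You can run two independent pipelines instead, at the cost of archiving ./project twice:

tar cf - ./project | ssh host1 "tar xf -" &
tar cf - ./project | ssh host2 "tar xf -"

Even better, create the archive once and let tee copy it into a process substitution, so the same tarball feeds two SSH sessions concurrently:

tar cf - ./project | tee >(ssh host1 "tar xf -") | ssh host2 "tar xf -"

tee writes one copy of the stream into the pipe read by the host1 session and another to its stdout, which feeds host2. The network traffic is parallelized without extra disk I/O.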
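The fan-out can be tried without any servers. In this sketch, local directories stand in for remote hosts, and fan_out_tree is a hypothetical helper; to target real machines you would swap each local tar xf for an ssh command such as ssh host1 "tar xf -" (host names being placeholders).

```bash
#!/usr/bin/env bash
# Sketch: send one tar stream to several extractors at once.
# Local directories stand in for remote hosts so it runs anywhere.
# fan_out_tree is a hypothetical helper.
fan_out_tree() {
  local src=$1 dest1=$2 dest2=$3
  tar cf - -C "$src" . |
    tee >(tar xf - -C "$dest1") >(tar xf - -C "$dest2") |
    cat > /dev/null   # cat sees EOF only after tee and both extractors have exited
}
```

The trailing cat is a design choice: both extractors inherit the pipe to cat as their stdout, so the pipeline does not return until they are done. Without it, Bash would not wait for the process substitutions.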

5. Tee a file while writing it

You sometimes want a command’s output saved to a file while you still see it:

tee >(cat > output.log) < <(cat src.txt)

The >(…) creates a pipe that tee writes to, as if it were an output file; the < <(…) redirects tee’s stdin from the output of cat src.txt. tee outputs to both stdout and the pipe.
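The >(…) target does not have to be a plain file writer. As a sketch, the hypothetical filter below passes its input through unchanged and reports the line count on stderr.

```bash
#!/usr/bin/env bash
# Sketch: pass a stream through unchanged while a side process counts its lines.
# with_line_count is a hypothetical helper; the count is reported on stderr
# so it does not mix into the data on stdout.
with_line_count() {
  tee >(wc -l >&2)
}

# Example call (commented out; the count is printed asynchronously):
# printf 'a\nb\nc\n' | with_line_count
```

Any command that reads stdin can take the place of wc -l here, for example a compressor or a checksum tool.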

6. Edit a temporary file in Vim

If you need to tweak a large text block but don’t want to touch the original, use:

vim <(cat original.txt)

Vim loads the contents of the pipe into a buffer named after the /dev/fd path. That path cannot be written back in any useful way, so after editing, save the buffer under a new name from inside Vim:

:w edited.txt

You’re editing the contents of original.txt without risking an accidental overwrite.

7. Parallel execution with minimal overhead

When you have two independent commands that can run at the same time, put each in its own process substitution:

cat <(cmd1) <(cmd2)

Bash starts both cmd1 and cmd2 right away, so no explicit & is needed. cat reads cmd1’s output to the end and then cmd2’s, so the outputs are concatenated in order; cmd2 may pause once its pipe buffer fills until cat gets to it.
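A rough way to confirm the concurrency is to time two deliberately slow producers. In this sketch, slow_producer is a hypothetical stand-in for cmd1 and cmd2, and the timing relies on Bash’s SECONDS variable, which only counts whole seconds.

```bash
#!/usr/bin/env bash
# Sketch: two slow producers run at the same time.
# slow_producer is a hypothetical stand-in for cmd1 and cmd2.
slow_producer() {
  sleep 2
  echo "$1 done"
}

SECONDS=0   # Bash's built-in timer, counts whole seconds
output=$(cat <(slow_producer first) <(slow_producer second))
elapsed=$SECONDS

echo "$output"
echo "elapsed: about ${elapsed}s"   # roughly 2s when run concurrently, not 4s
```

If the producers ran one after the other, the total would be at least four seconds.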

Pitfalls & edge cases

Process substitution isn’t a silver bullet. Here are a few gotchas.

Shell support: It works in Bash, zsh, and ksh, but it is not part of POSIX sh and dash does not implement it. On a minimal BusyBox shell you will likely get a syntax error. If portability is a must, use mktemp instead.
Quoting: Bash only recognizes <(…) and >(…) when they are unquoted. Inside single or double quotes they are just literal text, so leave the construct itself unquoted and quote only what goes inside it.
Large streams: A pipe holds only a limited amount of data (64 KiB by default on Linux). If the inner command produces output faster than the consumer reads it, the producer blocks until the consumer catches up. That is normal backpressure, not data loss, but unlike with a temporary file the producer cannot run far ahead of the consumer.
Random-access tools: Commands that seek within a file or read it twice may refuse to work, warn, or see empty input on a second pass. Sequential readers are fine.
Security: The path under /dev/fd/ is not a file in a shared directory; it names a file descriptor that exists only in your shell and the processes it starts. Other users cannot open it, so there is no race with another user guessing or overwriting a temp-file name.
Job control: Bash does not wait for the command inside >(…) to finish before moving on, so its output can show up after the next command has started or after the prompt returns. The inner command’s exit status is also not reflected in $?.
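Two of these gotchas, quoting and the lost exit status, can be reproduced in a few lines. This is a minimal sketch; the variable names are arbitrary.

```bash
#!/usr/bin/env bash
# Sketch of two pitfalls: quoting and lost exit status.

# Unquoted: Bash performs the substitution and cat reads from the pipe.
unquoted=$(cat <(echo hi))

# Double-quoted: no substitution happens, the text stays literal.
quoted=$(echo "<(echo hi)")

# The inner command fails, but cat itself succeeds, so $? reports success.
cat <(exit 3)
status=$?

printf 'unquoted: %s\nquoted: %s\nstatus: %s\n' "$unquoted" "$quoted" "$status"
# prints:
#   unquoted: hi
#   quoted: <(echo hi)
#   status: 0
```

If the inner command’s failure matters, have it report the problem itself, for example by writing to stderr or to a status file.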

Quick FAQ

Q: Does Bash create a temporary writable file in /proc?
A: No regular file is created. Bash spawns a subshell, connects it to a pipe, and uses /dev/fd/ (a link to /proc/self/fd on Linux) to expose the pipe’s file descriptor as a pathname. The command that receives the pathname can open it like any other file.

Q: Is there a size limit to the data that can be stored?
A: Nothing is stored; the data flows through the pipe. The pipe buffer itself is small (64 KiB by default on Linux), and the producer simply blocks while it is full, so the only practical limit is the target command’s ability to consume the stream.

Q: Can I use it with commands that need to seek?
A: Some tools can’t seek on a pipe, so they may warn or fail. Test your command first; most text utilities will work, and for those that need random access a real temporary file is the safe fallback.

Q: How does it handle errors from the inner command?
A: Errors from the inner command go to its stderr, which is normally your terminal. Its exit status is not propagated to $?, so check for failure inside the substitution if it matters. To log the outer command’s errors while still seeing them, redirect its stderr through tee (e.g., 2> >(tee errors.log >&2)).

Q: Does it work the same in zsh or dash?
A: It’s supported in Bash, zsh, and ksh. Dash and POSIX sh do not implement it. Use a test script to confirm on your system.

Q: What if I need to stream to more than two servers?
A: Add one >(ssh hostN "tar xf -") per extra host to the tee command. Each session reads from its own pipe, the network carries the data in parallel, and no extra disk I/O is required.

Q: Are there security implications of using /dev/fd paths?
A: The /dev/fd/ path names a file descriptor that is only valid inside your shell and the processes it starts. Other users cannot read from it, so there’s no leakage risk.

Q: How does job control interact with background process substitutions?
A: Bash does not wait for process substitutions before it moves on, so output from a >(…) command can arrive after the prompt returns. The commands still finish normally.

Conclusion

Process substitution is a low-overhead, high-performance way to treat command output as a file, all while keeping your scripts clean and your disks happy. Give it a try in your next shell script: compare two files in a single line, preview sed edits, stream a tarball over SSH without touching disk, and even log errors in real time. The only real limitation is shell support—Bash, zsh, or ksh—and a few quirks around quoting and background jobs. Once you incorporate it into your toolbox, you’ll wonder how you ever wrote temporary files for everything.

Happy hacking!