Why PostgreSQL major version upgrades are hard

Upgrades between PostgreSQL major versions are famously annoying. You can’t just install the server binaries and restart, because the format of the data directory is incompatible.

Why is that? Why can’t we just keep the data format compatible?

Perhaps surprisingly, the data format is actually mostly compatible, but not completely. There are just a few things missing that are very hard to solve.

Let’s look at the possible upgrading procedures:

pg_upgrade
dump and restore using pg_dumpall
logical replication to a new instance

2 and 3 are essentially the same idea: Build a new instance and transport the data between the instances in a higher-level, compatible format.

pg_upgrade is more interesting. What pg_upgrade does is:

Take a dump (using pg_dump) of the schema of the old database, and restore that to the new database. (Actually separately for each database in the instance, and in combination with pg_dumpall, but that’s not important here.)
Copy the data files directly from the old instance to the new instance.

So the data files, which contain the table and index data, are actually compatible between major versions. They have been for a long time, ever since pg_upgrade was introduced.

How that is managed depends on what is in those files. For example, btree has a straightforward versioning mechanism:

#define BTREE_VERSION   4       /* current version number */
#define BTREE_MIN_VERSION   2   /* minimum supported version */

heap is more complicated, it just has to maintain compatibility with whatever old versions might have stored. But in any case, this works.

What’s not compatible is the way the schema (the data definitions, the metadata for tables etc.) is stored. This is what pg_upgrade has to deal with.

So then why is the schema storage incompatible between major versions?

The schema data of a PostgreSQL database is stored in so-called system catalogs, which are themselves just tables. For bootstrapping reasons, and also some efficiency reasons, the structure of these system catalogs is hardcoded in the server. For example, when the system wants to know the name of a column, it loads the pg_attribute row, and it knows at what offset the attname field with the column name begins. This has to be hardcoded, because, well, you can’t query pg_attribute to find out information about pg_attribute.

Whenever a new feature is implemented that needs to store some information in the system catalogs, this hardcoded knowledge becomes incorrect. For example, in PostgreSQL 17, subscriptions got a new failover flag. This failover flag needs to be stored somewhere. It is stored in the subfailover field of the pg_subscription catalog. Because of that, the hardcoded size of the pg_subscription rows changes, and the offset of some fields after the new subfailover field changes. (For additional complicated reasons, you can’t just add all new fields to the end.)

There would appear to be an obvious solution for this: The server source code just has to have conditional code for old catalog layouts. But this code is very widespread, a rough estimate shows more than a thousand instances. So making this happen would be a massive effort and require significant ongoing maintenance. So I think before this could happen, some new ideas would be required for how to first refactor some of that.

The other problem is that this still wouldn’t provide a way to upgrade the system catalogs to the new layout. If you upgraded to PostgreSQL 17 and wanted to create a failover subscription, you couldn’t, because the catalogs would still be in the old format. And then you would still need to upgrade those somehow, and you’d need tooling to manage all of that.

Now, in a different world, the system catalogs would have been designed differently, with compatibility and upgradability in mind. This idea that system catalogs are normal tables is perhaps an original Berkeley Postgres idea. And it’s really nice, because it gives you some useful features, especially transactional DDL with relatively little additional effort. But it does make the format harder to upgrade.

So I tend to think that that’s the trade-off of these historical architectural decisions: easy upgrades or easy transactional DDL.

To be clear, this is not the only barrier to effortless upgrades. But I think it’s the most significant one. Another one is that the serialization format of the internal data structures that record for example stored views or stored default expressions in the system catalogs is not managed for compatibility across major versions. Because there has been no need to so far. But I think I solution could be found for that. There are a number of things like that, things we haven’t even thought about very hard, because there hasn’t been a need. For example, the format of the write-ahead log (WAL) is incompatible between major versions. Because pg_upgrade and none of the other upgrade procedures preserve the WAL, this is not a problem. I’m just mentioning it here to indicate that there are other, less explored issues if the ones we know about are addressed.