2001: PostgreSQL 7.1: write-ahead log
PostgreSQL 7.1 introduced the write-ahead log (WAL). Before that release, all open data files had to be fsynced on every commit, which is very slow. Slow fsyncing is still a problem today, but now we’re only worried about fsyncing the WAL, and fsyncing the data files during the checkpoint process. Back then, we had to fsync everything all the time.
In the original design of university POSTGRES, the lack of a log was intentional, and contrasted with heavily log-based architectures such as Oracle. In Oracle, you need the log to roll back changes. In PostgreSQL, the nonoverwriting storage system takes care of that. But probably nobody thought about implications for fsyncing back then.
Note that the WAL was really just an implementation detail at this point. You couldn’t read or archive it.
Just for context: Slony-I 1.0 was released in July 2004.
2005: PostgreSQL 8.0: point-in-time recovery
PostgreSQL 8.0 added the possibility to copy the WAL somewhere else,
and later play it back, either all the way or to a particular point in
time, hence the name point-in-time recovery (PITR) for this feature.
This feature was mainly intended to relieve
pg_dump as a backup
method. Until then, the only backup method was a full dump, which
would get impractical as databases grew. Hence this method to take an
occasional base backup, which is the expensive part, and then add on
parts of the WAL, which is cheaper.
The basic configuration mechanisms that we still use today, for
recovery.conf file, were introduced as part of this
But still no replication here.
2008: PostgreSQL 8.3: pg_standby
Crafty people eventually figured that if you archived WAL on one
server and at the same time “recovered” endlessly on another, you’d
have a replication setup. You could probably have set this up with
your own scripts as early as 8.0, but PostgreSQL 8.3 added the
pg_standby program into
contrib, which gave everyone a standard
tool. So, arguably, 8.3 is the first release that contained a
semblance of a built-in replication solution.
The standby server was in permanent recovery until promotion, so it couldn’t be read from as it was replicating. This is what we’d now call a warm standby.
I think a lot of PostgreSQL 8.3 installations refuse to die, because this is the first version where you could easily have a reasonably up-to-date reserve server without resorting to complicated and sometimes problematic tools like Slony or DRBD.
2010: PostgreSQL 9.0: hot standby, streaming replication
In PostgreSQL 9.0, two important replication features arrived completely independently. First, the possibility to connect to a standby server in read-only mode, making it a so-called hot standby. Whereas before, a standby server was really mainly useful only as a reserve in case the primary server failed, with hot standby you could use secondary servers to spread out read-only loads. Second, instead of relying solely on the WAL archive and recovery functionalities to transport WAL data, a standby server could connect directly to the primary server via the existing libpq protocol and obtain WAL data that way, so-called streaming replication. The primary use in this release was that the standby could be more up to date, possibly within seconds, rather than several minutes with the archive-based approach. For a robust setup, you would still need to set up an archive. But streaming replication was also a forward-looking feature that would eventually make replication setups easier, by reducing the reliance on the old archiving mechanisms.
PostgreSQL 9.0 was the first release where one could claim that PostgreSQL “supports replication” without having to make qualifications or excuses. Although it is scheduled to go EOL later this year, I expect this release will continue to live for a long time.
2011: PostgreSQL 9.1: pg_basebackup, synchronous replication
pg_basebackup was one of the features facilitated by streaming
replication that made things easier. Instead of having to use
external tools like
rsync for base backups,
use a normal libpq connection to pull down a base backup, thus
avoiding complicated connection and authentication setups for external
tools. (Some people continue to favor
rsync because it is faster
PostgreSQL 9.1 also added synchronous replication, which ensures that
data is replicated to the designated synchronous standby before a
COMMIT reports success. This feature is frequently misunderstood by
users. While it ensures that your data is on at least two servers at
all times, it might actually reduce the availability of your system,
because if the standby server goes down, the primary will also go
down, unless you have a third server available to take over the
synchronous standby duty.
Less widely know perhaps is that PostgreSQL 9.1 also added the
pg_last_xact_replay_timestamp function for easy monitoring of
In my experience, the availability of
pg_last_xact_replay_timestamp make PostgreSQL 9.1 the first release
were managing replication was reasonably easy. Go back further, and
you might feel constrained by the available tools. But in 9.1, it’s
not that much different from what is available in the most recent
2012: PostgreSQL 9.2: cascading replication
Not as widely acclaimed, more for the Slony buffs perhaps,
PostgreSQL 9.2 allowed standbys to fetch their streaming replication
data from other standbys. A particular consequence of that is that
pg_basebackup could copy from a standby server, thus taking the load
off the primary server for setting up a new standby or standalone
2013: PostgreSQL 9.3: standby can follow timeline switch
This did not even make it into the release note highlights. In PostgreSQL 9.3, when a primary has two standbys, and one of the standbys is promoted, the other standby can just keep following the new primary. In previous releases, the second standby would have to be rebuilt. This improvement makes dynamic infrastructure changes much simpler. Not only does it eliminate the time, annoyance, and performance impact of setting up a new standby, more importantly it avoids the situation that after a promotion, you don’t have any up to update standbys at all for a while.
2014: PostgreSQL 9.4: replication slots, logical decoding
Logical decoding got all the press for PostgreSQL 9.4, but I think
replication slots are the major feature, possibly the biggest
replication feature since PostgreSQL 9.0. Note that while streaming
replication has gotten more sophisticated over the years, you still
needed a WAL archive for complete robustness. That is because the
primary server didn’t actually keep a list of its supposed standby
servers, it just streamed whatever WAL happened to be requested if it
happened to have it. If the standby server fell behind sufficiently
far, streaming replication would fail, and recovery from the archive
would kick in. If you didn’t have an archive, the standby would then
no longer be able to catch up and would have to be rebuilt. And this
archiving mechanism has essentially been unchanged since version 8.0,
when it was designed for an entirely different purpose. So a
replication setup is actually quite messy: You have to configure an
access path from the primary to the standby (for archiving) and an
access path from the standby to the primary (for streaming). And if
you wanted to do multiple standbys or cascading, maintaining the
archive could get really complicated. Moreover, I think a lot of
archiving setups have problematic
archive_command settings. For
example, does your
archive_command fsync the file on the receiving
side? Probably not.
No more: In PostgreSQL 9.4, you can set up so-called replication slots, which effectively means that you register a standby with the primary, and the primary keeps around the WAL for each standby until the standby has fetched it. With this, you can completely get rid of the archiving, unless you need it as a backup.
2015? PostgreSQL 9.5? pg_rewind?
One of the remaining problems is that promoting a standby leaves the
old primary unable to change course and follow the new primary. If
you fail over because the old primary died, then that’s not an issue.
But if you just want to swap primary and standby, perhaps because the
standby has more powerful hardware, then the old primary, now standby,
needs to be rebuilt completely from scratch. Transforming an old
primary into a new standby without a completely new base backup is a
rather intricate problem, but a tool that can do it (currently named
pg_rewind) is proposed for inclusion into the next PostgreSQL
One of the problems that this evolution of replication has created is
that the configuration is rather idiosyncratic, quite complicated to
get right, and almost impossible to generalize sufficiently for
documentation, tutorials, and so on. Dropping archiving with 9.4
might address some of these points, but configuring even just
streaming replication is still weird, even weirder if you don’t know
how it got here. You need to change several obscure configuration
parameters, some on the primary, some on the standby, some of which
require a hard restart of the primary server. And then you need to
create a new configuration file
recovery.conf, even though you don’t
want to recover anything. Making changes in this area is mostly a
complex political process, because the existing system has served
people well over many years, and coming up with a new system that is
obviously better and addresses all existing use cases is cumbersome.
Another issue is that all of this functionality has been bolted on to the write-ahead log mechanism, and that constrains all the uses of the write-ahead log in various ways. For example, there are optimizations that skip WAL logging in certain circumstances, but if you want replication, you can’t use them. Who doesn’t want replication? Also, the write-ahead log covers an entire database system and is all or nothing. You can’t replicate only certain tables, for example, or consolidate logs from two different sources.
How about not bolting all of this on to the WAL? Have two different logs for two different purposes. This was discussed, especially around the time streaming replication was built. But then you’d need two logs that are almost the same. And the WAL is by design a bottleneck, so creating another log would probably create performance problems.
Logical decoding breaks many of these restrictions and will likely be the foundation for the next round of major replication features. Examples include partial replication and multimaster replication, some of which are being worked on right now.
What can we expect from plain WAL logging in the meantime? Easier configuration is certainly a common request. But can we expect major leaps on functionality? Who knows. At one point, something like hot standby was thought to be nearly impossible. So there might be surprises still.