PostgreSQL Replication How-To

Preparing A Resilient Slave Replica

When someone is asked to configure a PostgreSQL slave replica, the first thing that usually gets done is "Googling it" and clicking the first title that sounds like a good how-to article, without paying attention to the subtle details that make the difference between a sloppy slave replica and a resilient one.

This recipe focuses on building a resilient slave replica of a master of any size and bandwidth usage, and builds safety margins into the configuration in case slave lag or a replication conflict arises from unexpected conditions.

Let’s get started.

The instructions below will essentially work on any PostgreSQL version from 9.5 through 11 (including 9.6 and 10). Note that PostgreSQL 12 removed recovery.conf in favor of a standby.signal file with the settings moved into postgresql.conf, so Step 8 differs on 12 and later.

Step 1. Modify Configuration on Master Node

Modify the file postgresql.conf on the Master Node:

# The WAL level should be hot_standby
wal_level = hot_standby
# Allow up to 5 standbys and backup processes to connect at a time.
# This should be set to at least 2.
max_wal_senders = 5
# The minimum number of segments to keep in the pg_xlog directory.
# Each segment is 16 MB in size.
wal_keep_segments = 1000 # Retain 1000 segments minimum

The wal_keep_segments setting instructs the master server to keep at least the last 1000 WAL segments in the pg_xlog directory. Since each segment is 16 MB, that amounts to 16 GB of disk space.

Remember that the master streams WAL segment files to the slave using streaming replication. With no safety guards in place, however, the following could happen, as described in the Streaming Replication section of the official documentation:

From 25.2.5 Streaming Replication:
If you use streaming replication without file-based continuous archiving, the server might recycle old WAL segments before the standby has received them. If this occurs, the standby will need to be reinitialized from a new base backup. You can avoid this by setting wal_keep_segments to a value large enough to ensure that WAL segments are not recycled too early, or by configuring a replication slot for the standby. If you set up a WAL archive that’s accessible from the standby, these solutions are not required, since the standby can always use the archive to catch up provided it retains enough segments.

And this is why we have set wal_keep_segments to a large value.

The other two options to safe-guard this condition are:

  1. Using replication slots. I do not recommend this here, because an admin can easily forget to delete a slot after removing a slave, and the master's disk will then fill up with WAL segments retained for the abandoned slot. (Replication slots are worth using in other cases, such as Barman backups.)
  2. Using file-based replication as a secondary mechanism that kicks in when streaming replication falls behind. This is also a good approach; however, it won't be covered in this article.

So, for now, using wal_keep_segments is a sound and safe option.

Isn't 1000 a high value for wal_keep_segments?
Some might argue that keeping 1000 segments, consuming 16 GB of drive space, is excessive. I would say 16 GB isn't an issue given today's low storage prices in the cloud, and it protects your master-slave setup. If you find the value too high and can't afford 16 GB of storage, either lower it or consider using replication slots (not covered in this article).
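To sanity-check the 1000-segment figure against your own workload, you can relate it to the master's WAL write rate. A rough sketch (the 2 MB/s rate below is a made-up assumption; measure your own master's WAL output before relying on it):

```shell
# Back-of-the-envelope: how much slave downtime does wal_keep_segments buy?
SEGMENTS=1000                  # matches wal_keep_segments above
SEGMENT_MB=16                  # fixed WAL segment size
WAL_RATE_MB_S=2                # hypothetical sustained WAL write rate
RETAINED_MB=$((SEGMENTS * SEGMENT_MB))
GRACE_S=$((RETAINED_MB / WAL_RATE_MB_S))
echo "Retains ${RETAINED_MB} MB of WAL, covering roughly ${GRACE_S}s of slave downtime"
```

At that assumed rate, 16 GB of retained WAL lets the slave fall behind for a little over two hours before a new base backup becomes necessary.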

Step 2. Create the replica_user on the Master Node

Now we need to create the replica_user role on the master node; the slave node will connect as this user for streaming replication.

# sudo -u postgres psql
CREATE ROLE replica_user REPLICATION LOGIN PASSWORD 'new_password';

What's the difference between CREATE ROLE and CREATE USER?
Almost none. CREATE USER is translated by PostgreSQL internally into CREATE ROLE, as there is no separate USER concept in PostgreSQL terminology: a user is simply a role. The one difference is that CREATE USER implies LOGIN by default, while CREATE ROLE does not, which is why we specify LOGIN explicitly above.

CREATE USER is retained mainly for convenience and backward compatibility.

Step 3. Modify pg_hba.conf on Master Server

Modify pg_hba.conf on the master server to allow this user to connect from the slave node:

# TYPE DATABASE USER ADDRESS METHOD
host replication replica_user 10.0.0.2/32 md5

Step 4. Modify listen_addresses on Master

Make sure to modify postgresql.conf so that listen_addresses accepts connections on external interfaces, not just localhost.

listen_addresses = '0.0.0.0'

Optionally, you can provide a list of addresses instead, e.g. listen_addresses = 'localhost,10.0.0.1'. Note that '0.0.0.0' covers all IPv4 interfaces; use '*' to listen on IPv6 as well.

Step 5. Restart PostgreSQL on Master Node for Changes To Take Effect

systemctl restart postgresql

Step 6. Open the port 5432 on Master Node to allow connections from Slave Node

On Ubuntu, assuming the slave node's IP address is 10.0.0.2 (the address we allowed in pg_hba.conf):

ufw allow proto tcp from 10.0.0.2/32 to any port 5432

Step 7. (On Slave Node) Perform a Base Backup on the Slave Node

SSH into your slave node, and perform the following.

  1. Stop the PostgreSQL instance:
systemctl stop postgresql

  2. Take a base backup from the master node.
But first, back up and empty the current data directory:

##################################################################
# DISCLAIMER:
# Make sure you are on slave node
##################################################################
DATADIR=/var/lib/postgresql/9.5/main # Adjust to your PostgreSQL version
cp -R $DATADIR $DATADIR'_backup'
rm -rf $DATADIR/*
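As a defensive measure before taking the backup, you can verify that the data directory really is empty, since pg_basebackup will refuse a non-empty target. A sketch (ensure_empty_dir is a helper name I'm making up for illustration):

```shell
# Refuse to proceed unless the target directory exists and is empty.
ensure_empty_dir() {
  [ -d "$1" ] && [ -z "$(ls -A "$1")" ]
}

DATADIR=/var/lib/postgresql/9.5/main
if ensure_empty_dir "$DATADIR"; then
  echo "ok: $DATADIR is empty, safe to run pg_basebackup"
else
  echo "refusing: $DATADIR is missing or not empty" >&2
fi
```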

Then run the command to perform the base backup:

# Assuming 10.0.0.1 is the master node IP address.
# On PostgreSQL 10 and later, --xlog-method was renamed to --wal-method.
pg_basebackup -h 10.0.0.1 -U replica_user --xlog-method=stream -D $DATADIR
chown -R postgres:postgres $DATADIR

If the master node is serving a live system under heavy load, you can (and are advised to) limit the transfer rate between the master and the slave, so the backup doesn't hurt the responsiveness of the live master. This is done with the -r option, e.g. -r 512k or -r 5M, for 512 KB/second and 5 MB/second respectively.

pg_basebackup -r 5M -h 10.0.0.1 -U replica_user --xlog-method=stream -D $DATADIR
chown -R postgres:postgres $DATADIR
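To gauge the cost of rate-limiting, here is a quick transfer-time estimate. Both numbers are assumptions for illustration: a hypothetical 50 GB cluster, and the -r 5M rate from the example above.

```shell
# Rough transfer-time estimate for a rate-limited pg_basebackup.
DB_SIZE_MB=$((50 * 1024))   # assume a 50 GB cluster (hypothetical)
RATE_MB_S=5                 # matches -r 5M above
SECS=$((DB_SIZE_MB / RATE_MB_S))
echo "About $((SECS / 3600))h $(((SECS % 3600) / 60))m at ${RATE_MB_S} MB/s"
```

Under those assumptions the backup takes close to three hours, which is worth knowing before kicking it off against a production master.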

pg_basebackup seems to hang or fails to run?
Make sure port 5432 on the master node is open to connections from the slave node's IP address.

Step 8. (On Slave Node) Create the recovery.conf file

Create the recovery.conf file in the data directory (/var/lib/postgresql/9.5/main/recovery.conf) with the following content:

cat <<EOT > $DATADIR/recovery.conf
# This tells the slave to keep pulling WALs from master
standby_mode = on
# Master connection
primary_conninfo = 'host=10.0.0.1 user=replica_user password=replica_user_password'
EOT
# Change the permission
chown postgres:postgres $DATADIR/recovery.conf

Step 9. (On Slave Node) Enable Slave To Serve as a Read Replica

Modify postgresql.conf by enabling hot_standby:

hot_standby = on
max_connections = 100 # Should match on master and slave, or else it will result in an error

The hot_standby setting allows read queries to execute on the slave.

One thing to watch for: make sure that the max_connections setting has the same value on the master and the slave.

Finally, start postgresql again.

systemctl start postgresql

Step 10. (On Slave Node) Check the Log File for Any Errors

tail /var/log/postgresql/postgresql-9.5-main.log
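Beyond eyeballing the tail of the log, a small sketch like the following scans it for obvious problems (the path assumes PostgreSQL 9.5 on Debian/Ubuntu; adjust it to your version and distribution):

```shell
# Scan the slave's log for FATAL or ERROR lines.
LOG=/var/log/postgresql/postgresql-9.5-main.log
if [ -f "$LOG" ]; then
  grep -E 'FATAL|ERROR' "$LOG" || echo "no FATAL/ERROR lines found"
else
  echo "log file not found at $LOG" >&2
fi
```

On a healthy slave you should instead see lines like "entering standby mode" and "streaming replication successfully connected to primary".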

Step 11. Protect Your Replica Against Conflicts

Long-running SELECT queries (as well as other conditions) can create replication conflicts on the slave. You can guard against these by increasing max_standby_streaming_delay (and max_standby_archive_delay if you are using file-based archiving). The default value is 30 seconds, after which PostgreSQL cancels any conflicting queries on the standby.

This value can be raised to 900s (15 minutes) to keep long-running SELECT queries (e.g. during a pg_dump) from being cancelled. To do that, modify postgresql.conf on the slave node as follows:

max_standby_archive_delay = 900s
max_standby_streaming_delay = 900s

Another approach would be to pause streaming replication before running the SELECT query then resuming it after the SELECT query has completed:

select pg_xlog_replay_pause();  -- pause
select * from some_table;
select pg_xlog_replay_resume(); -- resume
-- On PostgreSQL 10 and later, these functions are named
-- pg_wal_replay_pause() and pg_wal_replay_resume().

Although this works well, I recommend against it: it is easy to forget to call pg_xlog_replay_resume(), and the slave's drive would then fill up with WAL files while replication waits to resume.
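If you do take the pause/resume route, the forgetting risk can be reduced by wrapping the work in a small helper that always attempts the resume call, even when the work in between fails. A minimal sketch (run_with_replay_paused is a hypothetical helper name, not part of PostgreSQL):

```shell
# Run a command between a pause command and a resume command, and make
# sure the resume command is attempted even if the work fails.
run_with_replay_paused() {
  pause_cmd=$1; resume_cmd=$2; shift 2
  eval "$pause_cmd" || return 1
  "$@"; status=$?
  eval "$resume_cmd"   # always attempt resume, success or failure
  return $status
}
```

On a 9.x slave, usage might look like:
run_with_replay_paused "psql -c 'select pg_xlog_replay_pause();'" "psql -c 'select pg_xlog_replay_resume();'" pg_dump mydb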

That’s all that is needed for the replica.

In the next parts, I'll discuss how to monitor your replica and how to handle failover by promoting the slave node to a master.

A Software Engineer with interests in data storage and database systems done right.
