
Point-in-time recovery for PostgreSQL

2024-04-02

devops

postgres

kamal

Have you ever wanted to travel back in time? With Postgres you can! Well, at least you can see what your data looked like in the past. That's the beauty of point-in-time recovery (PITR) for Postgres. If an intern drops the production database, you don't need to worry -- you can just recover the data to the point right before they ran the drop command.

PITR combines regular Postgres base backups with WAL (Write-Ahead Log) archiving to restore your database to any state in the past. Let's walk through the key points of setting it up. The official documentation is linked at the end of the post.

If you prefer video, check out my explanation on YouTube.

Starting from the base

The concept is relatively simple: start from a base level of data using a "base backup", and replay each WAL segment until you reach the desired time cutoff. This is the same procedure a Postgres instance uses if it crashes and restarts.

So how do we get a "base backup"? The aptly named pg_basebackup command does it for us. It copies the entire contents of the Postgres data directory (minus the configuration files).

It's a one-line command to tar, gzip, and write it to a new location for safekeeping.

# Create and compress the backup using pg_basebackup
# -F t tar format, -z gzip, -P progress, -v verbose, -D destination directory
pg_basebackup -h "$POSTGRES_HOST" -U "$POSTGRES_USER" -D \
              "$BACKUP_LOC" -F t -z -P -v

Technically, you could take a single base backup and then just keep the WAL archives, but replaying weeks or months of WAL segments will slow down your recovery. It's better to continuously take base backups as well as WAL archives. That gives you double insurance: the base backup, which you can use to start a new Postgres instance, and the WAL archive.

To complete this section, create a cron job which runs the above command and uploads the result to your object storage of choice (S3, R2, etc.) -- daily or weekly.
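Something like this sketch works, assuming the aws CLI is available and the environment variables match the accessory config below; the local path and date-stamped S3 prefix are just my choice of layout:

#!/bin/bash
# backup_command.sh -- a rough sketch; paths and S3 prefix are assumptions

set -e

BACKUP_LOC="/tmp/base-backup"
BUCKET="$AWS_S3_BUCKET"
DATE=$(date +%Y-%m-%d)

rm -rf "$BACKUP_LOC" && mkdir -p "$BACKUP_LOC"

# Create and compress the backup (same flags as above)
pg_basebackup -h "$POSTGRES_HOST" -U "$POSTGRES_USER" -D \
              "$BACKUP_LOC" -F t -z -P -v

# Ship the tarballs (base.tar.gz, pg_wal.tar.gz) to object storage
aws s3 cp "$BACKUP_LOC" s3://"$BUCKET"/base/"$DATE"/ --recursive

# Clean up so local backups don't eat the disk
rm -rf "$BACKUP_LOC"

Drop that into cron (for example 0 3 * * 0 for a weekly 3am run), or run it in a loop as the Kamal accessory below does.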

If you use Kamal, I recommend setting the base backup up as an accessory. Check out this dead simple setup.

  pg-16.1-backup:
    image: sjohnson540/postgres:latest
    host: <your host>
    env:
      clear:
        POSTGRES_USER: postgres
        AWS_DEFAULT_REGION: us-west-2
        AWS_S3_BUCKET: "my-secure-backups"
        POSTGRES_HOST: 10.0.0.3
      secret:
        - AWS_ACCESS_KEY_ID
        - AWS_SECRET_ACCESS_KEY
    cmd: bash -c "while true; do /etc/backup_command.sh; sleep 86400; done"

This uses a custom Postgres image I created to house all of the bash scripts mentioned here. The command runs the backup script, sleeps for a day, and then runs it again -- in an infinite loop. backup_command.sh is the script above, just with the variables set correctly.
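If you'd rather build a similar image yourself than use mine, a minimal sketch could look like this; the script names are assumptions and just need to match whatever your cmd and archive_command reference:

FROM postgres:16.1

# The scripts need the AWS CLI to talk to S3
RUN apt-get update && apt-get install -y awscli && rm -rf /var/lib/apt/lists/*

# Bake in the backup/archive/restore scripts from this post
COPY backup_command.sh archive_command.sh restore_command.sh /etc/
RUN chmod +x /etc/backup_command.sh /etc/archive_command.sh /etc/restore_command.sh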

Archiving & Restoring the WAL

The WAL is a continuous stream of every operation that changes the database -- inserts, updates, deletes, and so on. Since Postgres holds recent changes in memory, an instance crash would lose them. So before a transaction is acknowledged as committed, it is first written to the WAL (which is much faster than writing to the data files on disk, because it's a sequential append-only log). After a crash, the WAL is replayed to make sure all committed data is applied again.

As you might have guessed, we need to store each segment of the WAL so that we can perform the same function. Postgres provides the configuration parameter archive_command for this purpose.

Archive

archive_command will run any command you give it. If the command returns a success code (0), Postgres assumes the segment was successfully saved and allows it to be removed or recycled. If it returns a non-zero code, Postgres will retry at intervals. It is important to monitor for errors: if the command is silently failing, WAL segments start to pile up and disk space becomes scarce.

Here is an example configuration:

# Turn on WAL archiving
archive_mode = on

archive_command = 'archive_command.sh "%f" "%p"'

%f is the filename and %p is the full path to the segment we want to save.

Here is an example of uploading the WAL to an S3 bucket:

#!/bin/bash

set -e

FILE="$1"
WAL_PATH="$2"   # don't name this PATH -- that would clobber the shell's PATH
BUCKET="$AWS_S3_BUCKET"

# HEAD to see if the segment already exists in the bucket
aws s3api head-object --bucket "$BUCKET" --key "wal/$FILE" && \
   exit 0 || echo "Doesn't exist, continuing..."

# If it doesn't exist, upload it
aws s3 cp "$WAL_PATH" s3://"$BUCKET"/wal/"$FILE"

This makes sure that every segment gets saved in the S3 bucket for safe use later.

You will need to restart your Postgres server process for these settings to take effect, so try to find a time with low traffic.
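Once archiving is on, the pg_stat_archiver view is the easiest place to watch for the silent failures mentioned above:

# failed_count climbing (or last_failed_wal set) means archive_command is erroring
psql -h "$POSTGRES_HOST" -U "$POSTGRES_USER" -x \
     -c "SELECT archived_count, last_archived_wal, failed_count, last_failed_wal FROM pg_stat_archiver;"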

| Triple check that your S3 bucket is locked down tight. These WAL logs contain your and your users' data, and could be stolen if the bucket is publicly accessible.

Restore

We will walk through a step-by-step recovery process in the next section but first it's important to add the restore_command configuration. This is the command that is called when Postgres is in recovery mode and needs to read WAL segments. It will call this command with the WAL segment it needs, similar to how it calls the archive_command with the segment it wants to archive.

Refreshingly, it's just the exact opposite of the archive_command.

restore_command = 'restore_command.sh "%f" "%p"'

and

#!/bin/bash

set -e

FILE="$1"
WAL_PATH="$2"   # again, avoid the name PATH
BUCKET="$AWS_S3_BUCKET"

# HEAD to see if the segment exists; exit with an error if it doesn't
aws s3api head-object --bucket "$BUCKET" --key "wal/$FILE" && \
  echo "Found it, continuing" || exit 1

# If it exists, download it to the path Postgres asked for
aws s3 cp s3://"$BUCKET"/wal/"$FILE" "$WAL_PATH"

The important difference is the exit codes. During recovery, Postgres will request segments that don't exist -- it's probing to see what data is available -- so it's important to return an error when a segment isn't found. That signals to Postgres how to proceed. If the segment exists, copy it to the given path.

This will be used automatically by Postgres during recovery and should be tested by you before a disaster occurs.
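A quick way to test it is to call the script by hand, using a segment name you can actually see under wal/ in your bucket (the one below is made up):

# Should download the segment and exit 0
./restore_command.sh 000000010000000000000042 /tmp/test_segment && echo "restore OK"

# Should exit non-zero, since this segment was never archived
./restore_command.sh DOES_NOT_EXIST /tmp/nothing || echo "missing segment errors as expected"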

Restoring the database

Okay you are safe now! You have your base backups and your WAL archives. Disasters roll off your back like water. But oh no! Someone deleted every user in the database at 16:45:03 PST. Let's get you back up and running.

Step 1 - Shut down the db

No matter what, you will need to shut down Postgres to do the recovery, but starting here serves another purpose: no new data comes in to be lost. Your service being unavailable is better than data loss. As quickly as you can after detecting this kind of problem, shut down Postgres.
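Exactly how depends on how you run Postgres; a couple of common forms (the service name and data directory are assumptions):

# systemd-managed install
sudo systemctl stop postgresql

# or directly with pg_ctl
pg_ctl stop -D /var/lib/postgresql/16/main -m fast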

Step 2 - Prepare the backup

  • Grab the latest base backup from your object storage and copy it to your server.
  • Copy the current Postgres data directory to a temporary folder so you can re-use its WAL. It's possible some WAL segments were never archived, and you can use those in the recovery by copying them back in later.
  • Now delete the contents of the data directory. It needs to be empty for your base backup.
  • Extract (untar/gunzip) the base backup into the data directory.
  • Extract the pg_wal portion of the base backup into the data directory's pg_wal directory.
  • Copy the WAL segments you set aside earlier into the pg_wal directory (see the sketch below).
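Here is a rough sketch of those steps, assuming the data directory is /var/lib/postgresql/16/main and the base backup tarballs were downloaded to /tmp (both are assumptions):

set -e

DATA_DIR=/var/lib/postgresql/16/main

# Set aside any WAL segments that never made it to the archive
cp -a "$DATA_DIR/pg_wal" /tmp/old_pg_wal

# Empty the data directory, then unpack the base backup into it
rm -rf "$DATA_DIR"/*
tar -xzf /tmp/base.tar.gz -C "$DATA_DIR"
mkdir -p "$DATA_DIR/pg_wal"
tar -xzf /tmp/pg_wal.tar.gz -C "$DATA_DIR/pg_wal"

# Bring the unarchived segments back in
cp -a /tmp/old_pg_wal/. "$DATA_DIR/pg_wal/"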

Okay now you are ready to go!

Step 3 - Determine recovery time

We are about to put the "point-in-time" in "point-in-time recovery"! It's important to get the time right so you avoid the disastrous delete from before. The timestamp must be in the Postgres timestamp with time zone format, so if we wanted to recover to 16:45:00 PST it would look like this: 2024-04-02 16:45:00-08. The preferred method is UTC offset notation for time zones; -08 is PST. Change the date to your recovery date.

Then take that value and update the recovery_target_time configuration value in postgresql.conf.
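With the example timestamp above, that line looks like:

# Stop replaying WAL once we reach this moment
recovery_target_time = '2024-04-02 16:45:00-08'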

Note, you will need to copy your postgresql.conf file back in after restoring the base backup. If you don't, Postgres will use the default configuration.

Step 4 - Start in recovery mode

Here we go! Time to restart Postgres! The only thing we need to do here is add a recovery.signal file to the data directory before restarting. That way Postgres knows it needs to enter recovery mode.
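With the same assumed paths as before:

# Tell Postgres to come up in recovery mode, then start it
touch /var/lib/postgresql/16/main/recovery.signal
sudo systemctl start postgresql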

Postgres should be showing logs of it downloading WAL segments, and possibly error messages when it doesn't find some (that is to be expected).

Step 5 - Verify

By default Postgres will start in read-only mode so that you can verify the data is indeed intact. Do that using psql or your favorite db client. Once you have done that, delete the recovery.signal file and restart Postgres. You did it!
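For example (the database and table names are whatever your incident touched; myapp and users here are placeholders):

# Should return true while the server is still in recovery (read-only)
psql -U postgres -c "SELECT pg_is_in_recovery();"

# Spot-check the data that was deleted
psql -U postgres -d myapp -c "SELECT count(*) FROM users;"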

If there are issues, try the same process again. If you are still running into problems, try an older base backup -- it may take longer, but it should still work. If that doesn't work, send me an email at sam [at] devscoach.com and I will do my best to help you debug.

Step 6 - Cleanup

Whew! We made it through, but we aren't out of the woods yet. It's important to take a new base backup as soon as possible after a recovery. That gives you a clean starting point on the new timeline Postgres created during recovery, so you don't have to go through that headache again.

Conclusion

With some proper planning and systems in place, you can recover from any disaster with minimal stress. Postgres is an amazingly versatile tool; the more you learn, the more you can use it to its full potential.

Reference: Continuous Archiving

Having security or scaling issues? Let's chat.