Pivert's Blog

Point-in-time recovery (PITR) of PostgreSQL database


,
Reading Time: 3 minutes

This quick guide won’t replace the documentation. It provides an example of PostgreSQL backup, PITR configuration & restore.

Backup configuration

This is not intended to supersede any backup configuration manual, but before recovery, let’s check how the backup is running.

The /mnt/cephfs is a shared filesystem on which I place the backups and the archivelogs. Archivelogs are saved after each WAL rotation. Let’s review the config.

Full Backup

A full backup is triggered every 2 weeks via the postgres crontab. ( crontab -e as the postgres user)

Plain text
Copy to clipboard
Open code in new window
EnlighterJS 3 Syntax Highlighter
# m h dom mon dow command
0 4 1,15 * * /var/lib/postgresql/bin/backup_database.sh && find /mnt/cephfs/backup/postgresql/archivelogs -mtime +30 -delete
# m h dom mon dow command 0 4 1,15 * * /var/lib/postgresql/bin/backup_database.sh && find /mnt/cephfs/backup/postgresql/archivelogs -mtime +30 -delete
# m h  dom mon dow   command
0 4 1,15 * * /var/lib/postgresql/bin/backup_database.sh && find /mnt/cephfs/backup/postgresql/archivelogs -mtime +30 -delete
Plain text
Copy to clipboard
Open code in new window
EnlighterJS 3 Syntax Highlighter
$ cat /var/lib/postgresql/bin/backup_database.sh
#!/bin/bash
DIR=/mnt/cephfs/backup/postgresql/$( date +%F_%Hh%M )
mkdir "$DIR"
pg_basebackup -z -Z8 -Ft -D "$DIR"
$ cat /var/lib/postgresql/bin/backup_database.sh #!/bin/bash DIR=/mnt/cephfs/backup/postgresql/$( date +%F_%Hh%M ) mkdir "$DIR" pg_basebackup -z -Z8 -Ft -D "$DIR"
$ cat /var/lib/postgresql/bin/backup_database.sh
#!/bin/bash

DIR=/mnt/cephfs/backup/postgresql/$( date +%F_%Hh%M )
mkdir "$DIR"
pg_basebackup -z -Z8 -Ft -D "$DIR"

WAL backup

Plain text
Copy to clipboard
Open code in new window
EnlighterJS 3 Syntax Highlighter
postgres@minetest:~$ grep ^archive /etc/postgresql/14/main/postgresql.conf
archive_mode = on # enables archiving; off, on, or always
archive_command = 'nice xz -1 -zc %p > /mnt/cephfs/backup/postgresql/archivelogs/%f.xz && nice xz -t /mnt/cephfs/backup/postgresql/archivelogs/%f.xz'
postgres@minetest:~$ grep ^archive /etc/postgresql/14/main/postgresql.conf archive_mode = on # enables archiving; off, on, or always archive_command = 'nice xz -1 -zc %p > /mnt/cephfs/backup/postgresql/archivelogs/%f.xz && nice xz -t /mnt/cephfs/backup/postgresql/archivelogs/%f.xz'
postgres@minetest:~$ grep ^archive /etc/postgresql/14/main/postgresql.conf
archive_mode = on               # enables archiving; off, on, or always
archive_command = 'nice xz -1 -zc %p > /mnt/cephfs/backup/postgresql/archivelogs/%f.xz  && nice xz -t /mnt/cephfs/backup/postgresql/archivelogs/%f.xz'

Restore

Check that you have the restore_command aligned with your archive_command (location, compression, etc…), and set the recovery_target_time. The stop point must be after the ending time of the base backup.

You can see here a simple one line command that

  • checks for the compressed WAL / archivelog integrity (-t)
  • decompresses the archive and ‘cat’ the output to stdout (`xz -dc`)
  • sends the decompressed output to the WAL file path to restore to (> %p)
Plain text
Copy to clipboard
Open code in new window
EnlighterJS 3 Syntax Highlighter
postgres@minetest:~$ grep -E '^(restore|recovery)' /etc/postgresql/14/main/postgresql.conf
restore_command = 'xz -t /mnt/cephfs/backup/postgresql/archivelogs/%f.xz && xz -dc /mnt/cephfs/backup/postgresql/archivelogs/%f.xz > %p'
recovery_target_time = '2023-04-08 16:00:00'
postgres@minetest:~$ grep -E '^(restore|recovery)' /etc/postgresql/14/main/postgresql.conf restore_command = 'xz -t /mnt/cephfs/backup/postgresql/archivelogs/%f.xz && xz -dc /mnt/cephfs/backup/postgresql/archivelogs/%f.xz > %p' recovery_target_time = '2023-04-08 16:00:00'
postgres@minetest:~$ grep -E '^(restore|recovery)' /etc/postgresql/14/main/postgresql.conf
restore_command = 'xz -t /mnt/cephfs/backup/postgresql/archivelogs/%f.xz && xz -dc /mnt/cephfs/backup/postgresql/archivelogs/%f.xz > %p'
recovery_target_time = '2023-04-08 16:00:00'

After checking the postgresql.conf

Stop the DB

For Debian/Ubuntu:
as root

Plain text
Copy to clipboard
Open code in new window
EnlighterJS 3 Syntax Highlighter
systemctl stop postgresql
systemctl stop postgresql
systemctl stop postgresql

Delete the DB data & restore full backup

Make a copy if you have enough space.

Plain text
Copy to clipboard
Open code in new window
EnlighterJS 3 Syntax Highlighter
mv /var/lib/postgresql/14/main /var/lib/postgresql/14/main_original_backup
mkdir /var/lib/postgresql/14/main
tar -zxvf /mnt/cephfs/backup/postgresql/2023-04-01_04h00/base.tar.gz -C /var/lib/postgresql/14/main/
chmod 700 /var/lib/postgresql/14/main
touch /var/lib/postgresql/14/main/recovery.signal
mv /var/lib/postgresql/14/main /var/lib/postgresql/14/main_original_backup mkdir /var/lib/postgresql/14/main tar -zxvf /mnt/cephfs/backup/postgresql/2023-04-01_04h00/base.tar.gz -C /var/lib/postgresql/14/main/ chmod 700 /var/lib/postgresql/14/main touch /var/lib/postgresql/14/main/recovery.signal
mv /var/lib/postgresql/14/main /var/lib/postgresql/14/main_original_backup
mkdir /var/lib/postgresql/14/main
tar -zxvf /mnt/cephfs/backup/postgresql/2023-04-01_04h00/base.tar.gz -C /var/lib/postgresql/14/main/
chmod 700 /var/lib/postgresql/14/main
touch /var/lib/postgresql/14/main/recovery.signal

The presence of the recovery.signal file will trigger the recovery process on the next start.

Start the DB and monitor the Point-in-time recovery

  • (as root) systemctl start postgresql
  • Watch the process progress with the log : tail -f /var/log/postgresql/postgresql-14-main.log
Plain text
Copy to clipboard
Open code in new window
EnlighterJS 3 Syntax Highlighter
2023-04-09 11:00:31.389 UTC [10090] LOG: starting PostgreSQL 14.7 (Ubuntu 14.7-0ubuntu0.22.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0, 64-bit
2023-04-09 11:00:31.390 UTC [10090] LOG: listening on IPv6 address "::1", port 5432
2023-04-09 11:00:31.390 UTC [10090] LOG: listening on IPv4 address "127.0.0.1", port 5432
2023-04-09 11:00:31.452 UTC [10090] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2023-04-09 11:00:31.533 UTC [10091] LOG: database system was interrupted; last known up at 2023-04-01 04:06:15 UTC
xz: /mnt/cephfs/backup/postgresql/archivelogs/00000002.history.xz: No such file or directory
2023-04-09 11:00:31.594 UTC [10091] LOG: starting point-in-time recovery to 2023-04-08 16:00:00+00
2023-04-09 11:00:32.422 UTC [10091] LOG: restored log file "00000001000000A400000057" from archive
2023-04-09 11:00:32.602 UTC [10091] LOG: redo starts at A4/57000028
2023-04-09 11:00:32.663 UTC [10091] LOG: consistent recovery state reached at A4/577AD5A8
2023-04-09 11:00:32.664 UTC [10090] LOG: database system is ready to accept read-only connections
2023-04-09 11:00:33.814 UTC [10091] LOG: restored log file "00000001000000A400000058" from archive
2023-04-09 11:00:36.060 UTC [10091] LOG: restored log file "00000001000000A400000059" from archive
2023-04-09 11:00:37.675 UTC [10091] LOG: restored log file "00000001000000A40000005A" from archive
... [Deleted hundreds of lines]
2023-04-09 11:43:38.906 UTC [10091] LOG: restored log file "00000001000000A7000000B9" from archive
2023-04-09 11:43:40.502 UTC [10091] LOG: restored log file "00000001000000A7000000BA" from archive
2023-04-09 11:43:41.949 UTC [10091] LOG: restored log file "00000001000000A7000000BB" from archive
2023-04-09 11:43:44.134 UTC [10091] LOG: restored log file "00000001000000A7000000BC" from archive
2023-04-09 11:43:45.965 UTC [10091] LOG: restored log file "00000001000000A7000000BD" from archive
2023-04-09 11:43:47.774 UTC [10091] LOG: restored log file "00000001000000A7000000BE" from archive
2023-04-09 11:43:49.698 UTC [10091] LOG: restored log file "00000001000000A7000000BF" from archive
2023-04-09 11:43:50.194 UTC [10091] LOG: recovery stopping before commit of transaction 6522027, time 2023-04-08 16:00:00.391957+00
2023-04-09 11:43:50.194 UTC [10091] LOG: pausing at the end of recovery
2023-04-09 11:43:50.194 UTC [10091] HINT: Execute pg_wal_replay_resume() to promote.
2023-04-09 11:00:31.389 UTC [10090] LOG: starting PostgreSQL 14.7 (Ubuntu 14.7-0ubuntu0.22.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0, 64-bit 2023-04-09 11:00:31.390 UTC [10090] LOG: listening on IPv6 address "::1", port 5432 2023-04-09 11:00:31.390 UTC [10090] LOG: listening on IPv4 address "127.0.0.1", port 5432 2023-04-09 11:00:31.452 UTC [10090] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432" 2023-04-09 11:00:31.533 UTC [10091] LOG: database system was interrupted; last known up at 2023-04-01 04:06:15 UTC xz: /mnt/cephfs/backup/postgresql/archivelogs/00000002.history.xz: No such file or directory 2023-04-09 11:00:31.594 UTC [10091] LOG: starting point-in-time recovery to 2023-04-08 16:00:00+00 2023-04-09 11:00:32.422 UTC [10091] LOG: restored log file "00000001000000A400000057" from archive 2023-04-09 11:00:32.602 UTC [10091] LOG: redo starts at A4/57000028 2023-04-09 11:00:32.663 UTC [10091] LOG: consistent recovery state reached at A4/577AD5A8 2023-04-09 11:00:32.664 UTC [10090] LOG: database system is ready to accept read-only connections 2023-04-09 11:00:33.814 UTC [10091] LOG: restored log file "00000001000000A400000058" from archive 2023-04-09 11:00:36.060 UTC [10091] LOG: restored log file "00000001000000A400000059" from archive 2023-04-09 11:00:37.675 UTC [10091] LOG: restored log file "00000001000000A40000005A" from archive ... [Deleted hundreds of lines] 2023-04-09 11:43:38.906 UTC [10091] LOG: restored log file "00000001000000A7000000B9" from archive 2023-04-09 11:43:40.502 UTC [10091] LOG: restored log file "00000001000000A7000000BA" from archive 2023-04-09 11:43:41.949 UTC [10091] LOG: restored log file "00000001000000A7000000BB" from archive 2023-04-09 11:43:44.134 UTC [10091] LOG: restored log file "00000001000000A7000000BC" from archive 2023-04-09 11:43:45.965 UTC [10091] LOG: restored log file "00000001000000A7000000BD" from archive 2023-04-09 11:43:47.774 UTC [10091] LOG: restored log file "00000001000000A7000000BE" from archive 2023-04-09 11:43:49.698 UTC [10091] LOG: restored log file "00000001000000A7000000BF" from archive 2023-04-09 11:43:50.194 UTC [10091] LOG: recovery stopping before commit of transaction 6522027, time 2023-04-08 16:00:00.391957+00 2023-04-09 11:43:50.194 UTC [10091] LOG: pausing at the end of recovery 2023-04-09 11:43:50.194 UTC [10091] HINT: Execute pg_wal_replay_resume() to promote.
2023-04-09 11:00:31.389 UTC [10090] LOG:  starting PostgreSQL 14.7 (Ubuntu 14.7-0ubuntu0.22.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0, 64-bit
2023-04-09 11:00:31.390 UTC [10090] LOG:  listening on IPv6 address "::1", port 5432
2023-04-09 11:00:31.390 UTC [10090] LOG:  listening on IPv4 address "127.0.0.1", port 5432
2023-04-09 11:00:31.452 UTC [10090] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2023-04-09 11:00:31.533 UTC [10091] LOG:  database system was interrupted; last known up at 2023-04-01 04:06:15 UTC
xz: /mnt/cephfs/backup/postgresql/archivelogs/00000002.history.xz: No such file or directory
2023-04-09 11:00:31.594 UTC [10091] LOG:  starting point-in-time recovery to 2023-04-08 16:00:00+00
2023-04-09 11:00:32.422 UTC [10091] LOG:  restored log file "00000001000000A400000057" from archive
2023-04-09 11:00:32.602 UTC [10091] LOG:  redo starts at A4/57000028
2023-04-09 11:00:32.663 UTC [10091] LOG:  consistent recovery state reached at A4/577AD5A8
2023-04-09 11:00:32.664 UTC [10090] LOG:  database system is ready to accept read-only connections
2023-04-09 11:00:33.814 UTC [10091] LOG:  restored log file "00000001000000A400000058" from archive
2023-04-09 11:00:36.060 UTC [10091] LOG:  restored log file "00000001000000A400000059" from archive
2023-04-09 11:00:37.675 UTC [10091] LOG:  restored log file "00000001000000A40000005A" from archive
... [Deleted hundreds of lines]
2023-04-09 11:43:38.906 UTC [10091] LOG:  restored log file "00000001000000A7000000B9" from archive
2023-04-09 11:43:40.502 UTC [10091] LOG:  restored log file "00000001000000A7000000BA" from archive
2023-04-09 11:43:41.949 UTC [10091] LOG:  restored log file "00000001000000A7000000BB" from archive
2023-04-09 11:43:44.134 UTC [10091] LOG:  restored log file "00000001000000A7000000BC" from archive
2023-04-09 11:43:45.965 UTC [10091] LOG:  restored log file "00000001000000A7000000BD" from archive
2023-04-09 11:43:47.774 UTC [10091] LOG:  restored log file "00000001000000A7000000BE" from archive
2023-04-09 11:43:49.698 UTC [10091] LOG:  restored log file "00000001000000A7000000BF" from archive
2023-04-09 11:43:50.194 UTC [10091] LOG:  recovery stopping before commit of transaction 6522027, time 2023-04-08 16:00:00.391957+00
2023-04-09 11:43:50.194 UTC [10091] LOG:  pausing at the end of recovery
2023-04-09 11:43:50.194 UTC [10091] HINT:  Execute pg_wal_replay_resume() to promote.

Open the database

Plain text
Copy to clipboard
Open code in new window
EnlighterJS 3 Syntax Highlighter
postgres@minetest:~/14/main$ psql
psql (14.7 (Ubuntu 14.7-0ubuntu0.22.04.1))
Type "help" for help.
postgres=# SELECT pg_wal_replay_resume();
pg_wal_replay_resume
----------------------
(1 row)
postgres=# SELECT pg_current_wal_lsn();
pg_current_wal_lsn
--------------------
A7/BFE63C48
(1 row)
postgres=# \q
postgres@minetest:~/14/main$
postgres@minetest:~/14/main$ ls -l recovery.signal
ls: cannot access 'recovery.signal': No such file or directory
postgres@minetest:~/14/main$ psql psql (14.7 (Ubuntu 14.7-0ubuntu0.22.04.1)) Type "help" for help. postgres=# SELECT pg_wal_replay_resume(); pg_wal_replay_resume ---------------------- (1 row) postgres=# SELECT pg_current_wal_lsn(); pg_current_wal_lsn -------------------- A7/BFE63C48 (1 row) postgres=# \q postgres@minetest:~/14/main$ postgres@minetest:~/14/main$ ls -l recovery.signal ls: cannot access 'recovery.signal': No such file or directory
postgres@minetest:~/14/main$ psql
psql (14.7 (Ubuntu 14.7-0ubuntu0.22.04.1))
Type "help" for help.
postgres=# SELECT pg_wal_replay_resume();
 pg_wal_replay_resume 
----------------------
 
(1 row)

postgres=# SELECT pg_current_wal_lsn();
 pg_current_wal_lsn 
--------------------
 A7/BFE63C48
(1 row)


postgres=# \q
postgres@minetest:~/14/main$ 
postgres@minetest:~/14/main$ ls -l recovery.signal
ls: cannot access 'recovery.signal': No such file or directory

We can see the the process completely finished, and after resuming the pg_wal_replay, the recovery.signal file has been removed.

Conclusion

  • PostgreSQL PITR is very simple, and can be triggered easily anytime you need it
  • pg_basebackup is your friend
  • Always test recovery after configuring PITR. This will ensure you have the correct restore_command, and ensure you can quickly proceed to a PITR
  • It’s possible to use a dedicated recovery.conf and start the DB

Like it ?

Get notified on new posts (max 1 / month)
Soyez informés lors des prochains articles

2 responses to “Point-in-time recovery (PITR) of PostgreSQL database”

  1. I followed these steps, but still getting below error

    2023-12-19 11:03:53.488 UTC [13179] LOG: invalid checkpoint record
    2023-12-19 11:03:53.488 UTC [13179] FATAL: could not locate required checkpoint record
    2023-12-19 11:03:53.488 UTC [13179] HINT: If you are restoring from a backup, touch “/data/databases/pgsql14/main/recovery.signal” and add required recovery options.
    If you are not restoring from a backup, try removing the file “/data/databases/pgsql14/main/backup_label”.
    Be careful: removing “/data/databases/pgsql14/main/backup_label” will result in a corrupt cluster if restoring from a backup

    • Hi Jeevan,
      Interesting case. Can you:
      – ls -lh the recovery.signal file ?
      – Did you try to restore to an earlier date (Check a couple of previous archived logs to pick a date) ? You might have to pg_resetwal, since the current WALs are of no use if you restore to an earlier date. Possibly choose a date before the last full backup (if you have the previous & WALs archives.)
      – Anyway, make sure you have a file backup of the existing situation so you can always undo what you’re doing (tar of the data folder)
      – You mentioned a corrupt cluster. Is the DB part of a cluster with replication ? How did you get there ? Unsafe power off ?

      Keep us posted,
      Keep me posted.

Leave a Reply

Your email address will not be published. Required fields are marked *

Anti-Robot Verification
FriendlyCaptcha ⇗