Reading Time: 3 minutes
This quick guide won’t replace the documentation. It provides an example of PostgreSQL backup, PITR configuration & restore.
Backup configuration
This is not intended to supersede any backup configuration manual, but before recovery, let’s check how the backup is running.
The /mnt/cephfs is a shared filesystem on which I place the backups and the archivelogs. Archivelogs are saved after each WAL rotation. Let’s review the config.
Full Backup
A full backup is triggered every 2 weeks via the postgres
crontab. ( crontab -e
as the postgres
user)
# m h dom mon dow command
0 4 1,15 * * /var/lib/postgresql/bin/backup_database.sh && find /mnt/cephfs/backup/postgresql/archivelogs -mtime +30 -delete
# m h dom mon dow command
0 4 1,15 * * /var/lib/postgresql/bin/backup_database.sh && find /mnt/cephfs/backup/postgresql/archivelogs -mtime +30 -delete
# m h dom mon dow command
0 4 1,15 * * /var/lib/postgresql/bin/backup_database.sh && find /mnt/cephfs/backup/postgresql/archivelogs -mtime +30 -delete
$ cat /var/lib/postgresql/bin/backup_database.sh
DIR=/mnt/cephfs/backup/postgresql/$( date +%F_%Hh%M )
pg_basebackup -z -Z8 -Ft -D "$DIR"
$ cat /var/lib/postgresql/bin/backup_database.sh
#!/bin/bash
DIR=/mnt/cephfs/backup/postgresql/$( date +%F_%Hh%M )
mkdir "$DIR"
pg_basebackup -z -Z8 -Ft -D "$DIR"
$ cat /var/lib/postgresql/bin/backup_database.sh
#!/bin/bash
DIR=/mnt/cephfs/backup/postgresql/$( date +%F_%Hh%M )
mkdir "$DIR"
pg_basebackup -z -Z8 -Ft -D "$DIR"
WAL backup
postgres@minetest:~$ grep ^archive /etc/postgresql/14/main/postgresql.conf
archive_mode = on # enables archiving; off, on, or always
archive_command = 'nice xz -1 -zc %p > /mnt/cephfs/backup/postgresql/archivelogs/%f.xz && nice xz -t /mnt/cephfs/backup/postgresql/archivelogs/%f.xz'
postgres@minetest:~$ grep ^archive /etc/postgresql/14/main/postgresql.conf
archive_mode = on # enables archiving; off, on, or always
archive_command = 'nice xz -1 -zc %p > /mnt/cephfs/backup/postgresql/archivelogs/%f.xz && nice xz -t /mnt/cephfs/backup/postgresql/archivelogs/%f.xz'
postgres@minetest:~$ grep ^archive /etc/postgresql/14/main/postgresql.conf
archive_mode = on # enables archiving; off, on, or always
archive_command = 'nice xz -1 -zc %p > /mnt/cephfs/backup/postgresql/archivelogs/%f.xz && nice xz -t /mnt/cephfs/backup/postgresql/archivelogs/%f.xz'
Restore
Check that you have the restore_command
aligned with your archive_command
(location, compression, etc…), and set the recovery_target_time
. The stop point must be after the ending time of the base backup.
You can see here a simple one line command that
- checks for the compressed
WAL / archivelog
integrity (-t
)
- decompresses the archive and ‘cat’ the output to stdout (`xz -dc`)
- sends the decompressed output to the
WAL
file path to restore to (> %p
)
postgres@minetest:~$ grep -E '^(restore|recovery)' /etc/postgresql/14/main/postgresql.conf
restore_command = 'xz -t /mnt/cephfs/backup/postgresql/archivelogs/%f.xz && xz -dc /mnt/cephfs/backup/postgresql/archivelogs/%f.xz > %p'
recovery_target_time = '2023-04-08 16:00:00'
postgres@minetest:~$ grep -E '^(restore|recovery)' /etc/postgresql/14/main/postgresql.conf
restore_command = 'xz -t /mnt/cephfs/backup/postgresql/archivelogs/%f.xz && xz -dc /mnt/cephfs/backup/postgresql/archivelogs/%f.xz > %p'
recovery_target_time = '2023-04-08 16:00:00'
postgres@minetest:~$ grep -E '^(restore|recovery)' /etc/postgresql/14/main/postgresql.conf
restore_command = 'xz -t /mnt/cephfs/backup/postgresql/archivelogs/%f.xz && xz -dc /mnt/cephfs/backup/postgresql/archivelogs/%f.xz > %p'
recovery_target_time = '2023-04-08 16:00:00'
After checking the postgresql.conf
Stop the DB
For Debian/Ubuntu:
as root
systemctl stop postgresql
systemctl stop postgresql
systemctl stop postgresql
Delete the DB data & restore full backup
Make a copy if you have enough space.
mv /var/lib/postgresql/14/main /var/lib/postgresql/14/main_original_backup
mkdir /var/lib/postgresql/14/main
tar -zxvf /mnt/cephfs/backup/postgresql/2023-04-01_04h00/base.tar.gz -C /var/lib/postgresql/14/main/
chmod 700 /var/lib/postgresql/14/main
touch /var/lib/postgresql/14/main/recovery.signal
mv /var/lib/postgresql/14/main /var/lib/postgresql/14/main_original_backup
mkdir /var/lib/postgresql/14/main
tar -zxvf /mnt/cephfs/backup/postgresql/2023-04-01_04h00/base.tar.gz -C /var/lib/postgresql/14/main/
chmod 700 /var/lib/postgresql/14/main
touch /var/lib/postgresql/14/main/recovery.signal
mv /var/lib/postgresql/14/main /var/lib/postgresql/14/main_original_backup
mkdir /var/lib/postgresql/14/main
tar -zxvf /mnt/cephfs/backup/postgresql/2023-04-01_04h00/base.tar.gz -C /var/lib/postgresql/14/main/
chmod 700 /var/lib/postgresql/14/main
touch /var/lib/postgresql/14/main/recovery.signal
The presence of the recovery.signal
file will trigger the recovery process on the next start.
Start the DB and monitor the Point-in-time recovery
- (as root)
systemctl start postgresql
- Watch the process progress with the log :
tail -f /var/log/postgresql/postgresql-14-main.log
2023-04-09 11:00:31.389 UTC [10090] LOG: starting PostgreSQL 14.7 (Ubuntu 14.7-0ubuntu0.22.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0, 64-bit
2023-04-09 11:00:31.390 UTC [10090] LOG: listening on IPv6 address "::1", port 5432
2023-04-09 11:00:31.390 UTC [10090] LOG: listening on IPv4 address "127.0.0.1", port 5432
2023-04-09 11:00:31.452 UTC [10090] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2023-04-09 11:00:31.533 UTC [10091] LOG: database system was interrupted; last known up at 2023-04-01 04:06:15 UTC
xz: /mnt/cephfs/backup/postgresql/archivelogs/00000002.history.xz: No such file or directory
2023-04-09 11:00:31.594 UTC [10091] LOG: starting point-in-time recovery to 2023-04-08 16:00:00+00
2023-04-09 11:00:32.422 UTC [10091] LOG: restored log file "00000001000000A400000057" from archive
2023-04-09 11:00:32.602 UTC [10091] LOG: redo starts at A4/57000028
2023-04-09 11:00:32.663 UTC [10091] LOG: consistent recovery state reached at A4/577AD5A8
2023-04-09 11:00:32.664 UTC [10090] LOG: database system is ready to accept read-only connections
2023-04-09 11:00:33.814 UTC [10091] LOG: restored log file "00000001000000A400000058" from archive
2023-04-09 11:00:36.060 UTC [10091] LOG: restored log file "00000001000000A400000059" from archive
2023-04-09 11:00:37.675 UTC [10091] LOG: restored log file "00000001000000A40000005A" from archive
... [Deleted hundreds of lines]
2023-04-09 11:43:38.906 UTC [10091] LOG: restored log file "00000001000000A7000000B9" from archive
2023-04-09 11:43:40.502 UTC [10091] LOG: restored log file "00000001000000A7000000BA" from archive
2023-04-09 11:43:41.949 UTC [10091] LOG: restored log file "00000001000000A7000000BB" from archive
2023-04-09 11:43:44.134 UTC [10091] LOG: restored log file "00000001000000A7000000BC" from archive
2023-04-09 11:43:45.965 UTC [10091] LOG: restored log file "00000001000000A7000000BD" from archive
2023-04-09 11:43:47.774 UTC [10091] LOG: restored log file "00000001000000A7000000BE" from archive
2023-04-09 11:43:49.698 UTC [10091] LOG: restored log file "00000001000000A7000000BF" from archive
2023-04-09 11:43:50.194 UTC [10091] LOG: recovery stopping before commit of transaction 6522027, time 2023-04-08 16:00:00.391957+00
2023-04-09 11:43:50.194 UTC [10091] LOG: pausing at the end of recovery
2023-04-09 11:43:50.194 UTC [10091] HINT: Execute pg_wal_replay_resume() to promote.
2023-04-09 11:00:31.389 UTC [10090] LOG: starting PostgreSQL 14.7 (Ubuntu 14.7-0ubuntu0.22.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0, 64-bit
2023-04-09 11:00:31.390 UTC [10090] LOG: listening on IPv6 address "::1", port 5432
2023-04-09 11:00:31.390 UTC [10090] LOG: listening on IPv4 address "127.0.0.1", port 5432
2023-04-09 11:00:31.452 UTC [10090] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2023-04-09 11:00:31.533 UTC [10091] LOG: database system was interrupted; last known up at 2023-04-01 04:06:15 UTC
xz: /mnt/cephfs/backup/postgresql/archivelogs/00000002.history.xz: No such file or directory
2023-04-09 11:00:31.594 UTC [10091] LOG: starting point-in-time recovery to 2023-04-08 16:00:00+00
2023-04-09 11:00:32.422 UTC [10091] LOG: restored log file "00000001000000A400000057" from archive
2023-04-09 11:00:32.602 UTC [10091] LOG: redo starts at A4/57000028
2023-04-09 11:00:32.663 UTC [10091] LOG: consistent recovery state reached at A4/577AD5A8
2023-04-09 11:00:32.664 UTC [10090] LOG: database system is ready to accept read-only connections
2023-04-09 11:00:33.814 UTC [10091] LOG: restored log file "00000001000000A400000058" from archive
2023-04-09 11:00:36.060 UTC [10091] LOG: restored log file "00000001000000A400000059" from archive
2023-04-09 11:00:37.675 UTC [10091] LOG: restored log file "00000001000000A40000005A" from archive
... [Deleted hundreds of lines]
2023-04-09 11:43:38.906 UTC [10091] LOG: restored log file "00000001000000A7000000B9" from archive
2023-04-09 11:43:40.502 UTC [10091] LOG: restored log file "00000001000000A7000000BA" from archive
2023-04-09 11:43:41.949 UTC [10091] LOG: restored log file "00000001000000A7000000BB" from archive
2023-04-09 11:43:44.134 UTC [10091] LOG: restored log file "00000001000000A7000000BC" from archive
2023-04-09 11:43:45.965 UTC [10091] LOG: restored log file "00000001000000A7000000BD" from archive
2023-04-09 11:43:47.774 UTC [10091] LOG: restored log file "00000001000000A7000000BE" from archive
2023-04-09 11:43:49.698 UTC [10091] LOG: restored log file "00000001000000A7000000BF" from archive
2023-04-09 11:43:50.194 UTC [10091] LOG: recovery stopping before commit of transaction 6522027, time 2023-04-08 16:00:00.391957+00
2023-04-09 11:43:50.194 UTC [10091] LOG: pausing at the end of recovery
2023-04-09 11:43:50.194 UTC [10091] HINT: Execute pg_wal_replay_resume() to promote.
2023-04-09 11:00:31.389 UTC [10090] LOG: starting PostgreSQL 14.7 (Ubuntu 14.7-0ubuntu0.22.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0, 64-bit
2023-04-09 11:00:31.390 UTC [10090] LOG: listening on IPv6 address "::1", port 5432
2023-04-09 11:00:31.390 UTC [10090] LOG: listening on IPv4 address "127.0.0.1", port 5432
2023-04-09 11:00:31.452 UTC [10090] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2023-04-09 11:00:31.533 UTC [10091] LOG: database system was interrupted; last known up at 2023-04-01 04:06:15 UTC
xz: /mnt/cephfs/backup/postgresql/archivelogs/00000002.history.xz: No such file or directory
2023-04-09 11:00:31.594 UTC [10091] LOG: starting point-in-time recovery to 2023-04-08 16:00:00+00
2023-04-09 11:00:32.422 UTC [10091] LOG: restored log file "00000001000000A400000057" from archive
2023-04-09 11:00:32.602 UTC [10091] LOG: redo starts at A4/57000028
2023-04-09 11:00:32.663 UTC [10091] LOG: consistent recovery state reached at A4/577AD5A8
2023-04-09 11:00:32.664 UTC [10090] LOG: database system is ready to accept read-only connections
2023-04-09 11:00:33.814 UTC [10091] LOG: restored log file "00000001000000A400000058" from archive
2023-04-09 11:00:36.060 UTC [10091] LOG: restored log file "00000001000000A400000059" from archive
2023-04-09 11:00:37.675 UTC [10091] LOG: restored log file "00000001000000A40000005A" from archive
... [Deleted hundreds of lines]
2023-04-09 11:43:38.906 UTC [10091] LOG: restored log file "00000001000000A7000000B9" from archive
2023-04-09 11:43:40.502 UTC [10091] LOG: restored log file "00000001000000A7000000BA" from archive
2023-04-09 11:43:41.949 UTC [10091] LOG: restored log file "00000001000000A7000000BB" from archive
2023-04-09 11:43:44.134 UTC [10091] LOG: restored log file "00000001000000A7000000BC" from archive
2023-04-09 11:43:45.965 UTC [10091] LOG: restored log file "00000001000000A7000000BD" from archive
2023-04-09 11:43:47.774 UTC [10091] LOG: restored log file "00000001000000A7000000BE" from archive
2023-04-09 11:43:49.698 UTC [10091] LOG: restored log file "00000001000000A7000000BF" from archive
2023-04-09 11:43:50.194 UTC [10091] LOG: recovery stopping before commit of transaction 6522027, time 2023-04-08 16:00:00.391957+00
2023-04-09 11:43:50.194 UTC [10091] LOG: pausing at the end of recovery
2023-04-09 11:43:50.194 UTC [10091] HINT: Execute pg_wal_replay_resume() to promote.
Open the database
postgres@minetest:~/14/main$ psql
psql (14.7 (Ubuntu 14.7-0ubuntu0.22.04.1))
postgres=# SELECT pg_wal_replay_resume();
postgres=# SELECT pg_current_wal_lsn();
postgres@minetest:~/14/main$
postgres@minetest:~/14/main$ ls -l recovery.signal
ls: cannot access 'recovery.signal': No such file or directory
postgres@minetest:~/14/main$ psql
psql (14.7 (Ubuntu 14.7-0ubuntu0.22.04.1))
Type "help" for help.
postgres=# SELECT pg_wal_replay_resume();
pg_wal_replay_resume
----------------------
(1 row)
postgres=# SELECT pg_current_wal_lsn();
pg_current_wal_lsn
--------------------
A7/BFE63C48
(1 row)
postgres=# \q
postgres@minetest:~/14/main$
postgres@minetest:~/14/main$ ls -l recovery.signal
ls: cannot access 'recovery.signal': No such file or directory
postgres@minetest:~/14/main$ psql
psql (14.7 (Ubuntu 14.7-0ubuntu0.22.04.1))
Type "help" for help.
postgres=# SELECT pg_wal_replay_resume();
pg_wal_replay_resume
----------------------
(1 row)
postgres=# SELECT pg_current_wal_lsn();
pg_current_wal_lsn
--------------------
A7/BFE63C48
(1 row)
postgres=# \q
postgres@minetest:~/14/main$
postgres@minetest:~/14/main$ ls -l recovery.signal
ls: cannot access 'recovery.signal': No such file or directory
We can see the the process completely finished, and after resuming the pg_wal_replay
, the recovery.signal
file has been removed.
Conclusion
- PostgreSQL PITR is very simple, and can be triggered easily anytime you need it
pg_basebackup
is your friend
- Always test recovery after configuring PITR. This will ensure you have the correct
restore_command
, and ensure you can quickly proceed to a PITR
- It’s possible to use a dedicated recovery.conf and start the DB
2 responses to “Point-in-time recovery (PITR) of PostgreSQL database”
I followed these steps, but still getting below error
2023-12-19 11:03:53.488 UTC [13179] LOG: invalid checkpoint record
2023-12-19 11:03:53.488 UTC [13179] FATAL: could not locate required checkpoint record
2023-12-19 11:03:53.488 UTC [13179] HINT: If you are restoring from a backup, touch “/data/databases/pgsql14/main/recovery.signal” and add required recovery options.
If you are not restoring from a backup, try removing the file “/data/databases/pgsql14/main/backup_label”.
Be careful: removing “/data/databases/pgsql14/main/backup_label” will result in a corrupt cluster if restoring from a backup
Hi Jeevan,
Interesting case. Can you:
– ls -lh the recovery.signal file ?
– Did you try to restore to an earlier date (Check a couple of previous archived logs to pick a date) ? You might have to pg_resetwal, since the current WALs are of no use if you restore to an earlier date. Possibly choose a date before the last full backup (if you have the previous & WALs archives.)
– Anyway, make sure you have a file backup of the existing situation so you can always undo what you’re doing (tar of the data folder)
– You mentioned a corrupt cluster. Is the DB part of a cluster with replication ? How did you get there ? Unsafe power off ?
Keep us posted,
Keep me posted.