Pivotal Knowledge Base

Follow

Incremental Backups in GreenPlum Database

 Environment

 Product  Version
 Pivotal Greenplum  4.3.x
 OS  RHEL 6.x

Introduction 

Greenplum (GPDB) backups can be of two types, Full or Incremental. This article is going to focus on only incremental backups, how it works, and it's limitations.

Why Incremental backups?

Incremental Backups only backup changed blocks to AO/AOCO tables. For large Enterprise Data Warehouses (EDW) this would be very beneficial as we can perform a time-consuming full backup just once a week and quick smaller incremental backups the rest of the week (6 days).

Synopsis

An incremental backup only backs up an append-optimized or column-oriented table if one of the following operations was performed on the table after the last full or incremental backup:

  • ALTER TABLE
  • DELETE
  • INSERT
  • TRUNCATE
  • UPDATE
  • DROP and then re-create the table

For partitioned append-optimized tables, only the changed partitions are backed up while incremental backups do not work with heap tables. In other words, if there is a heap table and only 10% of the rows have changed, the full heap table gets backed up.

For complete information on incremental backups, refer to this documentation.

Description

Below is a detailed description of how incremental backup works.

Suppose that we take full backups every Sunday followed by incremental backups for the remaining six days. If the database crash happens on a Wednesday, we would assume that we need to restore the full backups and the incremental backups for Monday and Tuesday. This would end up in duplicate rows in Greenplum. GPDB has the intelligence to know which incremental backups are linked to which full backups. So in our case, we only restore the incremental backup on Tuesday and it, in turn, would do the job and restore the database up to Tuesday's backups.

The current Greenplum incremental backups do not backup any tables that had no DML (INSERT/DELETE/UPDATE) operations on it since the last full backup. However, even if only 1 row of a very big AO table is INSERTED/DELETED/UPDATED, it would still backup the full table which can be very time-consuming.

Example:

  • Database test_incremental created for testing, 2 AO tables "Sales" & "Invoice" created:
gpadmin=# create database test_incremental;
CREATE DATABASE

gpadmin@gpdb-sandbox ~]$ psql test_incremental;
psql (8.2.15)
Type "help" for help.

test_incremental=# create table sales (txn_id int) WITH (appendonly=true, compresslevel=5) DISTRIBUTED BY (txn_id);
CREATE TABLE

test_incremental=# insert into sales select (generate_series(1,10));
INSERT 0 10

test_incremental=# create table invoice (inv_id int) WITH (appendonly=true, compresslevel=5) DISTRIBUTED BY (txn_id);
CREATE TABLE

test_incremental=# insert into invoice select (generate_series(100,110));
INSERT 0 11
  • Full backup of test_incremental started:
[gpadmin@gpdb-sandbox ~]$ gpcrondump -x test_incremental_backup --prefix user_backup
20161101:06:16:54:067811 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Starting gpcrondump with args: -x test_incremental --prefix user_backup
20161229:17:14:02:198270 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:---------------------------------------------------- 20161229:17:14:02:198270 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Target database                          = test_incremental 20161229:17:14:02:198270 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Dump subdirectory                        = 20161229 20161229:17:14:02:198270 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Dump type                                = Full database 20161229:17:14:02:198270 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Clear old dump directories               = Off 20161229:17:14:02:198270 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Dump start time                          = 17:13:57 20161229:17:14:02:198270 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Dump end time                            = 17:14:02 20161229:17:14:02:198270 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Status                                   = COMPLETED 20161229:17:14:02:198270 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Dump key                                 = 20161229171357 20161229:17:14:02:198270 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Dump file compression                    = On 20161229:17:14:02:198270 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Vacuum mode type                         = Off 20161229:17:14:02:198270 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Exit code zero, no warnings generated 20161229:17:14:02:198270 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:----------------------------------------------------  
  • Ten more rows inserted into only the sales AO table:
test_incremental-# insert into sales select (generate_series(11,20));
INSERT 0 10
  • Incremental Backup of test_incremental started:
[gpadmin@gpdb-sandbox ~]$ gpcrondump -x test_incremental --incremental --prefix user_backup
20161101:06:18:42:068725 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Starting gpcrondump with args: -x test_incremental --incremental --prefix user_backup 20161229:17:18:21:199103 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:---------------------------------------------------- 20161229:17:18:21:199103 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Target database                          = test_incremental 20161229:17:18:21:199103 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Dump subdirectory                        = 20161229 20161229:17:18:21:199103 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Dump type                                = Incremental 20161229:17:18:21:199103 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Clear old dump directories               = Off 20161229:17:18:21:199103 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Dump start time                          = 17:18:11 20161229:17:18:21:199103 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Dump end time                            = 17:18:21 20161229:17:18:21:199103 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Status                                   = COMPLETED 20161229:17:18:21:199103 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Dump key                                 = 20161229171811 20161229:17:18:21:199103 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Dump file compression                    = On 20161229:17:18:21:199103 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Vacuum mode type                         = Off 20161229:17:18:21:199103 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:-Exit code zero, no warnings generated 20161229:17:18:21:199103 gpcrondump:gpdb-sandbox:gpadmin-[INFO]:---------------------------------------------------- 
  • New database test_incremental_restore to restore created:
gpadmin=# create database test_incremental_restore;
CREATE DATABASE
test_incremental=# create database test_incremental_restore;
CREATE DATABASE
  • Full backup first restored (Timestamp key 20161229171357):
[gpadmin@gpdb-sandbox ~]$ gpdbrestore -t 20161229171357 --prefix user_backup 
gpdbrestore:gpdb-sandbox:gpadmin-[INFO]:-Starting gpdbrestore with args: -t 20161229171357 --prefix user_backup
[gpadmin@gpdb-sandbox ~]$ psql test_incremental_restore
psql (8.2.15)
Type "help" for help. test_incremental_restore=# select count(*) from sales;
count
-------
10
(1 row) test_incremental_restore=# select count(*) from invoice;
count
-------
11
(1 row)
  • Incremental backup then restored (Timestamp key 20161229171811):

[gpadmin@gpdb-sandbox ~]$ gpdbrestore -t 20161229171811 --prefix user_backup
20161101:06:24:47:070488 gpdbrestore:gpdb-sandbox:gpadmin-[INFO]:-Starting gpdbrestore with args: -t 20161229171811 --prefix user_backup [gpadmin@gpdb-sandbox ~]$ psql incremental_backup
psql (8.2.15)
Type "help" for help. incremental_backup=# select count(*) from sales;
count
-------
30
(1 row) incremental_backup=# select count(*) from invoice;
count
-------
22
(1 row)
  • As we can see, the incremental backup when restored also restores rows of the full backup. In our example, since we restored both full and incremental backups, we ended up with duplicate rows.

The correct way is to only restore the last incremental backup set. In our case, we should use the timestamp 20161229171811. The thing we see is that even though the sales table had only ten additional rows, the incremental backup went and backed up the entire table (20 rows). If the sales table could only backup the ten new rows inserted, it would greatly reduce the time of incremental backups.

Additional Information

Here are some restrictions of incremental backups:

  • It does not work for Heap Tables, so if the application uses primarily heap tables, then incremental backups would not be a big gain since incremental backups take a full backup of heap tables.
  • Incremental backups are not supported with Data Domain Boost.
  • Dump data as INSERT instead of COPY is not supported. Click here to read more about INSERT & COPY.
  • gpcrondump does not check for available disk space before performing an incremental backup.
  • Cannot include OID's when using incremental backups. This would be very useful when we have to manually use the COPY commands to populate a database.
  • Table level and schema level backups are not supported.

For a complete list of options and restrictions, see gpcrondump command Documentation.

 

Comments

Powered by Zendesk