Master mirroring is an important component of the Pivotal Greenplum database. The following questions are covered in this article:
- What is master mirroring
- How GPDB 4.2 Master Mirroring Works
- How GPDB 4.3 Master Mirroring Works
1. What is master mirroring
Master Mirroring in GPDB is used to replicate the "catalog" metadata from one server to a standby server so that a second copy of the catalog is available and the system can be resumed upon failure of the GPDB master's physical server.
2. How GPDB 4.2 Master Mirroring Works
The baseline best practice is for the customer to detect failure of the master server and then perform a hard system reboot of the master, using a system-specific command such as ipmitool, to ensure that no remaining processes are running on the original master.
After ensuring the original master is rebooted, the customer runs gpactivatestandby in order to bring up the standby master and have it start operating as the primary master server.
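The failover steps above might look like the following sketch, assuming hypothetical hostnames (`mdw-ipmi` for the failed master's management interface) and a placeholder data directory; the exact ipmitool arguments depend on the site's BMC configuration:

```shell
# 1. Hard-reboot the failed master via its management interface so no
#    stray postgres processes survive (host/credentials are placeholders).
ipmitool -I lanplus -H mdw-ipmi -U admin -P secret chassis power cycle

# 2. On the standby host, promote the standby to become the active master.
#    -d points at the standby's master data directory (path is a placeholder).
gpactivatestandby -d /data/master/gpseg-1
```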
The mirroring mechanism for the standby master is active/standby log shipping: log records are sent to the standby master, and the standby sends a reply confirming that each log record has been received.
There are two primary worker processes on the standby master, both of which are started by the postgres postmaster. The gpsynchagent communicates with WAL Send Server to receive continuous transaction logs. The WAL Redo Server performs redo recovery using transaction logs.
The monitoring queries shown in the last section should be executed periodically on the master; if the standby master does not show a Synchronized state, the standby master should be reinitialized.
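In GPDB 4.2 one such monitoring query reads the `gp_master_mirroring` catalog view; this is a hedged sketch, and the column names should be verified against your installation's documentation:

```shell
# Check the standby synchronization state on the master (GPDB 4.2).
# summary_state should report 'Synchronized' when mirroring is healthy.
psql -d postgres -c "SELECT summary_state, detail_state, log_time, error_message FROM gp_master_mirroring;"
```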
3. How GPDB 4.3 Master Mirroring Works
Master mirroring in the 4.3 version of GPDB has been redesigned from the 4.2 version of GPDB's master mirroring technology. The new method described in this section is a WAL-based Streaming Replication (WSR) technology, where WAL stands for write-ahead logging. The WSR technology is heavily based on the PostgreSQL open source streaming replication technology described here: http://www.postgresql.org/docs/9.2/static/warmstandby.html .
One of the primary motivating factors in creating the new master mirroring technology was to provide a way to initialize the standby master and activate the standby master without restarting the database.
The primary master continues to provide service to users while a transactional snapshot of the primary master instance is taken. While the snapshot is being taken and deployed on the standby master, changes to the primary master are also recorded. After the snapshot is deployed on the standby master, those updates are applied to synchronize the standby master with the primary master. Once the primary master and standby master are synchronized, the standby master is kept up to date by the walsender and walreceiver replication processes: the walreceiver runs on the standby master, the walsender runs on the primary master, and the two processes use WAL-based streaming replication to keep the masters synchronized. Upon activation of the standby master, the replicated logs are used to reconstruct the state of the primary master as of the last successfully committed transaction.
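Because the 4.3 design reuses PostgreSQL streaming replication, the replication processes and their progress can be inspected directly. A sketch, assuming the `pg_stat_replication` view and process titles from the underlying PostgreSQL code are available in your build:

```shell
# On the primary master: confirm a walsender is serving the standby and
# compare how far the standby has received/flushed the transaction log.
ps -ef | grep '[w]al sender'
psql -d postgres -c "SELECT state, sent_location, flush_location FROM pg_stat_replication;"

# On the standby master: the walreceiver process should be running.
ps -ef | grep '[w]al receiver'
```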
Key components of the master mirroring architecture include:
- WAL: Transaction log used as the basis of this technology;
- pg_basebackup: utility used internally to request a base image of the master;
- WAL Sender Process: master process that either streams an online database snapshot or transports transaction log from the primary to the standby on demand;
- WAL Receiver Process: standby master process that consumes the stream of transaction log and writes it to the correct location on the standby on demand;
- Startup Process: process that consumes transaction log records and replays them to apply changes to the data on disk.
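Because the base image is taken online, a standby master can be initialized without restarting the primary. A minimal sketch, assuming a hypothetical standby hostname `smdw` (gpinitstandby drives pg_basebackup internally, per the component list above):

```shell
# Initialize a standby master while the primary keeps serving queries.
# -s names the standby host (placeholder); the base image and the WAL
# generated during the copy are shipped to the standby automatically.
gpinitstandby -s smdw
```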
The following is a more in-depth flow of how the process works. A base image of the primary-side database instance is required as the first step to commence WSR. This image forms the foundation for the mirror, on which changes can later be applied using WAL. The primary continues to provide service to users while a transactional snapshot of the database instance is taken. This does not by itself guarantee a clean and consistent base image of the database system instance, but the WAL collected during this online operation eventually makes the base image consistent and in sync with the primary side. The base image consists of both the actual data (in the form of directories and files) and the WAL (xlog segment files) generated before or during the basebackup process.
Once the Master receives the request, the postmaster spawns a new child to send the initial image of the master. This new child is the WAL sender process. The WAL sender process can take multiple roles based on the request that comes in. During the basebackup phase, it takes an online database snapshot and then streams the database image to the standby master. When the standby master receives the base backup, it performs all the operations a database system instance would perform up to the first pass of xlog in the 'startup' (xlog redo) process. The startup process finds that the system has been requested to run in standby mode. With this trigger and the primary-side connection info, the startup process transitions into 'Standby' mode.
In standby mode, the startup process begins applying the xlog from the base image. Once that xlog is exhausted, the standby postmaster spawns a new child process, the WAL receiver. The WAL receiver tries to connect to the Master (primary). Upon receiving this request, the Master spawns the WAL sender process again, but note that the WAL sender takes a new role this time: sending only the newly generated xlog from the Master side to the standby side. In a nutshell, the standby tries to catch up with the Master by requesting the changes (in the form of xlog) that it has not yet received.
The WAL sender starts reading xlog from the location indicated by the WAL receiver and then streams newly generated xlog to the WAL receiver indefinitely, until it is told to stop. As the WAL receiver receives chunks of xlog from the sender, it writes them to disk and periodically flushes them to the correct location on the standby. The metadata in the sent chunks lets the receiver know their exact content. The WAL receiver signals the standby startup process as new xlog is flushed to disk. This makes the standby aware of the new changes; it reads the newly arrived xlog as indicated by the receiver and applies it on top of the base image. Again, if the xlog to be read is exhausted, the startup process goes dormant until the next WAL receiver trigger arrives.
This WAL replication scheme is completely synchronous, which means that once the standby side has completely caught up with the Master (the last good xlog location on both sides is identical), the user query backends on the Master side will wait until every xlog record up to their commit record's end location is flushed on the standby side. This provides strong consistency and helps to ensure consistency in case of failovers. This transport of xlog continues indefinitely until a GPDB cluster shutdown is requested or a failover happens.
In the scenario of a Master failure, an external utility is used to trigger the standby promotion. On promote, the standby system goes from startup mode to production mode. During this transition it applies all changes up to the last record in the xlog, if they have not already been applied. The WAL receiver dies and is not spawned again. The system performs all startup-phase passes and transitions to production, at which point it can start serving user queries again.
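In GPDB 4.3 the promotion is again driven by gpactivatestandby run on the standby host; a sketch assuming a placeholder data directory:

```shell
# On the standby host: promote the standby to production (GPDB 4.3).
# The startup process replays any remaining xlog, the walreceiver exits,
# and the instance begins accepting user queries.
gpactivatestandby -d /data/master/gpseg-1

# Afterwards, verify the new master's status (placeholder path again).
gpstate -d /data/master/gpseg-1
```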