FTS is a very important component of Pivotal Greenplum (GPDB).
The following topics are covered in this article:
- What is FTS
- How the GPDB Segment Fault Prober Works
- How to bring up failed mirror segments
- Explanation for gp_segment_configuration
- Why a mirror is marked down
- FTS related parameters
1. What is FTS?
On the GPDB master host, a fault prober process is forked and monitored by the postmaster (the main postgres process). This fault prober is also called the FTS (fault tolerance server) process. The postmaster restarts it if it fails.
2. How the GPDB Segment Fault Prober Works?
The FTS runs in a continuous loop, sleeping between iterations according to the probe settings listed in section 6.
In each loop, the FTS probes each primary segment database by making a TCP socket connection to it, using the hostname and port registered in the gp_segment_configuration table. If the connection cannot be made, or if no reply is received within the timeout period, the probe is retried. The number of retries is controlled by a setting.
When a segment is probed by the FTS, it runs a few simple checks, such as a stat system call on the critical segment directories and a check for internal faults, and reports the result. If there are no issues, a positive reply is sent to the FTS and no action is taken for that segment database. If the maximum number of probe attempts fail, the FTS probes the mirror to ensure it is up, then updates the gp_segment_configuration table, marking the failed segment as down and transitioning the mirror to primary. The FTS also records the operations performed in the gp_configuration_history table.
When only the primary segment is up and its mirror is down, the primary goes into Change Tracking Mode. In this mode, changes to the segment are recorded so that the mirror can later be resynchronized without a full copy of the data from the primary.
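The probe-and-retry behaviour described above can be sketched in Python as follows. This is only an illustration, not the actual FTS code, which runs inside the GPDB server and exchanges a probe message rather than merely opening a socket. The function names, the plain TCP connect used as the probe, and the default of 5 attempts are assumptions made for this example.

```python
import socket
from typing import Callable


def tcp_probe(host: str, port: int, timeout: float) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False


def probe_with_retries(probe: Callable[[], bool], max_attempts: int = 5) -> bool:
    """Call probe up to max_attempts times; stop at the first positive reply."""
    for _ in range(max_attempts):
        if probe():
            return True
    # Every attempt failed: the caller would now treat the segment as down.
    return False
```

A caller might use it as probe_with_retries(lambda: tcp_probe("sdw1", 40000, 20.0)), where the host name and port are hypothetical values standing in for a row of gp_segment_configuration.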
3. How to bring up failed mirror segments?
To bring up a down mirror, run the gprecoverseg utility. By default it performs an incremental recovery: the mirror is put into resync mode and the changes recorded on the primary are copied to the mirror. If an incremental recovery cannot be completed, the recovery fails and gprecoverseg should be run again with the "-F" option. This requests a full recovery, in which the primary copies all of its data to the mirror to bring it back in sync.
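The "incremental first, then full" sequence above can be sketched in Python as follows. This is a hedged illustration only: the helper names are hypothetical, the runner is passed in as a parameter so the logic can be exercised without a cluster, and it assumes gprecoverseg signals failure through a non-zero exit code. Many administrators will prefer to inspect the failure before starting a full recovery by hand.

```python
from typing import Callable, List


def recovery_command(full: bool = False) -> List[str]:
    """Build the gprecoverseg argument list: incremental by default, -F for full."""
    cmd = ["gprecoverseg"]
    if full:
        cmd.append("-F")  # full recovery: copy all data from primary to mirror
    return cmd


def recover_mirrors(run: Callable[[List[str]], int]) -> str:
    """Try incremental recovery; fall back to full recovery if it fails.

    run executes a command and returns its exit code (0 means success).
    """
    if run(recovery_command()) == 0:
        return "incremental"
    if run(recovery_command(full=True)) == 0:
        return "full"
    return "failed"
```

On a real master host the runner could be lambda cmd: subprocess.run(cmd).returncode. Note that gprecoverseg normally asks for confirmation before proceeding.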
Refer to the article for more information.
4. Explanation for gp_segment_configuration
The gp_segment_configuration table shows the mode (change tracking, resync, or in sync) and the status (up or down) of each segment. Check the article for more information.
The table also has columns called 'role' and 'preferred_role'. Each can have the value 'p' for primary or 'm' for mirror. 'role' shows the current role of a segment database and 'preferred_role' shows its original role. In a balanced system, 'role' and 'preferred_role' match for all segments. If they do not match, the number of active primaries on each hardware host may be skewed. To rebalance the segments and return them all to their preferred roles, run gprecoverseg with the "-r" option.
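The role check above amounts to comparing two columns per row. A minimal Python sketch, assuming the rows of gp_segment_configuration have already been fetched into dictionaries; the function name and the sample rows (host names, content ids) are invented for illustration:

```python
from typing import Dict, List


def unbalanced_segments(rows: List[Dict[str, object]]) -> List[Dict[str, object]]:
    """Return the rows whose current role differs from the preferred role."""
    return [row for row in rows if row["role"] != row["preferred_role"]]


# Hypothetical sample data: the pair for content 1 has swapped roles.
rows = [
    {"content": 0, "role": "p", "preferred_role": "p", "hostname": "sdw1"},
    {"content": 0, "role": "m", "preferred_role": "m", "hostname": "sdw2"},
    {"content": 1, "role": "m", "preferred_role": "p", "hostname": "sdw2"},
    {"content": 1, "role": "p", "preferred_role": "m", "hostname": "sdw1"},
]

swapped = unbalanced_segments(rows)
for row in swapped:
    print(row["content"], row["hostname"], row["role"], row["preferred_role"])
```

An empty result means every segment is in its preferred role. Any rows returned are candidates for a rebalance.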
5. Why is a mirror marked down?
In addition to the probe failures described above, another set of events can cause a mirror to be marked down. As data is written from a primary segment to its mirror, the primary detects when it is unable to send data to the mirror. The data is queued, and if 'gp_segment_connect_timeout' seconds pass without the primary being able to send more data, the primary reports a mirror failure. The mirror is then marked down and the primary goes into change tracking mode.
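The timeout rule above can be written as a single comparison. The Python sketch below is an illustration of the rule as described, not the server's implementation. The function and argument names are hypothetical, the default of 180 seconds is taken from section 6, and treating the boundary as "greater than or equal" is an assumption.

```python
def mirror_should_be_marked_down(last_successful_send: float,
                                 now: float,
                                 connect_timeout: float = 180.0) -> bool:
    """True when no data has reached the mirror for connect_timeout seconds or more.

    Both timestamps are in seconds on the same clock (for example time.monotonic()).
    """
    return (now - last_successful_send) >= connect_timeout
```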
6. FTS related parameters
gp_fts_probe_threadcount: number of threads used for probing segments.
Default Setting: 16
gp_fts_probe_interval: frequency of probe loops, in seconds. A new loop starts every X seconds.
For example, if the setting is 60 and a probe loop takes 10 seconds, the FTS sleeps for 50 seconds.
If the setting is 60 and a probe loop takes 75 seconds, the FTS sleeps for 0 seconds.
Default Setting: 60
gp_fts_probe_timeout: probe timeout between master and segment, in seconds.
Default Setting: 20
gp_fts_probe_retries: number of attempts to probe a segment before it is considered failed.
For example, if the setting is 5 there will be 4 retries after the first attempt.
Default Setting: 5
gp_segment_connect_timeout: maximum time, in seconds, allowed for a mirror to respond.
Default Setting: 180
gp_log_fts: sets the verbosity of logged messages pertaining to fault probing.
Valid values are "off", "terse", "verbose" and "debug".
The verbose setting can be used in production and provides useful data for troubleshooting. The debug setting should not be used in production.
Default Setting: terse
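The sleep time between probe loops in the two examples above follows from one subtraction, clamped at zero. A small Python sketch of that arithmetic, with a hypothetical function name and argument names:

```python
def sleep_after_probe_loop(probe_interval: float, loop_duration: float) -> float:
    """Seconds to sleep so that a new loop starts every probe_interval seconds.

    If the loop took longer than the interval, the next loop starts immediately.
    """
    return max(0.0, probe_interval - loop_duration)


print(sleep_after_probe_loop(60, 10))  # loop finished early: sleep the remainder
print(sleep_after_probe_loop(60, 75))  # loop overran the interval: no sleep
```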