Pivotal Greenplum Database (GPDB) all versions
In this article we will take a look at some common frequently asked Admin DBA related question.
1. gpstart failed what should I do?
Check gpstart logfile in ~gpadmin/gpAdminLogs/gpstart_yyyymmdd.log
Take a look at the pg startup log file for more details in $MASTER_DATA_DIRECTORY/pg_log/startup.log and also the last master log generated after the startup command was issued.
If the segments failed to startup the log ~gpadmin/gpAdminLogs/gpstart_yyyymmdd.log tells the failed segments, navigate to host/segment directory and check the logs for more details on the failure.
2. Why do we need gpstop -m and gpstart -m? and how to find the uptime of the Greenplum database?
The gpstart -m command allows you to start the master only and none of the data segments and is used primarily by support to get system level information/configuration. An end user would not regularly or even normally use it.
To find the uptime of the GPDB use the below command
select now() - pg_postmaster_start_time() ;
3. What is the procedure to get rid of mirror segments?
There are no utilities available to remove mirrors from Greenplum. You will need to reinitialize the cluster.
4. How to run gpcheckcat?
The gpcheckcat tool is used to check catalog inconsistencies between master and segments. It can be found in the $GPHOME/bin/lib directory:
Usage: gpcheckcat [dbname] -? -B parallel: number of worker threads -g dir : generate SQL to rectify catalog corruption, put it in dir -h host : DB host name -p port : DB port number -P passwd : DB password -o : check OID consistency -U uname : DB User Name -v : verbose
Example: gpcheckcat gpdb > gpcheckcat_gpdb_logfile.log
5. What is gpdetective and how do I run it in Greenplum?
The gpdetective utility collects information from a running Greenplum Database system and creates a bzip2-compressed tar output file. This output file helps with the diagnosis of Greenplum Database errors or system failures. for more details check help.
6. How to delete a standby?
To remove the currently configured standby master host from your Greenplum Database system, run the following command in the master only:
# gpinitstandby -r
7. How to re-sync a standby?
Use this option if you already have a standby master configured, and just want to resynchronize the data between the primary and backup master host. The Greenplum system catalog tables will not be updated.
# gpinitstandby -n (resynchronize)
How to recover an invalid segment?
8. Use the gprecoverseg tool, which will recognize which segments need recovery and will initialize recovery.
-- Without "-F" option - The change tracking log will be sent and applied to the mirror. -- With "-F" option - Entire data directory will be resynched.
9. How to add mirrors to the array?
The gpaddmirrors utility configures mirror segment instances for an existing Greenplum Database system that was initially configured with primary segment instances only.
For more details check help.
# gpaddmirrors --help
10. How to see primary to mirror mapping?
From database catalog following query list configuration on content ID, you can figure out primary and mirror for each content.
gpdb=# select * from gp_configuration order by content.
Note: starting from GPDB 4.x, gp_segment_configuration table is used instead.
gpdb=# select * from gp_segment_configuration order by dbid;
11. How to start/stop db in admin mode?
-- Admin mode
The gpstart with option (-R) stands for Admin mode or a restricted mode where only superusers can connect to the database when database opened using this option.
-- Utility mode
Utility mode allows you to connect to only individual segments when started using gpstart -m, for example, < to connect to only master instance only?
PGOPTIONS='-c gp_session_role=utility' psql
12. How to run gpcheckperf IO/netperf?
Create a directory where you have free space and common in all hosts. -- For network I/O test for each nic card
gpcheckperf -f seg_host_file_nic-1 -r N -d /data/gpcheckperf > seg_host_file_nic_1.out gpcheckperf -f seg_host_file_nic-2 -r N -d /data/gpcheckperf > seg_host_file_nic_2.out
-- For disk I/O
gpcheckperf -f seg_host_file_nic-1 -r ds -D -d /data/gpdb_p1 -d /data/gpdb_p2 -d /data/gpdb_m1 -d /data/gpdb_m2
13. How to update postgresql.conf and reload it?
In GP 4.0 version check gpconfig utility to change postgres.conf parameters.
14. How to manage pg_hba.conf?
The pg_hba.conf file of the master instance controls client access and authentication to your Greenplum system.
15. How to add a new user to the database?
Use "createuser" utility to create users. See "createuser --help" for more details.
You can also use SQL commands in psql prompt to create users.
CREATE USER or ROLE <ROLE_NAME> ....
16. How to create a password-free trusted env b/w the all the segment hosts?
gpssh-exkeys -h hostname1 -h hostname2 .. -h hostnameN
17. How to check db version and version at init db?
-- To check version
psql> select version();
-- To check gp version at install
psql> select * from gp_version_at_initdb;
18. How to see the value of GUC?
By connecting GPDB database using psql query catalog or do show parameter.
gpdb# select name,setting from pg_settings where name='GUC';
gpdb# show <GUC_NAME>;
19. What is the location of pg_hba/logfile/master_data_directory?
cd $MASTER_DATA_DIRECTORY - Master directory.
pg_hba.conf and postgres.conf location and other GPDB internal directories.
cd $MASTER_DATA_DIRECTORY/pg_logs - Master database logfiles location.
20. How to find errors/fatal from log files?
grep for ERRORS, FATAL, SIGSEGV in pg_logs directory.
21. What is a vacuum and when should I run this?
VACUUM reclaims storage occupied by deleted tuples. In normal GPDB operation, tuples that are deleted or obsoleted by an update are not physically removed from their table. They remain present on disk until a VACUUM is done. Therefore, it is necessary to do VACUUM periodically, especially on the frequently-updated table.
22. What is the difference between vacuum and vacuum full?
Unless you need to return space to the OS so that other tables or other parts of the system can use that space, you should use VACUUM instead of VACUUM FULL.
VACUUM FULL is only needed when you have a table that is mostly dead rows, that is, the vast majority of its contents have been deleted. Even then, there is no point using VACUUM FULL unless you urgently need that disk space back for other things or you expect that the table will never again grow to its past size. Do not use it for table optimization or periodic maintenance as it is counterproductive.
Check the guide for more information.
23. What is Analyze and how frequency should I run this?
ANALYZE collects statistics about the contents of tables in the database, and stores the results in the system table pg_statistic. Subsequently, the query planner uses these statistics to help determine the most efficient execution plans for queries.
It is a good idea to run ANALYZE periodically, or just after making major changes in the contents of a table. Accurate statistics will help the query planner to choose the most appropriate query plan, and thereby improve the speed of query processing. A common strategy is to run VACUUM and ANALYZE once a day during a low-usage time of day.
24. What is resource queues?
Resource queues are used to manager Greenplum database workload management. All user/queries can be prioritized using Resource queues. Refer Admin guide for more details.
25. What is gp_toolkit?
The gp_toolkit is a database schema, which has many tables, views, and functions to better manage Greenplum Database when DB is up. In 3.x earlier versions, it was referred to as gp_jetpack.
For more detailed information please refer to the Greenplum Administrator's Guide.