Pivotal Greenplum Database (GPDB) 4.3.x
This article explains how Greenplum detects deadlocks when a resource queue is involved.
GPDB deadlock detection with resource queue
When a process has waited longer than deadlock_timeout for a lock held by another process, GPDB triggers a deadlock check. The check builds a wait-for graph of the running processes involved, in which each edge points from a waiting process to a process it is waiting for. If the graph contains a cycle, GPDB reports a deadlock.
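As a rough illustration of the idea, and not GPDB's actual implementation, a wait-for graph can be represented as a mapping from each waiting process to the processes it waits for, and a depth-first search can report whether a cycle exists. The function name and the process names P1 and P2 are hypothetical.

```python
from typing import Dict, List, Optional, Set

# Wait-for graph (illustrative model): each key waits for every process in its list.
WaitForGraph = Dict[str, List[str]]


def find_cycle(graph: WaitForGraph) -> Optional[List[str]]:
    """Return one cycle as a list of process names, or None if there is no cycle."""
    visiting: Set[str] = set()  # nodes on the current DFS path
    done: Set[str] = set()      # nodes whose reachable nodes are fully explored
    path: List[str] = []

    def dfs(node: str) -> Optional[List[str]]:
        visiting.add(node)
        path.append(node)
        for nxt in graph.get(node, []):
            if nxt in visiting:
                # Back edge: the cycle runs from nxt along the path back to nxt.
                return path[path.index(nxt):] + [nxt]
            if nxt not in done:
                cycle = dfs(nxt)
                if cycle is not None:
                    return cycle
        visiting.discard(node)
        path.pop()
        done.add(node)
        return None

    for start in list(graph):
        if start not in done:
            cycle = dfs(start)
            if cycle is not None:
                return cycle
    return None


# P1 waits for P2 and P2 waits for P1: a cycle, so a deadlock would be reported.
print(find_cycle({"P1": ["P2"], "P2": ["P1"]}))  # ['P1', 'P2', 'P1']
# P1 waits for P2, which waits for nothing: no deadlock.
print(find_cycle({"P1": ["P2"], "P2": []}))      # None
```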
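Resource queue slot locks are also included in deadlock detection. Unlike a table lock waiter, however, a process pending on a resource queue slot is not waiting for one specific process: it can proceed as soon as any of the running processes in that queue releases a slot. As a result, the check can report deadlocks that are not genuine, as in the following example.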
For example, suppose four concurrent transactions, T1 through T4, run in resource queue RQ1, which has a concurrency limit of 3. At some point, T1, T2, and T3 are running and T4 is pending, but T4 acquired a table lock on Table1 before it began waiting for a resource queue slot. When T3 reaches a statement that needs the table lock on Table1, the wait-for graph contains these edges: T3 waits for T4 (table lock on Table1), and T4 waits for a slot held by one of the running transactions T1, T2, and T3 (resource queue slot lock).
T3 and T4 wait for each other, forming a cycle, so GPDB reports a deadlock. However, this is not a genuine deadlock: it resolves on its own as soon as either T1 or T2 finishes and its slot goes to T4.
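The example can be modeled with a small sketch. It assumes, as a simplification, that a transaction pending on a resource queue slot is recorded as waiting for every transaction currently holding a slot in that queue, and that a table lock waiter waits for the lock holder. These modeling rules and the helper names build_wait_for_graph and has_cycle are hypothetical, not GPDB internals.

```python
from typing import Dict, List, Set


def build_wait_for_graph(
    slot_holders: List[str],
    slot_waiters: List[str],
    lock_waits: Dict[str, str],
) -> Dict[str, List[str]]:
    """Build a wait-for graph (hypothetical model, not GPDB's internal one).

    slot_holders: transactions currently running in the resource queue.
    slot_waiters: transactions pending on a slot; each is modeled as
                  waiting for every slot holder.
    lock_waits:   waiter -> transaction holding the table lock it needs.
    """
    graph: Dict[str, List[str]] = {}
    for waiter in slot_waiters:
        graph.setdefault(waiter, []).extend(slot_holders)
    for waiter, holder in lock_waits.items():
        graph.setdefault(waiter, []).append(holder)
    return graph


def has_cycle(graph: Dict[str, List[str]]) -> bool:
    """Depth-first search for a back edge, which means a cycle exists."""
    visiting: Set[str] = set()
    done: Set[str] = set()

    def dfs(node: str) -> bool:
        visiting.add(node)
        for nxt in graph.get(node, []):
            if nxt in visiting or (nxt not in done and dfs(nxt)):
                return True
        visiting.discard(node)
        done.add(node)
        return False

    return any(node not in done and dfs(node) for node in list(graph))


# RQ1 has a concurrency limit of 3: T1, T2, T3 hold slots and T4 is pending.
# T4 already holds the table lock on Table1, and T3 now needs that lock.
before = build_wait_for_graph(["T1", "T2", "T3"], ["T4"], {"T3": "T4"})
print(before)             # {'T4': ['T1', 'T2', 'T3'], 'T3': ['T4']}
print(has_cycle(before))  # True: reported as a deadlock

# T1 finishes and its slot goes to T4, so T4 is no longer pending.
after = build_wait_for_graph(["T2", "T3", "T4"], [], {"T3": "T4"})
print(has_cycle(after))   # False: T3 simply waits for T4 to finish
```

The cycle T3 → T4 → T3 is found even though T4 would be admitted as soon as T1 or T2 finishes, which is why the reported deadlock is not genuine.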
Best practices for 4.3.x customers with Resource Queue deadlock issues
To avoid these non-genuine deadlocks, increase the deadlock_timeout GUC. A larger timeout makes Greenplum wait longer before it checks for a deadlock, so some non-genuine deadlock situations clear up on their own first. For example, the default value of deadlock_timeout is 1 second. If T1 and T2 each take 10 seconds to execute, T3 and T4 are reported as deadlocked once they have waited for their locks for more than 1 second and the deadlock check runs. If the value is increased to 1 minute, the transactions can continue, because T1 or T2 releases its resource queue slot to T4 before the timeout expires.
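The timing argument reduces to comparing two durations. The sketch below assumes, for simplicity, that the deadlock check fires once, when the waiters have waited for deadlock_timeout, and that T4 receives a slot the moment T1 or T2 finishes. The function name and these assumptions are illustrative only, not a description of exact GPDB behavior.

```python
def false_deadlock_reported(deadlock_timeout_s: float, slot_freed_after_s: float) -> bool:
    """Return True if the deadlock check runs while the T3/T4 cycle still exists.

    deadlock_timeout_s: how long the waiters wait before the check runs
                        (simplified model).
    slot_freed_after_s: how long into the wait T1 or T2 finishes and its
                        resource queue slot goes to T4, breaking the cycle.
    """
    # The check fires at deadlock_timeout_s; the cycle disappears at slot_freed_after_s.
    return deadlock_timeout_s < slot_freed_after_s


# T1/T2 take about 10 seconds, so a slot frees up about 10 seconds into the wait.
print(false_deadlock_reported(1, 10))   # True: with 1 s, the check runs first
print(false_deadlock_reported(60, 10))  # False: with 1 minute, the slot frees first
```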
In short, increasing deadlock_timeout does not reduce real deadlocks in the system. It does, however, prevent some scenarios involving resource queue slot locks from being reported as deadlocks, so you should see far fewer resource queue deadlocks after increasing the setting.
Real deadlocks and deadlock_timeout
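In 4.x, deadlocks involving resource queues cannot be avoided entirely, because resource queues enforce concurrency control at the statement level: a transaction can keep the locks acquired by its earlier statements while its next statement waits for a slot, as T4 does in the example above. In some cases, a real deadlock can occur with a resource queue. A large deadlock_timeout value also has a drawback: real deadlocks are detected and broken later, so the transactions involved stay blocked for longer.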
Resource queue logic will be redesigned in 5.x to help with scenarios in which deadlocks involve resource queues.