Pivotal Knowledge Base

Follow

How to debug Pivotal Greenplum pljava processes

Environment

Product Version
Pivotal Greenplum (GPDB) 4.3.x
OS RHEL 6.x
pljava Extension  

Purpose

Given a cause where a user-defined pljava function appears hung or is showing high CPU utilization, this article describes how to debug the postgres-spawned Java Virtual Machine (JVM) process.

Sample environment for testing

In this example, we will use the following code to generate the outputs. So if you would like to follow along on your own, here is how to get your jar file built (this is the source link):

  • git clone git@github.com:randomtask1155/plsleep.git
  • cd plsleep
  • mvn package
  • Install the resulting jar file target/plsleep-1.0.jar into the Greenplum system. For help, refer to gpdb docs
  • Create the database functions:
    -- sleeps for specified number of seconds
    CREATE FUNCTION gotosleep(integer)
            RETURNS integer
            AS 'io.pivotal.support.plsleep.GoToSleep(java.lang.Integer)'
            LANGUAGE java;
    -- spins cpu for specified number of seconds
    CREATE FUNCTION gospin(integer)
            RETURNS integer
            AS 'io.pivotal.support.plsleep.GoSpin(java.lang.Integer)'
            LANGUAGE java;
    
    -- sleeps for specificed number of seconds and then throws an exception
    CREATE FUNCTION gotosleepfail(integer)
            RETURNS integer
            AS 'io.pivotal.support.plsleep.GoToSleepFail(java.lang.Integer)'
            LANGUAGE java;

Start up a pljava process

      • connect to the database and execute query
        gpadmin=# select gotosleep(300);
      • Find the gpdb pljava process id for debuging
        gpadmin   6622  3702  1 17:16 ?        00:00:00 postgres: port  5432, gpadmin gpadmin 192.168.64.100(8767) con10 192.168.64.100(8767) cmd1 SELECT

pstack command is generally not helpful

pstack will dump all the native threads and their stacks but it will not reveal any relevant information related to the java classes being invoked. The best we can learn from this output is how many threads are running for this pid. Also, the existence of libjvm.so in the stack trace confirms that this is our pljava process:

[root@mdw ~]# pstack 6622
Thread 10 (Thread 0x7f31595e6700 (LWP 6623)):
#0  0x00007f31550ed283 in poll () from /lib64/libc.so.6
#1  0x0000000000bbf874 in rxThreadFunc ()
#2  0x00007f3155b4faa1 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f31550f6aad in clone () from /lib64/libc.so.6
Thread 9 (Thread 0x7f313a42b700 (LWP 6628)):
#0  0x00007f3155b53a5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f313b7762f8 in os::PlatformEvent::park(long) () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so
#2  0x00007f313b738cce in Monitor::IWait(Thread*, long) () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so
#3  0x00007f313b7391ae in Monitor::wait(bool, long, bool) () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so
#4  0x00007f313b8fc539 in VMThread::loop() () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so
#5  0x00007f313b8fc840 in VMThread::run() () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so
#6  0x00007f313b777538 in java_start(Thread*) () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so
#7  0x00007f3155b4faa1 in start_thread () from /lib64/libpthread.so.0
#8  0x00007f31550f6aad in clone () from /lib64/libc.so.6
Thread 8 (Thread 0x7f313a32a700 (LWP 6629)):
#0  0x00007f3155b5368c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f313b7710b3 in os::PlatformEvent::park() () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so
#2  0x00007f313b76591d in ObjectMonitor::wait(long, bool, Thread*) () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so
#3  0x00007f313b5d70f8 in JVM_MonitorWait () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so
#4  0x00007f3131012cb8 in ?? ()
#5  0x0000000001a87db8 in ?? ()
#6  0x00007f3100000000 in ?? ()
#7  0x0000000001a87000 in ?? ()
#8  0x00007f313a329538 in ?? ()
#9  0x00000000fae02750 in ?? ()
#10 0x00007f313a3295a8 in ?? ()
#11 0x00000000faea3e90 in ?? ()
#12 0x0000000000000000 in ?? ()
Thread 7 (Thread 0x7f313a229700 (LWP 6630)):
#0  0x00007f3155b5368c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f313b7710b3 in os::PlatformEvent::park() () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so
#2  0x00007f313b76591d in ObjectMonitor::wait(long, bool, Thread*) () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so
#3  0x00007f313b5d70f8 in JVM_MonitorWait () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so
#4  0x00007f3131012cb8 in ?? ()
#5  0x0000000001a8bed8 in ?? ()
#6  0x00007f3100000000 in ?? ()
#7  0x0000000001a8b000 in ?? ()
#8  0x00007f313a228740 in ?? ()
#9  0x00007f313a228650 in ?? ()
#10 0x0000000000000000 in ?? ()
Thread 6 (Thread 0x7f313a128700 (LWP 6631)):
#0  0x00007f3155b56b2d in accept () from /lib64/libpthread.so.0
#1  0x00007f313b25fbc1 in LinuxAttachListener::dequeue() () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so
#2  0x00007f313b25fd5b in AttachListener::dequeue() () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so
#3  0x00007f313b25e78d in attach_listener_thread_entry(JavaThread*, Thread*) () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so
#4  0x00007f313b8af5ff in JavaThread::thread_main_inner() () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so
#5  0x00007f313b8af705 in JavaThread::run() () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so
#6  0x00007f313b777538 in java_start(Thread*) () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so
#7  0x00007f3155b4faa1 in start_thread () from /lib64/libpthread.so.0
#8  0x00007f31550f6aad in clone () from /lib64/libc.so.6
Thread 5 (Thread 0x7f313a027700 (LWP 6632)):
#0  0x00007f3155b5368c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f313b7710b3 in os::PlatformEvent::park() () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so
#2  0x00007f313b738a1f in Monitor::IWait(Thread*, long) () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so
#3  0x00007f313b739226 in Monitor::wait(bool, long, bool) () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so
#4  0x00007f313b3aa488 in CompileQueue::get() () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so
#5  0x00007f313b3ae57a in CompileBroker::compiler_thread_loop() () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so
#6  0x00007f313b8af5ff in JavaThread::thread_main_inner() () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so
#7  0x00007f313b8af705 in JavaThread::run() () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so
#8  0x00007f313b777538 in java_start(Thread*) () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so
#9  0x00007f3155b4faa1 in start_thread () from /lib64/libpthread.so.0
#10 0x00007f31550f6aad in clone () from /lib64/libc.so.6
Thread 4 (Thread 0x7f3139f26700 (LWP 6633)):
#0  0x00007f3155b5368c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f313b7710b3 in os::PlatformEvent::park() () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so
#2  0x00007f313b738a1f in Monitor::IWait(Thread*, long) () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so
#3  0x00007f313b739226 in Monitor::wait(bool, long, bool) () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so
#4  0x00007f313b3aa488 in CompileQueue::get() () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so
#5  0x00007f313b3ae57a in CompileBroker::compiler_thread_loop() () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so
#6  0x00007f313b8af5ff in JavaThread::thread_main_inner() () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so
#7  0x00007f313b8af705 in JavaThread::run() () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so
#8  0x00007f313b777538 in java_start(Thread*) () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so
#9  0x00007f3155b4faa1 in start_thread () from /lib64/libpthread.so.0
#10 0x00007f31550f6aad in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x7f3139e25700 (LWP 6634)):
#0  0x00007f3155b5368c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f313b7710b3 in os::PlatformEvent::park() () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so
#2  0x00007f313b738a1f in Monitor::IWait(Thread*, long) () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so
#3  0x00007f313b7391ae in Monitor::wait(bool, long, bool) () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so
#4  0x00007f313b806888 in ServiceThread::service_thread_entry(JavaThread*, Thread*) () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so
#5  0x00007f313b8af5ff in JavaThread::thread_main_inner() () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so
#6  0x00007f313b8af705 in JavaThread::run() () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so
#7  0x00007f313b777538 in java_start(Thread*) () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so
#8  0x00007f3155b4faa1 in start_thread () from /lib64/libpthread.so.0
#9  0x00007f31550f6aad in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x7f3139d24700 (LWP 6635)):
#0  0x00007f3155b53a5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f313b7762f8 in os::PlatformEvent::park(long) () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so
#2  0x00007f313b738cce in Monitor::IWait(Thread*, long) () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so
#3  0x00007f313b7391ae in Monitor::wait(bool, long, bool) () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so
#4  0x00007f313b8ab831 in WatcherThread::sleep() const () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so
#5  0x00007f313b8abb3e in WatcherThread::run() () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so
#6  0x00007f313b777538 in java_start(Thread*) () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so
#7  0x00007f3155b4faa1 in start_thread () from /lib64/libpthread.so.0
#8  0x00007f31550f6aad in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7f315969a720 (LWP 6622)):
#0  0x00007f3155b53a5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f313b7762f8 in os::PlatformEvent::park(long) () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so
#2  0x00007f313b777044 in os::sleep(Thread*, long, bool) () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so
#3  0x00007f313b5df612 in JVM_Sleep () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so
#4  0x00007f3131012cb8 in ?? ()
#5  0x0000000001a2f5a8 in ?? ()
#6  0x00007ffddc4936c0 in ?? ()
#7  0x00007f3131010280 in ?? ()
#8  0x00007ffddc4936b8 in ?? ()
#9  0x0000000000000000 in ?? ()

Running jstack against the GPDB process

Running jstack against the Greenplum process will work because Greenplum is simply launching a JVM using the JNI standard library. So treating this process as a JVM is acceptable because it is one.

[root@mdw ~]# /usr/java/jdk1.7.0_51/bin/jstack -F 6622
Attaching to process ID 6622, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 24.51-b03
Deadlock Detection:

No deadlocks found.

Thread 6631: (state = BLOCKED)


Thread 6630: (state = BLOCKED)
 - java.lang.Object.wait(long) @bci=0 (Interpreted frame)
 - java.lang.ref.ReferenceQueue.remove(long) @bci=44, line=135 (Interpreted frame)
 - java.lang.ref.ReferenceQueue.remove() @bci=2, line=151 (Interpreted frame)
 - java.lang.ref.Finalizer$FinalizerThread.run() @bci=16, line=189 (Interpreted frame)


Thread 6629: (state = BLOCKED)
 - java.lang.Object.wait(long) @bci=0 (Interpreted frame)
 - java.lang.Object.wait() @bci=2, line=503 (Interpreted frame)
 - java.lang.ref.Reference$ReferenceHandler.run() @bci=46, line=133 (Interpreted frame)


Thread 6622: (state = BLOCKED)
 - java.lang.Thread.sleep(long) @bci=0 (Interpreted frame)
 - io.pivotal.support.plsleep.GoToSleep(java.lang.Integer) @bci=9, line=15 (Interpreted frame)

The above thread dump tells us this process is executing line 15 in function io.pivotal.support.plsleep.GoToSleep. Line 15 in the source code is Thread.sleep(dur *1000). Given that this is a simple example, it is fairly obvious that the only thing this function does is sleep, but in normal cases, it may not be obvious what this thread is doing and this thread dump can provide the desired insight.

 14     static int GoToSleep(Integer dur) throws java.lang.InterruptedException {
 15         Thread.sleep(dur * 1000);
 16         return 1;
 17     }

Running jstack on GPDB JVM core file

In this case, we just generated the core file using "gcore 6622", but a unexpected core file can be generated for lots of reasons in production. We can open this file using the following jstack command and see the same information as we would when running against a process. We know process 6622 is running function io.pivotal.support.plsleep.GoToSleep at line 15.

/usr/java/jdk1.7.0_51/bin/jstack -m /usr/local/greenplum-db/bin/postgres  core.6622
Attaching to core core.6622 from executable /usr/local/greenplum-db/bin/postgres, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 24.51-b03
Deadlock Detection:

No deadlocks found.

----------------- 6622 -----------------
0x00007f3155b53a5e	__pthread_cond_timedwait + 0x13e
0x00007f313b777044	_ZN2os5sleepEP6Threadlb + 0x254
0x00007f313b5df612	JVM_Sleep + 0x342
0x00007f3131012cb8	* java.lang.Thread.sleep(long) bci:0 (Interpreted frame)
0x00007f3131006058	* io.pivotal.support.plsleep.GoToSleep(java.lang.Integer) bci:9 line:15 (Interpreted frame)
0x00007f31310004e7	
0x00007f313b55a465	_ZN9JavaCalls11call_helperEP9JavaValueP12methodHandleP17JavaCallArgumentsP6Thread + 0x365
0x00007f313b558ec8	_ZN9JavaCalls4callEP9JavaValue12methodHandleP17JavaCallArgumentsP6Thread + 0x28
0x00007f313b5920b9	_ZL17jni_invoke_staticP7JNIEnv_P9JavaValueP8_jobject11JNICallTypeP10_jmethodIDP18JNI_ArgumentPusherP6Thread + 0x219
0x00007f313b59a36a	jni_CallStaticIntMethodA + 0xda
0x00007f313bde0c5e	JNI_callStaticIntMethodA + 0x5e
0x00007f313bdeb822	_int_invoke + 0x12
0x00007f313bddbea2	Function_invoke + 0x282
0x00007f313bdd87d0	internalCallHandler + 0xd0
0x0000000000762362	ExecMakeFunctionResult + 0x702
0x0000000000762b4d	ExecEvalFunc + 0xbd
0x000000000075fa42	ExecProject + 0x242
0x000000000078b8f4	ExecResult + 0x1c4
0x000000000075957f	ExecProcNode + 0x48f
0x000000000078fdf8	ExecSetParamPlan + 0x178
0x000000000075e9be	ExecEvalParam + 0xce
0x000000000075fa42	ExecProject + 0x242
0x000000000078b8f4	ExecResult + 0x1c4
0x000000000075957f	ExecProcNode + 0x48f
0x000000000074ea2c	ExecutePlan + 0x47c
0x000000000074f238	ExecutorRun + 0x3a8
0x00000000009a06f1	PortalRun + 0x6f1
0x000000000099628c	exec_simple_query + 0x62c
0x000000000099a1c6	PostgresMain + 0x28f6
0x00000000008f7eae	ServerLoop + 0xa4e
0x00000000008fac30	PostmasterMain + 0xff0
0x00000000007fccaf	main + 0x44f

Dump JVM heap memory stats

This is useful in cases where the JVM is stuck in garbage collection and we suspect the JVM heap memory is consistently filling up, causing performance issues. For example, the following output tells us the max heap size is 958MB:

/usr/java/jdk1.7.0_51/bin/jmap -heap 6622

Attaching to process ID 6622, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 24.51-b03

using thread-local object allocation.
Mark Sweep Compact GC

Heap Configuration:
   MinHeapFreeRatio = 40
   MaxHeapFreeRatio = 70
   MaxHeapSize      = 1004535808 (958.0MB)
   NewSize          = 1310720 (1.25MB)
   MaxNewSize       = 17592186044415 MB
   OldSize          = 5439488 (5.1875MB)
   NewRatio         = 2
   SurvivorRatio    = 8
   PermSize         = 21757952 (20.75MB)
   MaxPermSize      = 85983232 (82.0MB)
   G1HeapRegionSize = 0 (0.0MB)

Heap Usage:
New Generation (Eden + 1 Survivor Space):
   capacity = 18874368 (18.0MB)
   used     = 2041944 (1.9473495483398438MB)
   free     = 16832424 (16.052650451660156MB)
   10.818608601888021% used
Eden Space:
   capacity = 16842752 (16.0625MB)
   used     = 2041944 (1.9473495483398438MB)
   free     = 14800808 (14.115150451660156MB)
   12.12357695464494% used
From Space:
   capacity = 2031616 (1.9375MB)
   used     = 0 (0.0MB)
   free     = 2031616 (1.9375MB)
   0.0% used
To Space:
   capacity = 2031616 (1.9375MB)
   used     = 0 (0.0MB)
   free     = 2031616 (1.9375MB)
   0.0% used
tenured generation:
   capacity = 41877504 (39.9375MB)
   used     = 0 (0.0MB)
   free     = 41877504 (39.9375MB)
   0.0% used
Perm Generation:
   capacity = 21757952 (20.75MB)
   used     = 4586048 (4.37359619140625MB)
   free     = 17171904 (16.37640380859375MB)
   21.077572006777107% used

Converting GPDB JVM core file to hprof

A developer may want to convert the JVM core file into a java hprof dump so that they can get more information from the JVM heap that will help them debug various issues. VisualVM is a tool that is commonly used to read these files and requires a GUI so remote debugging of hrpof files are never very useful for support but may be useful for the customer engineering team that authored the Java class in question.

/usr/java/jdk1.7.0_51/bin/jmap -dump:format=b,file=dump.hprof /usr/local/greenplum-db/bin/postgres core.6622

 

Comments

Powered by Zendesk