Pivotal Knowledge Base

Follow

ERROR COPY metadata not found resulting in a mix of newline types in the data

Environment

  • Pivotal Greenplum versions: 4.3.x
  • OS: ALL

Symptom 

Greenplum Database went into recovery mode.

In the master log, there's "LOG" message like this:

2017-07-29 06:00:18.010218 ICT,"prod_analytical","prodgp",p24261,th-1052793056,"10.251.54.173","47722",2017-07-29 06:00:05 ICT,295097397,con287838,cmd1,seg-1,,dx521200,x295097397,sx1,"LOG","00000","COPY metadata not found. This probably means that there is a mixture of newline types in the data. Use the NEWLINE keyword in order to resolve this reliably.  (seg5 pgpseg801:40005 pid=11535)ERROR - Failed to complete COPY on the following:
Primary segment 11, dbid 13, with error: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

In primary log, there's "PANIC" message like this:

2017-07-29 06:00:07.365869 ICT,,,p46167,th0,,,2017-07-29 06:00:06 ICT,0,con287838,cmd1,seg31,,,,,"PANIC","XX000","Unexpected internal error: Segment process received signal SIGSEGV",,,,,,,0,,,,"1    0x32afc0f710 libpthread.so.0 <symbol not found> + 0xafc0f710
2 0x6c5dfa postgres DoCopyInternal + 0x17aa
3 0x6c71f8 postgres DoCopy + 0x98
4 0x9a4abd postgres ProcessUtility + 0x95d
5 0x9a216b postgres PortalRun + 0xa5b

Checklist

  1. "LOG" message on master gives clue that source file might have mixture of newline types
  2. Analyze core dump on segment to see if there's variable including some source data
  3. Check the newline type to see if there's conflict  

RCA

From the backtrace result, could have a general idea that it was doing COPY, and failed with ERROR after that.

(gdb) bt
#0 0x00000032afc0f5db in raise () from /data/logs/62311/packcore-core.46167/lib64/libpthread.so.0
#1 0x0000000000b01f02 in StandardHandlerForSigillSigsegvSigbus_OnMainThread (processName=<optimized out>, postgres_signal_arg=11) at elog.c:4479
#2 <signal handler called>
#3 CopyExtractRowMetaData (cstate=<optimized out>) at copy.c:6143
#4 preProcessDataLine (cstate=<optimized out>) at copy.c:6763
#5 CopyReadLineText (bytesread=<optimized out>, cstate=<optimized out>) at copy.c:4525
#6 CopyFrom (cstate=<optimized out>) at copy.c:3874
#7 0x00000000006c5dfa in DoCopyInternal (stmt=<optimized out>, queryString=<optimized out>, cstate=0x2b9dda0) at copy.c:1565
#8 0x00000000006c71f8 in DoCopy (stmt=<optimized out>, queryString=<optimized out>) at copy.c:1641
#9 0x00000000009a4abd in ProcessUtility (parsetree=0x2b56068, queryString=<optimized out>, params=0xa, isTopLevel=1 '\001', dest=0x2b56940, completionTag=0x7ffc33c09720 "")
at utility.c:1098
#10 0x00000000009a216b in PortalRunUtility (completionTag=<optimized out>, dest=<optimized out>, isTopLevel=<optimized out>, utilityStmt=<optimized out>, portal=<optimized out>)
at pquery.c:1499
#11 PortalRunMulti (completionTag=<optimized out>, altdest=<optimized out>, dest=<optimized out>, isTopLevel=<optimized out>, portal=<optimized out>) at pquery.c:1609
#12 PortalRun (portal=0x2b923c0, count=<optimized out>, isTopLevel=<optimized out>, dest=<optimized out>, altdest=<optimized out>, completionTag=<optimized out>) at pquery.c:1125
#13 0x000000000099acec in exec_mpp_query (localSlice=<optimized out>, seqServerPort=<optimized out>, seqServerHost=<optimized out>, serializedSliceInfolen=<optimized out>,
serializedSliceInfo=<optimized out>, serializedParamslen=<optimized out>, serializedParams=<optimized out>, serializedPlantreelen=<optimized out>,
serializedPlantree=<optimized out>, serializedQuerytreelen=<optimized out>, serializedQuerytree=<optimized out>, query_string=<optimized out>) at postgres.c:1359
#14 PostgresMain (argc=<optimized out>, argv=<optimized out>, dbname=0x2a112b0 "prodgp", username=<optimized out>) at postgres.c:4914
#15 0x00000000008f93be in BackendRun (port=<optimized out>) at postmaster.c:6978
#16 BackendStartup (port=<optimized out>) at postmaster.c:6673
#17 ServerLoop () at postmaster.c:2463
#18 0x00000000008fc060 in PostmasterMain (argc=15, argv=0x29c94a0) at postmaster.c:1539
#19 0x00000000007fd7ef in main (argc=15, argv=0x29c9430) at main.c:206

Did a full backtrace to check the variables inside. Found line_start has some source data. Here's part of "bt full" result:

(gdb) bt full
#0 0x00000032afc0f5db in raise () from /data/logs/62311/packcore-core.46167/lib64/libpthread.so.0
No symbol table info available.
#1 0x0000000000b01f02 in StandardHandlerForSigillSigsegvSigbus_OnMainThread (processName=<optimized out>, postgres_signal_arg=11) at elog.c:4479
buffer = {hdr = {zero = 0, len = 72, pid = 46167, thid = 123456, main_thid = 1575909152, chunk_no = 0, is_last = 116 't', log_format = 99 'c', is_segv_msg = 116 't',
log_line_number = 0, next = -1},
data = "\366\301{Y\000\000\000\000^d\004\000\001\000\000\000\037\000\000\000\377\377\377\377\v\000\000\000\005\000\000\000\020\367\300\257\062\000\000\000\372]l\000\000\000\000\000\370ql\000\000\000\000\000\275J\232\000\000\000\000\000k!\232\000\000\000\000\000\200\323\027\357\377\377\377\377\000\231\016\357\377\377\377\377\000\000\000\000\000\000\000\000\067\070\070\070\070\066\063\065\200\351\362\356", '\377' <repeats 12 times>, "\200t\340\356\377\377\377\377\002\000\000\000\374\177\000\000@\327\270\257\062", '\000' <repeats 11 times>, "\360\331\234\002\000\000\000\000\020^\275\002\000\000\000\000\060\202\300\063\374\177\000\000p\201\300\063\374\177\000\000Pl\266\002\000\000\000\000\360\331\234\002\000\000\000\000"...}
errorData = <optimized out>
stackAddresses = <optimized out>
#2 <signal handler called>
No symbol table info available.
#3 CopyExtractRowMetaData (cstate=<optimized out>) at copy.c:6143
md_delim = 0x0
line_start = 0x2bae9cb "\267\062?\016\255.\vs\024_\ayQ\251\220\032D\251\221mw\004\255\370\266\234\254\071hK\025\321:\373G'\255*\266\245\263\037\267}\031P^\352!\002#\036v,\021\027\256\343\066\233\224\343\216=\005%\256\306\071-\210\370P/\210\311\260,\256qc\035\256\236\210\327\265!\256\242)\254\266\326\204\f\r\n"
value_len = 11
#4 preProcessDataLine (cstate=<optimized out>) at copy.c:6763
cvt = <optimized out>
force_transcoding = <optimized out>

Check line_start value, it's ended with "\f\r\n". 

(gdb) f 3
#3 CopyExtractRowMetaData (cstate=<optimized out>) at copy.c:6143
6143 copy.c: No such file or directory.
(gdb) p line_start
$1 = 0x2bae9cb "\267\062?\016\255.\vs\024_\ayQ\251\220\032D\251\221mw\004\255\370\266\234\254\071hK\025\321:\373G'\255*\266\245\263\037\267}\031P^\352!\002#\036v,\021\027\256\343\066\233\224\343\216=\005%\256\306\071-\210\370P/\210\311\260,\256qc\035\256\236\210\327\265!\256\242)\254\266\326\204\f\r\n"

Checked definition of each one of these string literals:

\f  ASCII Formfeed (FF) advance downward to the next "page". It was commonly used as page separators, but now is also used as section separators
\r ASCII Carriage Return (CR) moves the cursor to the beginning of the line without advancing to the next line
\n ASCII Linefeed (LF) moves the cursor down to the next line without returning to the beginning of the line  

Also checked online, found that combination of "\r\n" sometimes would be interpreted as "\n". Which is causing the confusion in GPDB also. 

Resolution

Use NEWLINE to define the source data:  [NEWLINE [ AS ] 'LF' | 'CR' | 'CRLF']

Notes

It is recommended for users to always use NEWLINE option to indicate the newline type of their data. 

Comments

Powered by Zendesk