Pivotal Knowledge Base

Libchurl remote component error 413 when querying external_table in Pivotal HDB

Environment

  • PHD 1.1.0, PHD 1.1.1.0, and PHD 2.0.1.0B - configuration changes as suggested in this article
  • PHD 1.0.1 and PHD 2.0.1 - no workaround
  • PXFD (Isilon)

Symptom

gpadmin=# select * from table_column_1000;                                      
ERROR:  remote component error (413) from '192.168.2.104:50075': HTTP status code is 413 but HTTP response string is empty (libchurl.c:717)  (seg2 slice1 hdw2.hadoop.local:40000 pid=2638) (cdbdisp.c:1571)
DETAIL:  External table table_column_1000, file pxf://hdm1:50070/tmp/data.txt?Fragmenter=HdfsDataFragmenter&Accessor=TextFileAccessor&Resolver=TextResolver&segwork=34@192.168.2.104@50075@tmp/data.txt@0

Cause

This is a limitation imposed by the Jetty server: the HTTP request header size is hard-coded to 64 KB in PHD 1.0.1, and 64 KB is the default setting in PHD 1.1.0 and later. When a PXF query attempts to read a table with a large number of columns, the HTTP request header grows larger than 64 KB and the Jetty server returns HTTP status code 413 (Request Entity Too Large).

Fix

The fix is to increase http.header.size for the NameNode HTTP Jetty server to a value higher than the default of 64 KB.

PXF sends an HTTP request to the NameNode whose headers include all of the column names and data types. The headers look as follows:

X-GP-ATTR-NAME0: productid_col1
X-GP-ATTR-TYPECODE0: 23
X-GP-ATTR-TYPENAME0: int4

The "X-GP-ATTR" prefixes of the three headers are always the same. However, the NAMEx and TYPENAMEx values vary with the table's column names, data types, and number of columns. We can safely estimate that PXF needs about 80 bytes of metadata per column, including TYPECODEx and TYPENAMEx. So, when determining the value for the HTTP header size, allow 80 bytes plus the number of bytes in the column name, multiplied by the number of columns. Here is an example to make this concrete.

Assume the following:

  • 1000 columns with names like col1, col2, col3, col4...
  • The largest column name is col1000 = 7 characters

We can then calculate the size of the request given the following estimates plus the size of the largest column:

  • CHARBYTES = 2 Bytes 
  • HTTPHEADER = 200 Bytes
  • LARGESTCOLUMN = 80 Bytes + ( 7 characters * CHARBYTES ) = 94 Bytes
  • NUMBEROFCOLUMNS = 1000

Equation

  • Total request size = $HTTPHEADER + ( $LARGESTCOLUMN * $NUMBEROFCOLUMNS )
  • 94200 Bytes = 200 + ( 94 * 1000 )
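The arithmetic above can be sketched in a short script. This is only an illustration of the estimate described in this article (the function name is ours; the 80-byte per-column overhead, 2-byte-per-character, and 200-byte base header figures are the estimates listed above):

```python
def estimated_header_size(num_columns, largest_name_len,
                          char_bytes=2, per_column_overhead=80,
                          base_http_header=200):
    """Estimate the HTTP request header size PXF sends to the NameNode.

    Conservatively sizes every column as the largest one: each column
    costs the metadata overhead (TYPECODEx, TYPENAMEx, etc.) plus its
    name encoded at char_bytes per character.
    """
    largest_column = per_column_overhead + largest_name_len * char_bytes
    return base_http_header + largest_column * num_columns

# The example from this article: 1000 columns, longest name "col1000" (7 chars)
size = estimated_header_size(1000, 7)
print(size)  # 94200 bytes, comfortably under the suggested 131072
```

A request of 94200 bytes fits within the suggested 131072-byte (128 KB) header size.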

The suggested http.header.size is 128 KB (131072 bytes), so the NameNode Jetty server can accept this request of roughly 92 KB.

  1. Edit /etc/gphd/hadoop/conf/core-site.xml
    <property>
    <name>http.header.size</name>
    <value>131072</value>
    </property>
  2. Stop Pivotal HDB "gpstop -af"
  3. Restart Pivotal HD services using the PCC GUI or the "icm_client" CLI
  4. Start HDB "gpstart -a"

In the case of Isilon

If you are using Isilon, you may have the Pivotal pxfd daemon running on the master servers. The procedure to change the header size for pxfd is as follows:

  1. vi /etc/jetty.xml
  2. update the SelectChannelConnector:
    <Set name="requestHeaderSize">131072</Set>
    <Set name="responseHeaderSize">131072</Set>
  3. Restart the pxfd service on all nodes running this daemon
