Pivotal Knowledge Base

Follow

Greenplum Queries Using string_to_array Function "Error: String size not supported. (varlena.c:2757)"

Environment

 Product  Version
 Pivotal Greenplum  4.3.x

Symptom

On querying a column with a variable length data type such as varchar or text, Greenplum returns the following error: 

ERROR: String size not supported. (varlena.c:2757) (seg13 slice1 sdw3:60001 pid=7349) (cdbdisp.c:1320)

Cause

What is a string size error?

The maximum length of any string allowed in postgres is 1 GB. When a string exceeds this size, we encounter the "String size not supported error" error. 

When does this issue occur?

The below example details the scenario in which this issue can occur: 

In the below example, 'GPDBGPDBGP' is a 10 character string (1 char = 1 byte).

If we create an aggregated string of 10 characters repeated 10 times i.e. 100 bytes and a string_to_array function succeeds, this error can be seen:

gpadmin=# create table virtu as select string_agg('GPDBGPDBGP') doc from generate_series(1,10);
SELECT 1

gpadmin=# select * from virtu;
doc
------------------------------------------------------------------------------------------------------
GPDBGPDBGPGPDBGPDBGPGPDBGPDBGPGPDBGPDBGPGPDBGPDBGPGPDBGPDBGPGPDBGPDBGPGPDBGPDBGPGPDBGPDBGPGPDBGPDBGP
(1 row)

gpadmin=# select pg_size_pretty(length(doc)) from virtu;
pg_size_pretty
----------------
100 bytes (1 row)

gpadmin=# select string_to_array(doc,'|') from virtu;
string_to_array
--------------------------------------------------------------------------------------------------------
{GPDBGPDBGPGPDBGPDBGPGPDBGPDBGPGPDBGPDBGPGPDBGPDBGPGPDBGPDBGPGPDBGPDBGPGPDBGPDBGPGPDBGPDBGPGPDBGPDBGP}
(1 row)

Scaling to a greater string size:

gpadmin=# drop table virtu;
DROP TABLE

Creating a 250 MB string:

gpadmin=# create table virtu as select string_agg('GPDBGPDBGP') doc from generate_series(1,1024*1024*25);
SELECT 1
gpadmin=# select pg_size_pretty(length(doc)) from virtu;
pg_size_pretty 
----------------
250 MB
(1 row)

Executing string_to_array function on a 250 MB string:

psql -c "select string_to_array(doc,'|') from virtu" > string_to_array_250MB.out

[gpadmin@mdw ~]$ ls -lrth | tail -1
-rw------- 1 gpadmin gpadmin 751M Feb 1 19:43 string_to_array_250MB.out

The output is approximately three times 751 MB, to verify that it actually wrote the string use:

grep GP string_to_array_250MB.out | less

What is the limit?

The maximum allowed size is 250 MB as Greenplum uses a wide character data type which consumes four times the size:

gpadmin=# drop table virtu;
DROP TABLE 
gpadmin=# create table virtu as select string_agg('GPDBGPDBGP') doc from generate_series(1,1024*1024*26); 
SELECT 1 
gpadmin=# select pg_size_pretty(length(doc)) from virtu; 
pg_size_pretty
---------------- 
260 MB
(1 row) 
gpadmin=# select string_to_array(doc,'|') from virtu; 
ERROR: String size not supported. (varlena.c:2757) (seg13 slice1 sdw3:60001 pid=7349) (cdbdisp.c:1320)

The above example explains that if the size of data before delimiter exceeds 250 MB, the string size will not be supported. This is a limitation with Greenplum.

In case there are multiple rows in the table, the query below can be used to find the size of individual rows and to check if one particular row is exceeding in size:

select pg_size_pretty(length(doc)) from virtu;

Resolution

In this scenario, there are two options:

  • Limit the size of data being passed to the string_to_array function.
  • Use a proper delimiter so that the size of delimited data does not exceed 250 MB.

 

 

Comments

Powered by Zendesk