Pivotal Knowledge Base

How to deal with multi-byte Unicode characters in Greenplum Database

Environment 

Product                    Version
Pivotal Greenplum (GPDB)   4.3.x
OS                         RHEL 6.x

Symptom

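Customers sometimes find that string functions return incorrect results when a record or string literal contains multi-byte Unicode characters (for example, Chinese characters).

For example, the following query should return both characters of the string, but returns an unreadable character instead: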

testDB=> select substr('中国',1,2);

substr
--------
?
(1 row)

 

Cause

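This happens because the session's client_encoding does not match the encoding of the text that the client actually sends. In this case the session uses latin1, a single-byte encoding, so each byte of a multi-byte character is treated as a separate character.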

testDB=> show client_encoding;

client_encoding
-----------------
latin1
(1 row)
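
To confirm the mismatch, it can help to compare the session's client encoding with the database's server encoding, and to compare character and byte counts for the same literal. The following is a minimal sketch, assuming the database was created with UTF8 as its server encoding and that the terminal sends UTF-8 (neither is stated in this article):

```sql
-- Encoding the server assumes for text exchanged with this client session.
SHOW client_encoding;

-- Encoding in which the database stores its data (fixed when the database is created).
SHOW server_encoding;

-- Character count versus byte count for the same literal.
-- Under the assumptions above, a correctly configured session
-- reports 2 characters and 6 bytes for this string.
SELECT char_length('中国'::text)  AS chars,
       octet_length('中国'::text) AS bytes;
```

If chars is larger than the number of characters typed, the bytes are most likely being interpreted in the wrong client encoding.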

 

Resolution

Set client_encoding to match the encoding that the client uses. In this case, set it to UTF8, which is a Unicode encoding.

testDB=> \encoding UTF8
testDB=> select substr('中国',1,2);

substr
--------
中国
(1 row)

Note: This setting applies only to the current session. A new session starts with the default client encoding again.
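
The \encoding meta-command is specific to psql. For other clients, or to make the setting persist across sessions, the standard PostgreSQL statements below may be used. This is a sketch only: app_user is a hypothetical role name, and the behavior should be verified on your GPDB 4.3.x system before relying on it.

```sql
-- SQL equivalent of psql's \encoding; affects only the current session.
SET client_encoding TO 'UTF8';

-- Return the session to its default client encoding if needed.
RESET client_encoding;

-- Assumption: app_user is a hypothetical role name used for illustration.
-- New sessions opened by this role start with UTF8 as the client encoding.
ALTER ROLE app_user SET client_encoding TO 'UTF8';
```

Clients built on libpq, including psql, also read the PGCLIENTENCODING environment variable at connection time, which can be set in the shell before connecting.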
