Pivotal Knowledge Base

How to Deal with Multi-Byte Unicode Characters in Greenplum Database

Environment

  • Pivotal Greenplum Database (GPDB) 4.3.x
  • Operating System (OS): Red Hat Enterprise Linux (RHEL) 6.x

Purpose

Customers sometimes encounter string manipulation issues when a record in a table contains multi-byte Unicode characters (e.g. Chinese characters). String functions such as substr can then return garbled or truncated results.

For example:

testDB=> select substr('中国',1,2);

substr
--------
?
(1 row)

Cause

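This happens because client_encoding is not set correctly at the session level. In this example the session uses latin1, a single-byte encoding that cannot represent Chinese characters, so the bytes sent by the client are not interpreted as the intended characters.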

testDB=> show client_encoding;

client_encoding
-----------------
latin1
(1 row)
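
To confirm the mismatch, it can help to compare the client setting with the database encoding and to check how the server counts the characters of the literal. The following is a minimal sketch, assuming the database itself uses UTF8 (the article does not state the server encoding) and that the terminal sends UTF-8 bytes.

```sql
-- Encoding in which the database stores text (assumed here to be UTF8).
SHOW server_encoding;

-- Encoding the server assumes for data sent by and returned to this client.
SHOW client_encoding;

-- Compare the character count with the byte count of the literal.
-- With a matching UTF8 client encoding, '中国' is 2 characters and 6 bytes.
SELECT char_length('中国'::text) AS chars,
       octet_length('中国'::text) AS bytes;
```

If chars comes back greater than 2, the server is likely treating each byte of the literal as a separate character, which is consistent with the broken substr result above.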

Procedure

Set client_encoding to the encoding the client actually uses. In this case, set it to UTF8, which is a Unicode encoding.

testDB=> \encoding UTF8
testDB=> select substr('中国',1,2);

substr
--------
中国
(1 row)

Note: This setting applies only to the current session. New sessions start with the default client_encoding again.
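
For a default that outlasts a single session, one possible approach is to store the setting on the role or the database. This is a sketch that assumes GPDB 4.3 follows standard PostgreSQL behavior for these statements; the role name gpuser is hypothetical, and testDB is taken from the prompt in the examples above.

```sql
-- Session level: the plain SQL equivalent of \encoding UTF8 in psql.
SET client_encoding TO 'UTF8';

-- Role level: default for new sessions of this role ("gpuser" is hypothetical).
ALTER ROLE gpuser SET client_encoding TO 'UTF8';

-- Database level: default for new connections to this database.
-- The name is quoted because it contains uppercase letters.
ALTER DATABASE "testDB" SET client_encoding TO 'UTF8';
```

Role and database defaults take effect only for sessions opened after the change. Clients based on libpq, such as psql, can also pick up the encoding from the PGCLIENTENCODING environment variable, which may override these defaults.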
