Pivotal Knowledge Base

Follow

Special Characters are removed when loading Greenplum with Informatica

Environment

Product Version
Pivotal Greenplum (GPDB) 4.3.x
OS RHEL 6.x
Others Informatica 

Symptom

When loading a table in Greenplum from Informatica, the source data will have special characters but the Greenplum destination will not have them.

  • For example ö is replaced by o or é is replaced by e.
  • This affects diacritics such as umlaut, accents, diaeresis, etc.

Cause 

Informatica provides several features to remove diacritics from some of their translation tools.  Some of these features are RemoveDiacritics and ASCII_SIMPLIFIED settings which will remove these special characters.

https://kb.informatica.com/whitepapers/4/Pages/1/367890.aspx
"For instance, “ö” is replaced by “o” or “é” is replaced by “e”. SIMPLIFIED never replaces one character with more than one character. It also does not remove any character. That is, the string length always stays the same."

RCA 

Greenplum can easily handle these special characters.  By default Greenplum uses UTF8 which is a multi-byte encoding able to handle most languages.

Typically, if translating the text is not possible, we would see a ? character or a character and <code> which is expected behaviour.

Setting client_encoding can switch between different encoding.  See http://gpdb.docs.pivotal.io/4360/guc_config-client_encoding.html for more details.

Latin1 data inserted
gpadmin=# SET client_encoding='LATIN1';
SET
gpadmin=# INSERT INTO encoding_test values ('Ö_latin');
INSERT 1
gpadmin=# SELECT * from encoding_test;
              a
-----------------------------
 Ö_latin
(1 row)

gpadmin=# SET client_encoding='UTF8';
SET
gpadmin=# SELECT * from encoding_test;
                a
----------------------------------
 Ã\u0096_latin
(1 row)
UTF8/Unicode data inserted
gpadmin=# INSERT INTO encoding_test values ('Ö_utf8');
INSERT 1
gpadmin=# SELECT * from encoding_test;
                a
----------------------------------
 Ã\u0096_latin
 Ö_utf8
(2 rows)

gpadmin=# SET client_encoding='LATIN1';
SET
gpadmin=# SELECT * from encoding_test;
              a
-----------------------------
 Ö_latin
 �_utf8
(2 rows)

 

Resolution

Turn off Informatica RemoveDiacritics settings to maintain the special characters when loaded into Greenplum. 

Comments

Powered by Zendesk