Character Sets and Globalization

Data fetched from, and sent to, Oracle Database will be mapped between the database character set and the “Oracle client” character set of the Oracle Client libraries used by cx_Oracle. If data cannot be correctly mapped between client and server character sets, then it may be corrupted or queries may fail with “codec can’t decode byte”.

cx_Oracle uses Oracle’s National Language Support (NLS) to assist in globalizing applications. As well as character set support, there are many other features that will be useful in applications. See the Database Globalization Support Guide.

Setting the Client Character Set

In cx_Oracle 8 the default encoding used for all character data changed to “UTF-8”. This universal encoding is suitable for most applications. If you have a special need, you can pass the encoding and nencoding parameters to the cx_Oracle.connect() and cx_Oracle.SessionPool() methods to specify different Oracle Client character sets. For example:

import cx_Oracle
connection = cx_Oracle.connect(connectString, encoding="US-ASCII",
        nencoding="UTF-8")

The encoding parameter affects character data such as VARCHAR2 and CLOB columns. The nencoding parameter affects “National Character” data such as NVARCHAR2 and NCLOB. If you are not using national character types, then you can omit nencoding. Both the encoding and nencoding parameters are expected to be one of the Python standard encodings such as UTF-8. Do not accidentally use UTF8, which Oracle uses to specify the older Unicode 3.0 Universal character set, CESU-8. Note that Oracle does not recognize all of the encodings that Python recognizes. You can see which encodings are usable in cx_Oracle by issuing this query:

select distinct utl_i18n.map_charset(value)
from v$nls_valid_values
where parameter = 'CHARACTERSET'
  and utl_i18n.map_charset(value) is not null
order by 1

Note

From cx_Oracle 8, it is no longer possible to change the character set using the NLS_LANG environment variable. The character set component of that variable is ignored. The language and territory components of NLS_LANG are still respected by the Oracle Client libraries.

Character Set Example

The script below tries to display data containing a Euro symbol from the database.

connection = cx_Oracle.connect(userName, password, "dbhost.example.com/orclpdb1",
        encoding="US-ASCII")
cursor = connection.cursor()
for row in cursor.execute("select nvarchar2_column from nchar_test"):
    print(row)

Because the ‘€’ symbol is not supported by the US-ASCII character set, all ‘€’ characters are replaced by ‘¿’ in the cx_Oracle output:

('¿',)

When the encoding parameter is removed (or set to “UTF-8”) during connection:

connection = cx_Oracle.connect(userName, password, "dbhost.example.com/orclpdb1")

Then the output displays the Euro symbol as desired:

('€',)

Finding the Database and Client Character Set

To find the database character set, execute the query:

SELECT value AS db_charset
FROM nls_database_parameters
WHERE parameter = 'NLS_CHARACTERSET';

To find the database ‘national character set’ used for NCHAR and related types, execute the query:

SELECT value AS db_ncharset
FROM nls_database_parameters
WHERE parameter = 'NLS_NCHAR_CHARACTERSET';

To find the current “client” character set used by cx_Oracle, execute the query:

SELECT DISTINCT client_charset AS client_charset
FROM v$session_connect_info
WHERE sid = SYS_CONTEXT('USERENV', 'SID');

If these character sets do not match, characters transferred over Oracle Net will be mapped from one character set to another. This may impact performance and may result in invalid data.

Setting the Oracle Client Locale

You can use the NLS_LANG environment variable to set the language and territory used by the Oracle Client libraries. For example, on Linux you could set:

export NLS_LANG=JAPANESE_JAPAN

The language (“JAPANESE” in this example) specifies conventions such as the language used for Oracle Database messages, sorting, day names, and month names. The territory (“JAPAN”) specifies conventions such as the default date, monetary, and numeric formats. If the language is not specified, then the value defaults to AMERICAN. If the territory is not specified, then the value is derived from the language value. See Choosing a Locale with the NLS_LANG Environment Variable

If the NLS_LANG environment variable is set in the application with os.environ['NLS_LANG'], it must be set before any connection pool is created, or before any standalone connections are created.

Other Oracle globalization variables, such as NLS_DATE_FORMAT can also be set to change the behavior of cx_Oracle, see Setting NLS Parameters.