39

It seems both the LANG and LANGUAGE environment variables are used by some programs to determine their user interface language.

What are the exact semantics of these variables and where can I read about their correct usage? The manpage for locale(1) only mentions the LC_* family of environment variables. Additionally, an LC_ALL variable is commonly set, and it isn't described there either.

aef

4 Answers

39

LANG contains the setting for all categories that are not directly set by an LC_* variable.

LC_ALL is used to override every LC_* variable as well as LANG and LANGUAGE. It should not be set in a normal user environment, but it can be useful when you are writing a script that depends on the precise output of an internationalized command.
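As a rough illustration of the scripting case, here is a minimal Python sketch that pins LC_ALL to C for one child process instead of exporting it for the whole session. The helper name `run_in_c_locale` and the probe command are made up for this example, and it assumes a POSIX system.

```python
import os
import subprocess
import sys


def run_in_c_locale(argv):
    """Run a command with LC_ALL=C so its output does not depend on the user's locale."""
    env = dict(os.environ)   # copy, so the caller's environment stays untouched
    env["LC_ALL"] = "C"      # takes precedence over every LC_* variable and LANG
    completed = subprocess.run(
        argv, env=env, capture_output=True, text=True, check=True
    )
    return completed.stdout


# Hypothetical probe: a child process that adopts the locale from its
# environment and reports which locale it got for LC_NUMERIC.
probe = (
    "import locale; "
    "locale.setlocale(locale.LC_ALL, ''); "
    "print(locale.setlocale(locale.LC_NUMERIC))"
)
result = run_in_c_locale([sys.executable, "-c", probe]).strip()
print(result)
```

Setting the variable only in the child's environment keeps the user's own settings intact, which matches the advice not to set LC_ALL in a normal user environment.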

LANGUAGE sets the language for messages (like LC_MESSAGES), but it accepts a colon-separated list of values. For example, setting it to fr:de:en will use French messages where they exist; if not, it will use German messages, and it will fall back to English if neither German nor French messages are available.
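The fallback behaviour can be tried out with Python's standard `gettext` module, which looks catalogs up in a similar way. This is only a sketch: the domain name `demo`, the helper `first_available_catalog`, and the empty `.mo` files are placeholders (`gettext.find` only checks that the file exists), and it shows Python's lookup rather than GNU gettext itself.

```python
import gettext
import os
import tempfile


def first_available_catalog(language_value, localedir, domain="demo"):
    """Return the path of the first catalog that exists for a LANGUAGE-style list."""
    languages = [code for code in language_value.split(":") if code]
    return gettext.find(domain, localedir=localedir, languages=languages)


with tempfile.TemporaryDirectory() as localedir:
    # Pretend that only German and English translations are installed.
    for code in ("de", "en"):
        catalog_dir = os.path.join(localedir, code, "LC_MESSAGES")
        os.makedirs(catalog_dir)
        open(os.path.join(catalog_dir, "demo.mo"), "wb").close()

    chosen = first_available_catalog("fr:de:en", localedir)
    # The first path component below localedir is the language that won.
    chosen_language = os.path.relpath(chosen, localedir).split(os.sep)[0]

print(chosen_language)
```

French is requested first, but no French catalog exists, so the lookup moves on to the next entry in the list. When `languages` is not passed, `gettext.find` reads LANGUAGE, LC_ALL, LC_MESSAGES and LANG from the environment, in that order.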

Anthony Geoghegan
Rémi
  • 1
    Where can I find documentation about LANGUAGE? Is it mutually exclusive to LC_MESSAGES? – aef Feb 22 '12 at 00:44
  • Everything is in the locale(7) manpage. LC_MESSAGES changes the language messages are displayed in and what an affirmative or negative answer looks like. The GNU C library contains the gettext(3), ngettext(3), and rpmatch(3) functions to ease the use of this information. The GNU gettext family of functions also obeys the environment variable LANGUAGE (containing a colon-separated list of locales) if the category is set to a valid locale other than "C". – Rémi Feb 24 '12 at 15:08
  • 1
    @Rémi can you elaborate on why `LC_ALL` should not be used? – Édouard Lopez Feb 02 '16 at 12:21
  • 1
    Not much to say. You have more flexibility if you set LANG than if you set LC_ALL: you can set LANG to one value and LC_COLLATE to another. If you set LC_ALL, every other setting is hidden. – Rémi Feb 03 '16 at 22:35
  • 3
    I don't think `LC_ALL` overrides `LANGUAGE`: 1. they have different meanings (order [e.g.: fr:de:en] vs. characteristics[e.g.: fr_FR]) – Murmel Jun 06 '18 at 15:11
  • 11
    2. The GNU getText documentation's chapter [Specifying a Priority List of Languages](https://www.gnu.org/software/gettext/manual/html_node/The-LANGUAGE-variable.html#The-LANGUAGE-variable) states: `gettext gives preference to LANGUAGE over LC_ALL and LANG`. Additionally, the chapter [Locale Environment Variables](https://www.gnu.org/software/gettext/manual/html_node/Locale-Environment-Variables.html#Locale-Environment-Variables) states: `1. LANGUAGE 2. LC_ALL [...]` – Murmel Jun 13 '18 at 12:25
  • 3
    $LANGUAGE is not part of the C locales, but specific to GNU gettext. If set, it is given precedence over anything else. I'm using it in my own applications to avoid mixed languages when using gettext-based libraries. – Bachsau Jul 12 '20 at 20:16
  • @Bachsau @Murmel Clarifying point: the GNU gettext documentation states that LANGUAGE is given precedence over LC_ALL and LANG `for the purpose of message handling`, and then only if LANG (or LC_ALL) is set to something other than 'C'. – SpinUp __ A Davis Jul 13 '22 at 18:16
  • @SpinUp Of course, because `C` is the fallback used when no `LC_*` or `LANG` variables are set. It disables all handling of locales and encoding, using only raw binary data. Even if `LANGUAGE` is set, gettext still uses the current locale to determine encoding. If there is none, there's nothing to encode to. – Bachsau Jul 14 '22 at 19:45
  • @Bachsau The most important point, which is not clear in your comment or the one above it, is that LANGUAGE **only overrides LC_MESSAGES**, having no effect on the other LC_* variables. – SpinUp __ A Davis Jul 14 '22 at 19:49
  • @SpinUp Regarding your `LC_MESSAGES` comment, I thought that to be self-explaining, as gettext is only engaged in handling messages. – Bachsau Jul 14 '22 at 19:50
12

Have a look at the manpage locale(7): it describes that LANG is a fallback setting, while LC_ALL overrides all separate LC_* settings.

Jaap Eldering
5

For reference, the locale system is GNU gettext, which has its full documentation available in the gettext-doc package (Debian/Ubuntu).

Alternatively, there is an online manual with authoritative and elaborate documentation of the LANG and LANGUAGE environment variables.

mikini
  • 1
    `gettext` is a library to localize messages, but it is not the whole locale system. The `LC_*` variables are used by several different parts of the C standard library and various other libraries as well. – Bachsau Aug 07 '22 at 22:22
0

This answer attempts to directly quote relevant standards and contains no speculation or inaccurate statements. Edits and corrections welcome as long as they cite relevant standards and authoritative sources.

Environment Variable Priority

The priority of all four variables LC_ALL, LC_*, LANG, and LANGUAGE according to applicable standards:

  1. man 7 locale:

If the second argument to setlocale(3) is an empty string, "", for the default locale, it is determined using the following steps:

  1. If there is a non-null environment variable LC_ALL, the value of LC_ALL is used.

  2. If an environment variable with the same name as one of the categories above exists and is non-null, its value is used for that category.

  3. If there is a non-null environment variable LANG, the value of LANG is used.

  2. GNU gettext manual:

When a program looks up locale dependent values, it does this according to the following environment variables, in priority order:

  1. LANGUAGE
  2. LC_ALL
  3. LC_xxx, according to selected locale category: LC_CTYPE, LC_NUMERIC, LC_TIME, LC_COLLATE, LC_MONETARY, LC_MESSAGES, ...
  4. LANG

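So the currently accepted answer is inaccurate: for message lookups, GNU gettext gives LANGUAGE priority over LC_ALL rather than the other way around.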
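To make the two orders concrete, here is a small Python sketch that models them as pure functions over a dictionary of environment variables. The function names are invented for this example. Two parts are assumptions taken from the comments on the accepted answer rather than from the passages quoted above: the fallback to "C" when nothing is set, and the rule that LANGUAGE is ignored when the locale is "C". It is a simplified model, not the actual glibc or gettext implementation.

```python
def setlocale_value(category, env):
    """Follow the locale(7) steps: LC_ALL, then the category's own variable, then LANG."""
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name)
        if value:  # unset and empty variables are both skipped
            return value
    return "C"  # assumed default when nothing is set


def gettext_languages(env):
    """Ordered list of languages that a gettext-style message lookup would try."""
    locale_name = setlocale_value("LC_MESSAGES", env)
    if locale_name == "C":
        return ["C"]  # assumption: LANGUAGE is ignored in the C locale
    language = env.get("LANGUAGE")
    if language:
        return [code for code in language.split(":") if code]
    return [locale_name]


env = {"LANG": "en_US.UTF-8", "LC_ALL": "de_DE.UTF-8", "LANGUAGE": "fr:de"}
print(setlocale_value("LC_TIME", env))  # LC_ALL decides every category
print(gettext_languages(env))           # LANGUAGE decides the message language
```

With this environment, LC_ALL decides every locale category, while message lookup still follows LANGUAGE first.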

Environment Variable Format

From another part of the GNU gettext manual:

A locale name usually has the form ‘ll_CC’. Here ‘ll’ is an ISO 639 two-letter language code, and ‘CC’ is an ISO 3166 two-letter country code.

Many locale names have an extended syntax ‘ll_CC.encoding’ that also specifies the character encoding.

Some locale names use ‘ll_CC@variant’ instead of ‘ll_CC’. The ‘@variant’ can denote any kind of characteristics that is not already implied by the language ll and the country CC.
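As an illustration of that syntax, here is a minimal Python sketch that splits a locale name into its parts. The names `LocaleName` and `parse_locale_name` are made up for this example. The pattern covers only the forms quoted above, plus a bare language code such as the `fr` used in a LANGUAGE list; special names like `C` or `POSIX` are deliberately not matched.

```python
import re
from typing import NamedTuple, Optional


class LocaleName(NamedTuple):
    language: str
    territory: Optional[str]
    encoding: Optional[str]
    variant: Optional[str]


# ll, optionally followed by _CC, .encoding and @variant, in that order.
LOCALE_PATTERN = re.compile(
    r"(?P<language>[a-z]{2})"
    r"(?:_(?P<territory>[A-Z]{2}))?"
    r"(?:\.(?P<encoding>[^@]+))?"
    r"(?:@(?P<variant>.+))?"
)


def parse_locale_name(name: str) -> Optional[LocaleName]:
    """Split a locale name into its parts, or return None if it does not fit."""
    match = LOCALE_PATTERN.fullmatch(name)
    if match is None:
        return None
    return LocaleName(**match.groupdict())


print(parse_locale_name("de_DE.UTF-8@euro"))
print(parse_locale_name("fr_FR"))
print(parse_locale_name("C"))
```

Parts that are absent from the name come back as `None`, so `fr_FR` yields a language and a territory but no encoding or variant.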