From: "Eregon (Benoit Daloze)"
Date: 2022-01-17T21:35:40+00:00
Subject: [ruby-core:107168] [Ruby master Bug#18495] `LC_ALL=C.UTF-8` sets `Encoding.default_external` to `Encoding::US_ASCII`

Issue #18495 has been updated by Eregon (Benoit Daloze).

This sounds like a bug of the operating system.

On Fedora 33:
```
$ env LC_ALL=C.UTF-8 locale
LANG=en_US.UTF-8
LC_CTYPE="C.UTF-8"
LC_NUMERIC="C.UTF-8"
LC_TIME="C.UTF-8"
LC_COLLATE="C.UTF-8"
LC_MONETARY="C.UTF-8"
LC_MESSAGES="C.UTF-8"
LC_PAPER="C.UTF-8"
LC_NAME="C.UTF-8"
LC_ADDRESS="C.UTF-8"
LC_TELEPHONE="C.UTF-8"
LC_MEASUREMENT="C.UTF-8"
LC_IDENTIFICATION="C.UTF-8"
LC_ALL=C.UTF-8
$ env LC_ALL=C.UTF-8 locale charmap
UTF-8
```

On `debian:buster-slim` in Docker (podman actually):
```
# env LC_ALL=C.UTF-8 locale
LANG=
LANGUAGE=
LC_CTYPE="C.UTF-8"
LC_NUMERIC="C.UTF-8"
LC_TIME="C.UTF-8"
LC_COLLATE="C.UTF-8"
LC_MONETARY="C.UTF-8"
LC_MESSAGES="C.UTF-8"
LC_PAPER="C.UTF-8"
LC_NAME="C.UTF-8"
LC_ADDRESS="C.UTF-8"
LC_TELEPHONE="C.UTF-8"
LC_MEASUREMENT="C.UTF-8"
LC_IDENTIFICATION="C.UTF-8"
LC_ALL=C.UTF-8
# env LC_ALL=C.UTF-8 locale charmap
UTF-8
```

On Ubuntu 20.04:
```
# env LC_ALL=C.UTF-8 locale
LANG=
LANGUAGE=
LC_CTYPE="C.UTF-8"
LC_NUMERIC="C.UTF-8"
LC_TIME="C.UTF-8"
LC_COLLATE="C.UTF-8"
LC_MONETARY="C.UTF-8"
LC_MESSAGES="C.UTF-8"
LC_PAPER="C.UTF-8"
LC_NAME="C.UTF-8"
LC_ADDRESS="C.UTF-8"
LC_TELEPHONE="C.UTF-8"
LC_MEASUREMENT="C.UTF-8"
LC_IDENTIFICATION="C.UTF-8"
LC_ALL=C.UTF-8
# env LC_ALL=C.UTF-8 locale charmap
UTF-8
```

Which seems much more sensible.
Maybe the C.UTF-8 "locale" is not generated on the system you tested?
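
One way to test the "locale is not generated" hypothesis is to look for the locale in the output of `locale -a`. The following is only a rough sketch, not something from the bug report: the helper names `normalize_locale_name` and `locale_listed?` are made up for illustration, and the normalization rule (lowercase the codeset and drop hyphens, so that `C.UTF-8` and `C.utf8` compare equal) is an assumption about how the listing may spell codesets on a given system.

```ruby
# Hypothetical helper: normalize a locale name so that spelling variants of
# the codeset compare equal, e.g. "C.UTF-8" and "C.utf8". This rule is an
# assumption for this sketch, not an official algorithm.
def normalize_locale_name(name)
  base, codeset = name.strip.split(".", 2)
  return base if codeset.nil?
  "#{base}.#{codeset.downcase.delete('-')}"
end

# Hypothetical helper: true if `wanted` appears in `listing`, where `listing`
# is the text printed by `locale -a` (one locale name per line).
def locale_listed?(wanted, listing)
  target = normalize_locale_name(wanted)
  listing.each_line.any? { |line| normalize_locale_name(line) == target }
end

if __FILE__ == $PROGRAM_NAME
  # If `locale` is missing, the shell prints nothing and the listing is empty.
  listing = `locale -a 2>/dev/null`
  puts "C.UTF-8 listed: #{locale_listed?('C.UTF-8', listing)}"
  puts "Encoding.default_external: #{Encoding.default_external}"
end
```

If the first line prints `false` on the affected machine, that would be consistent with the locale simply not being installed there, though it does not by itself prove that this is what Ruby is reacting to.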

BTW I noticed C.UTF-8 is available in Debian & Ubuntu in Docker, but `en_US.UTF-8` is not by default; it warns with:
```
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
```

FWIW TruffleRuby has some docs on how to properly set an `en_US.UTF-8` locale on various OSes: https://github.com/oracle/truffleruby/blob/master/doc/user/utf8-locale.md
(seems one of the most frequent issues when using Docker)

----------------------------------------
Bug #18495: `LC_ALL=C.UTF-8` sets `Encoding.default_external` to `Encoding::US_ASCII`
https://bugs.ruby-lang.org/issues/18495#change-96028

* Author: byroot (Jean Boussier)
* Status: Closed
* Priority: Normal
* ruby -v: ruby 3.1.0p0 (2021-12-25 revision fb4df44d16) [x86_64-darwin21]
* Backport: 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN
----------------------------------------
Original bug report on Bootsnap: https://github.com/Shopify/bootsnap/issues/395#issuecomment-1014421271

```bash
$ env LC_ALL=en_US.UTF-8 ruby -e 'p Encoding.default_external'
#<Encoding:UTF-8>
$ env LC_ALL=C.UTF-8 ruby -e 'p Encoding.default_external'
#<Encoding:US-ASCII>
```

I'm not particularly familiar with `LC_ALL`, but from what I gathered online, `C.UTF-8` is supposed to mean "no internationalization, but UTF-8 support".

-- 
https://bugs.ruby-lang.org/

Unsubscribe: