[ruby-core:61907] [ruby-trunk - Bug #9715] [Open] ENV data yield ASCII-8BIT encoded strings under Windows with unicode username

From: thomas@...
Date: 2014-04-08 12:08:51 UTC
List: ruby-core #61907
Issue #9715 has been reported by Thomas Thomassen.

----------------------------------------
Bug #9715: ENV data yield ASCII-8BIT encoded strings under Windows with unicode username
https://bugs.ruby-lang.org/issues/9715

* Author: Thomas Thomassen
* Status: Open
* Priority: Normal
* Assignee: cruby-windows
* Category: platform/windows
* Target version: current: 2.2.0
* ruby -v: ruby 2.2.0dev (2014-04-07 trunk 45530) [i386-mswin32_100]
* Backport: 2.0.0: UNKNOWN, 2.1: UNKNOWN
----------------------------------------
My testing scenario:
English Windows, Unicode username: てすと

Home directory: C:\Users\てすと\

The values returned from ENV have different encodings depending on their content. Most values appear to be labelled with the OEM encoding, except when they contain characters not included in the OEM code page. In that case, for instance `ENV['HOME']` when the username is "てすと", the value is labelled ASCII-8BIT.
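
One way to see this split on a given machine is to group the environment variables by the encoding label on their values. This is only a sketch: the helper name `env_names_by_encoding` is hypothetical, and the output will differ per system and per Ruby version.

```ruby
# Hypothetical helper (not part of Ruby): group variable names by the
# encoding label Ruby attaches to each value.
def env_names_by_encoding(env = ENV)
  env.to_h
     .group_by { |_name, value| value.encoding }
     .transform_values { |pairs| pairs.map(&:first) }
end

# Print a short summary for the current process environment.
env_names_by_encoding.each do |encoding, names|
  puts "#{encoding}: #{names.size} variable(s)"
end
```

On the setup described here, `HOME` would be expected to land in the ASCII-8BIT group while `ProgramFiles` lands in the OEM code page group.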

(I find the name "ASCII-8BIT" confusing for an encoding, as ASCII is 7-bit, byte range 0-127.)
But it appears that "ASCII-8BIT" is also aliased as "binary"? So is Ruby returning a binary string here when ENV contains bytes not included in the OEM code page?
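
The alias can be checked directly. A small sketch, using only the standard `Encoding` constants and `Encoding.find`:

```ruby
# ASCII-8BIT and BINARY are two names for the same Encoding object.
same_object = Encoding::ASCII_8BIT.equal?(Encoding::BINARY)
found_by_alias = Encoding.find("binary") == Encoding::ASCII_8BIT

p same_object
p found_by_alias
p Encoding::ASCII_8BIT.names
```

A string carrying this label is treated as raw bytes, so no character interpretation is attached to bytes above 127.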

Reading the docs for Encoding:

> Returns default internal encoding.  Strings will be transcoded to the default internal encoding in the following places if the default internal encoding is not nil:
> ...
> ::default_internal is initialized by the source file's internal_encoding or -E option.

This includes `ENV`, but even when I run ruby with the `-E` flag the `ENV` encoding doesn't change. It's still using the OEM code page, or ASCII-8BIT.
However, regardless of whether `-E` is set, ENV does appear to return UTF-8 bytes in the strings that contain the Unicode username.
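
Since the bytes themselves look like valid UTF-8, a possible stopgap is to relabel such values by hand. The sketch below is an assumption about a workaround, not a fix for the underlying behaviour: `utf8_or_original` is a hypothetical helper, and the byte list is the one reported for `ENV['HOME']` in the examples.

```ruby
# Hypothetical helper: relabel a string as UTF-8 when its bytes form valid
# UTF-8, otherwise return it unchanged. No bytes are modified either way.
def utf8_or_original(value)
  return value if value.nil?

  candidate = value.dup.force_encoding(Encoding::UTF_8)
  candidate.valid_encoding? ? candidate : value
end

# Bytes reported for ENV['HOME'] in this issue, rebuilt as a binary string.
raw = [67, 58, 47, 85, 115, 101, 114, 115, 47,
       227, 129, 166, 227, 129, 153, 227, 129, 168].pack("C*")

home = utf8_or_original(raw)
p raw.encoding
p home.encoding
p home
```

`force_encoding` only changes the label, so this is safe only when the bytes really are UTF-8, which is why the helper checks `valid_encoding?` first.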

This is one of several areas where I have found `-E` to have no effect on Ruby's string handling. I understand that some of Ruby's file handling is kept as it is for backwards compatibility, but I'm finding it difficult to set up a system that can properly handle Unicode files under Windows. Is this behaviour deliberate, due to backwards compatibility decisions? Or have I simply not found the correct configuration flags for it? To me it appears to be a bug, since it is inconsistent with what the documentation says. But please enlighten me if I am incorrect. My ideal situation would be for all strings to default to UTF-8.


Examples:

~~~
C:\ruby-220\usr\bin>ruby -E UTF-8:UTF-8 -e "p ENV['ProgramFiles'].encoding"
#<Encoding:CP850>

C:\ruby-220\usr\bin>ruby -E UTF-8:UTF-8 -e "p ENV['ProgramFiles'].bytes"
[67, 58, 92, 80, 114, 111, 103, 114, 97, 109, 32, 70, 105, 108, 101, 115, 32, 40, 120, 56, 54, 41]
~~~

~~~
C:\ruby-220\usr\bin>ruby -e "p ENV['HOME']"
"C:/Users/\xE3\x81\xA6\xE3\x81\x99\xE3\x81\xA8"

C:\ruby-220\usr\bin>ruby -e "p ENV['HOME'].encoding"
#<Encoding:ASCII-8BIT>

C:\ruby-220\usr\bin>ruby -e "p ENV['HOME'].bytes"
[67, 58, 47, 85, 115, 101, 114, 115, 47, 227, 129, 166, 227, 129, 153, 227, 129, 168]

C:\ruby-220\usr\bin>ruby -e "p __ENCODING__"
#<Encoding:CP850>

C:\ruby-220\usr\bin>ruby -e "p Encoding.default_internal"
nil

C:\ruby-220\usr\bin>ruby -e "p Encoding.default_external"
#<Encoding:CP850>

C:\ruby-220\usr\bin>ruby -e "p Encoding.find('filesystem')"
#<Encoding:Windows-1252>

C:\ruby-220\usr\bin>ruby -E UTF-8:UTF-8 -e "p ENV['HOME'].encoding"
#<Encoding:ASCII-8BIT>

C:\ruby-220\usr\bin>ruby -E UTF-8:UTF-8 -e "p ENV['HOME'].bytes"
[67, 58, 47, 85, 115, 101, 114, 115, 47, 227, 129, 166, 227, 129, 153, 227, 129, 168]
~~~



-- 
https://bugs.ruby-lang.org/
