[ruby-core:97049] [Ruby master Feature#16604] Set default for Encoding.default_external to UTF-8 on Windows

From: larskanis@...
Date: 2020-02-03 14:12:10 UTC
List: ruby-core #97049
Issue #16604 has been reported by larskanis (Lars Kanis).

----------------------------------------
Feature #16604: Set default for Encoding.default_external to UTF-8 on Windows
https://bugs.ruby-lang.org/issues/16604

* Author: larskanis (Lars Kanis)
* Status: Open
* Priority: Normal
----------------------------------------
This issue is related to https://bugs.ruby-lang.org/issues/13488, where we already discussed the topic and postponed the change to Ruby 3. Patch is here: 

Currently `Encoding.default_external` is initialized to the local console encoding of the Windows installation unless overridden with the `-E` option. This is e.g. cp850 for Western Europe. It should be changed to UTF-8.
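To see which values are in effect on a given machine, the encodings can be printed directly. The following is a minimal sketch; the helper name `default_external_with` is made up for illustration, and it assumes a `ruby` executable is available via `RbConfig.ruby`.

```ruby
require "open3"
require "rbconfig"

# Hypothetical helper: returns the name of Encoding.default_external as seen
# by a fresh Ruby process started with the given command-line options.
def default_external_with(*options)
  out, status = Open3.capture2(RbConfig.ruby, *options, "-e",
                               "print Encoding.default_external.name")
  raise "ruby exited with status #{status.exitstatus}" unless status.success?
  out
end

puts "current process: #{Encoding.default_external}"
puts "locale:          #{Encoding.find('locale')}"
puts "filesystem:      #{Encoding.find('filesystem')}"
puts "with -EUTF-8:    #{default_external_with('-EUTF-8')}"
```

On a Western European Windows console the first two lines would typically show the console code page, while the last line shows the effect of the `-E` option.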

RubyInstaller has provided a checkbox for `RUBYOPT=-Eutf-8` since version 2.4.
This checkbox was disabled by default, but I noticed from bug reports that many people enabled it.
With RubyInstaller-2.7.0 this checkbox is [enabled by default](https://rubyinstaller.org/2020/01/05/rubyinstaller-2.7.0-1-released.html).
So we already have a steady migration towards UTF-8 on Windows.

Changing to UTF-8 fixes various inconsistencies within Ruby and with external tools.
A very annoying case is that writing a string literal to a file produces UTF-8 content, since UTF-8 is the default Ruby source encoding.
Reading the content back, however, tags the string with the wrong encoding.
This does not happen in `irb`, since it already sets `Encoding.default_external = "utf-8"` on its own.

```
s = "äöü"
File.write("x", s)   # => 6 bytes
File.read("x") == s  # => true in irb but false in .rb file
```
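The same effect can be reproduced on any platform by passing the encoding explicitly instead of relying on `Encoding.default_external`. This is only a sketch: the helper `round_trip` is a made-up name, CP850 stands in for the Windows console default, and `Encoding.default_internal` is assumed to be `nil`.

```ruby
require "tmpdir"

# Hypothetical helper: writes +text+ to a temporary file and reads it back,
# tagging the result with +external+, which stands in for
# Encoding.default_external. No transcoding happens on read as long as
# Encoding.default_internal is nil.
def round_trip(text, external)
  Dir.mktmpdir do |dir|
    path = File.join(dir, "x")
    File.write(path, text)
    File.read(path, encoding: external)
  end
end

s = "\u00E4\u00F6\u00FC" # "äöü" as a UTF-8 string, 6 bytes

puts round_trip(s, "UTF-8") == s                          # same bytes, same tag
puts round_trip(s, "CP850") == s                          # same bytes, different tag
puts round_trip(s, "CP850").force_encoding("UTF-8") == s  # retagging recovers the text
```

The bytes on disk are identical in all three cases. Only the encoding tag attached to the string on read differs, and that is enough to make the comparison fail.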

Another issue is that many non-Asian regions have distinct legacy encodings for the OEM code page (aka `Encoding.find('locale')`) and the ANSI code page (aka `Encoding.find('filesystem')`), so that a file written in the current default external encoding `Encoding.find('locale')` is not properly interpreted by Windows GUI tools like Notepad. It is therefore uncommon to store files in the OEM encoding, and doing so is almost certainly wrong.
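A small sketch of what goes wrong when text written in the console code page is opened by a GUI tool. CP850 is the Western European console encoding named above; Windows-1252 as the encoding used by GUI tools is an assumption for that same region, and `reinterpret` is a made-up helper name.

```ruby
# Hypothetical helper: encodes +text+ as +written_as+, then decodes the very
# same bytes as if they were +read_as+, returning the result as UTF-8.
def reinterpret(text, written_as:, read_as:)
  text.encode(written_as).force_encoding(read_as).encode("UTF-8")
end

a_umlaut = "\u00E4" # "ä"

# "ä" is the single byte 0x84 in CP850; Windows-1252 maps 0x84 to U+201E.
p a_umlaut.encode("CP850").bytes
p reinterpret(a_umlaut, written_as: "CP850", read_as: "Windows-1252")
```

Some CP850 bytes have no assigned character in Windows-1252 at all, in which case the final `encode` raises `Encoding::UndefinedConversionError` instead of returning a wrong character.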

RubyInstaller ships the MSYS2 environment, which defaults to UTF-8 as well.

PowerShell made the switch to UTF-8 (without BOM) in [PowerShell 6.0](https://docs.microsoft.com/en-us/powershell/scripting/whats-new/what-s-new-in-powershell-core-60?view=powershell-7#default-encoding-is-utf-8-without-a-bom-except-for-new-modulemanifest) and even more in 6.1.

Changing the default of `Encoding.default_external` to UTF-8 is a trade-off.
It doesn't fit every case, but in my experience this is the best overall option.

There are some alternatives to it:

Changing the Windows console to codepage 65001:
 * The Windows implementation of 65001 is buggy in the console. I haven't verified it lately, but `chcp 65001` didn't work reliably years ago.
 * It is not the default and input methods like IME are incompatible.

Setting `Encoding.default_internal` in addition:
 * This triggers transcoding of output strings, which is not enabled on other systems, causing unexpected results and incompatibilities.

Change ruby to use `Encoding.find("filesystem")` as encoding for file operations:
 * That would fix the compatibility with some builtin Windows tools, but doesn't fix encoding issues due to increased use of UTF-8.

Please note that changing `Encoding.default_external` doesn't affect file or IO output, unless `Encoding.default_internal` is set as well (which is not the default). So inspecting ruby's output with Windows builtin `more` will most likely result in garbage (since strings are usually UTF-8 in ruby) regardless of the particular `default_external` setting. On the other hand output inspected with MSYS2 `less` is most likely correct, since it expects UTF-8 input.
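The claim about output can be checked with a short experiment. This is a sketch under the assumption that `Encoding.default_internal` is `nil`; the helper `bytes_on_disk` is a made-up name, and it temporarily reassigns the process-wide default, which real code should avoid.

```ruby
require "tmpdir"

# Hypothetical helper: writes +text+ with File.write while
# Encoding.default_external is temporarily set to +external+, and returns
# the raw bytes that ended up on disk.
# Assumes Encoding.default_internal is nil (Ruby's default).
def bytes_on_disk(text, external)
  saved_external = Encoding.default_external
  saved_verbose = $VERBOSE
  $VERBOSE = nil # silence the warning Ruby may print when the default is reassigned
  Encoding.default_external = external
  Dir.mktmpdir do |dir|
    path = File.join(dir, "x")
    File.write(path, text)
    File.binread(path).bytes
  end
ensure
  Encoding.default_external = saved_external
  $VERBOSE = saved_verbose
end

s = "\u00E4\u00F6\u00FC" # "äöü" as a UTF-8 string

p bytes_on_disk(s, "UTF-8")
p bytes_on_disk(s, "CP850")
```

Both calls should print the same six UTF-8 bytes, since the string is written as-is regardless of the `default_external` setting.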

The patch is currently about Windows only, because I would like to focus on that question for now.
Possibly it's a subsequent question whether Encoding.default_external should default to UTF-8 on all operating systems or at least in case of `LANG=C` locale (which currently triggers US-ASCII).

