[ruby-core:107959] [Ruby master Bug#18641] UTF-16 surrogate pairs
From: "noraj (Alexandre ZANNI)" <noreply@...>
Date: 2022-03-17 18:55:46 UTC
List: ruby-core #107959
Issue #18641 has been reported by noraj (Alexandre ZANNI).
----------------------------------------
Bug #18641: UTF-16 surrogate pairs
https://bugs.ruby-lang.org/issues/18641
* Author: noraj (Alexandre ZANNI)
* Status: Open
* Priority: Normal
* ruby -v: ruby 3.1.0p0 (2021-12-25 revision fb4df44d16) [x86_64-linux]
* Backport: 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN
----------------------------------------
It is expected that Ruby raises an *invalid Unicode codepoint* error when surrogate pairs are used in a UTF-8 string; however, those code points should be valid in a UTF-16 string.
It is also expected that unpaired surrogates are invalid, whereas paired surrogates are valid; cf. https://unicode.org/faq/utf_bom.html#utf16-7.
Versions tested: 3.0.3p157 and 3.1.0p0
``` ruby
➜ irb
irb(main):001:0> a = ''.force_encoding(Encoding::UTF_16)
=> ""
irb(main):002:0> a += "\uD83D\uDC69".force_encoding(Encoding::UTF_16)
/home/noraj/.asdf/installs/ruby/3.1.0/lib/ruby/3.1.0/irb/workspace.rb:119:in `eval': (irb):2: invalid Unicode codepoint (SyntaxError)
a += "\uD83D\uDC69".force_encoding(Encodi...
^~~~
(irb):2: invalid Unicode codepoint
a += "\uD83D\uDC69".force_encoding(Encoding::UT...
^~~~
from /home/noraj/.asdf/installs/ruby/3.1.0/lib/ruby/gems/3.1.0/gems/irb-1.4.1/exe/irb:11:in `<top (required)>'
from /home/noraj/.asdf/installs/ruby/3.1.0/bin/irb:25:in `load'
from /home/noraj/.asdf/installs/ruby/3.1.0/bin/irb:25:in `<main>'
```
Also see [Unicode 14.0 Implementation Guidelines - 5.4 Handling Surrogate Pairs in UTF-16](https://www.unicode.org/versions/Unicode14.0.0/ch05.pdf)
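
For comparison, here is a minimal sketch of one way to obtain the same surrogate pair as a valid UTF-16 string without writing the surrogates as `\u` escapes. It is not taken from the report. It assumes UTF-16BE so that the byte order is explicit (`Encoding::UTF_16` without a byte order is, as far as I know, a dummy encoding in Ruby), and the variable names are only illustrative.

``` ruby
# Assumption: UTF-16BE is chosen so that the byte order is explicit.
woman = "\u{1F469}"                       # U+1F469 written as a code point (UTF-8 string)
utf16 = woman.encode(Encoding::UTF_16BE)  # transcode to UTF-16BE

# Read the string back as big-endian 16-bit code units:
# these are the two surrogates from the report.
units = utf16.unpack("n*")
puts units.map { |u| format("%04X", u) }.join(" ")

# Build the same string directly from the raw code units.
from_units = [0xD83D, 0xDC69].pack("n*").force_encoding(Encoding::UTF_16BE)
puts from_units.valid_encoding?
puts from_units == utf16
puts from_units.encode(Encoding::UTF_8) == woman
```

The `\u` escape in a string literal denotes a Unicode code point rather than a UTF-16 code unit, which would explain why the parser rejects `\uD83D` before `force_encoding` is ever called.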
--
https://bugs.ruby-lang.org/