[#107765] [Ruby master Bug#18605] Fails to run on (newer) 32bit Windows with ucrt — "lazka (Christoph Reiter)" <noreply@...>

Issue #18605 has been reported by lazka (Christoph Reiter).

8 messages 2022/03/03

[#107769] [Ruby master Misc#18609] keyword decomposition in enumerable (question/guidance) — "Ethan (Ethan -)" <noreply@...>

Issue #18609 has been reported by Ethan (Ethan -).

10 messages 2022/03/04

[#107784] [Ruby master Feature#18611] Promote best practice for combining multiple values into a hash code — "chrisseaton (Chris Seaton)" <noreply@...>

Issue #18611 has been reported by chrisseaton (Chris Seaton).

12 messages 2022/03/07

[#107791] [Ruby master Bug#18614] Error (busy loop) inTestGemCommandsSetupCommand#test_destdir_flag_does_not_try_to_write_to_the_default_gem_home — duerst <noreply@...>

Issue #18614 has been reported by duerst (Martin D端rst).

7 messages 2022/03/08

[#107794] [Ruby master Feature#18615] Use -Werror=implicit-function-declaration by deault for building C extensions — "Eregon (Benoit Daloze)" <noreply@...>

Issue #18615 has been reported by Eregon (Benoit Daloze).

11 messages 2022/03/08

[#107832] [Ruby master Bug#18622] const_get still looks in Object, while lexical constant lookup no longer does — "Eregon (Benoit Daloze)" <noreply@...>

Issue #18622 has been reported by Eregon (Benoit Daloze).

16 messages 2022/03/10

[#107847] [Ruby master Bug#18625] ruby2_keywords does not unmark the hash if the receiving method has a *rest parameter — "Eregon (Benoit Daloze)" <noreply@...>

Issue #18625 has been reported by Eregon (Benoit Daloze).

13 messages 2022/03/11

[#107886] [Ruby master Feature#18630] Introduce general `IO#timeout` and `IO#timeout=`for all (non-)blocking operations. — "ioquatix (Samuel Williams)" <noreply@...>

Issue #18630 has been reported by ioquatix (Samuel Williams).

28 messages 2022/03/14

[#108026] [Ruby master Feature#18654] Enhancements to prettyprint — "kddeisz (Kevin Newton)" <noreply@...>

Issue #18654 has been reported by kddeisz (Kevin Newton).

9 messages 2022/03/22

[#108039] [Ruby master Feature#18655] Merge `IO#wait_readable` and `IO#wait_writable` into core — "byroot (Jean Boussier)" <noreply@...>

Issue #18655 has been reported by byroot (Jean Boussier).

10 messages 2022/03/23

[#108056] [Ruby master Bug#18658] Need openssl 3 support for Ubuntu 22.04 (Ruby 2.7.x and 3.0.x) — "schneems (Richard Schneeman)" <noreply@...>

Issue #18658 has been reported by schneems (Richard Schneeman).

19 messages 2022/03/24

[#108075] [Ruby master Bug#18663] Autoload doesn't work with fiber context switch. — "ioquatix (Samuel Williams)" <noreply@...>

Issue #18663 has been reported by ioquatix (Samuel Williams).

10 messages 2022/03/25

[#108117] [Ruby master Feature#18668] Merge `io-nonblock` gems into core — "Eregon (Benoit Daloze)" <noreply@...>

Issue #18668 has been reported by Eregon (Benoit Daloze).

22 messages 2022/03/30

[ruby-core:107989] [Ruby master Bug#18641] UTF-16 surrogate pairs

From: "noraj (Alexandre ZANNI)" <noreply@...>
Date: 2022-03-19 13:22:29 UTC
List: ruby-core #107989
Issue #18641 has been updated by noraj (Alexandre ZANNI).


Thank you Martin.

I'm actually working on an Unicode study, I was not interested into representing the emoji with it's codepoint but to actually be able to write non-BMP glyph in UTF-16 by using the surrogates.

As far as I understand, it's not possible to have a native UTF-16 string it will always be UTF-8 converted to UTF-16 so my only option to write surrogates directly is to use pack?

----------------------------------------
Bug #18641: UTF-16 surrogate pairs
https://bugs.ruby-lang.org/issues/18641#change-96942

* Author: noraj (Alexandre ZANNI)
* Status: Rejected
* Priority: Normal
* ruby -v: ruby 3.1.1p18 (2022-02-18 revision 53f5fc4236) [x86_64-linux]
* Backport: 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN
----------------------------------------
That Ruby triggers an *invalid Unicode codepoint* error while using surrogate pairs in an UTF-8 string is expected, however those codepoints should be valid in an UTF-16 string.
It is also expected that unpaired surrogates are invalid however paired surrogates are valid cf. https://unicode.org/faq/utf_bom.html#utf16-7.

Version tested: 3.0.3p157, 3.1.0p0 and 3.1.1p18

``` ruby
➜ irb
irb(main):001:0> a = ''.force_encoding(Encoding::UTF_16)
=> ""
irb(main):002:0> a += "\uD83D\uDC69".force_encoding(Encoding::UTF_16)
/home/noraj/.asdf/installs/ruby/3.1.0/lib/ruby/3.1.0/irb/workspace.rb:119:in `eval': (irb):2: invalid Unicode codepoint (SyntaxError)                                                                            
a += "\uD83D\uDC69".force_encoding(Encodi...                                                
        ^~~~                                                                                
(irb):2: invalid Unicode codepoint                                                          
a += "\uD83D\uDC69".force_encoding(Encoding::UT...                                          
              ^~~~                                                                          
        from /home/noraj/.asdf/installs/ruby/3.1.0/lib/ruby/gems/3.1.0/gems/irb-1.4.1/exe/irb:11:in `<top (required)>'                                                                                           
        from /home/noraj/.asdf/installs/ruby/3.1.0/bin/irb:25:in `load'                     
        from /home/noraj/.asdf/installs/ruby/3.1.0/bin/irb:25:in `<main>'
```

Also see [Unicode 14.0 Implementation Guidelines - 5.4 Handling Surrogate Pairs in UTF-16](https://www.unicode.org/versions/Unicode14.0.0/ch05.pdf)



-- 
https://bugs.ruby-lang.org/

Unsubscribe: <mailto:ruby-core-request@ruby-lang.org?subject=unsubscribe>
<http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-core>

In This Thread