[ruby-core:114558] [Ruby master Bug#18601] Invalid byte sequences in Big5 encodings

From: "jeremyevans0 (Jeremy Evans) via ruby-core" <ruby-core@...>
Date: 2023-08-25 17:49:38 UTC
List: ruby-core #114558
Issue #18601 has been updated by jeremyevans0 (Jeremy Evans).





@duerst ping.



----------------------------------------

Bug #18601: Invalid byte sequences in Big5 encodings

https://bugs.ruby-lang.org/issues/18601#change-104366



* Author: janosch-x (Janosch Müller)
* Status: Open
* Priority: Normal
* Assignee: duerst (Martin Dürst)
* ruby -v: any
* Backport: 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN

----------------------------------------

I encoded all Unicode codepoints in all encodings:



```
full_string = ((0..0xD7FF).to_a + (0xE000..0x10FFFF).to_a).pack('U*'); 1

uniq_encodings =
  Encoding.name_list -
  Encoding.aliases.keys -
  %w[locale external filesystem internal]

encoded_strings =
  uniq_encodings.map do |enc|
    full_string.encode(enc, invalid: :replace, undef: :replace, replace: '')
  rescue => e
    puts e
  end; 1
```



This prints about 10 "converter not found" errors, such as `code converter not found (UTF-8 to UTF-7)`, but I guess this is expected.



Some of the converters seem to output invalid strings, though:



```
encoded_strings.each do |str|
  str&.codepoints
rescue => e
  puts e
end; 1
```



This will print `invalid byte sequence in {Big5HKSCS,Big5-UAO,CP950,CP951}`.
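
The same check can be made without relying on `codepoints` raising, since `String#valid_encoding?` reports whether a string is well-formed. The following is only a minimal sketch: the helper name `encodings_with_invalid_output` is hypothetical and not part of the original report.

```ruby
# Hypothetical helper (an assumption for illustration, not from the report):
# returns the names of the target encodings for which String#encode produced
# a string that is not valid in that encoding.
def encodings_with_invalid_output(source, encoding_names)
  encoding_names.select do |name|
    encoded = source.encode(name, invalid: :replace, undef: :replace, replace: '')
    !encoded.valid_encoding?
  rescue EncodingError
    # Covers Encoding::ConverterNotFoundError for encodings without a converter;
    # such encodings are simply skipped.
    false
  end
end
```

Called with the `full_string` and `uniq_encodings` from the snippet above, it would presumably list the four encodings named in the error messages on an affected Ruby version.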



Looking for example at the generated CP950 string, 8031 of its 25342 characters are invalid, spread across 2017 distinct ranges in the string. The invalid characters' codepoints are all in the range of 0x81..0xFE.
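
Figures like these could be gathered along the following lines. This is a sketch rather than the reporter's actual method: `invalid_char_summary` is a hypothetical helper, and it relies on `String#each_char` yielding each malformed byte as its own (invalid) character.

```ruby
# Hypothetical helper (an assumption for illustration, not from the report):
# counts the characters of +str+ that are invalid in its encoding and groups
# their character indices into consecutive runs.
def invalid_char_summary(str)
  # Character indices whose character is not valid in the string's encoding.
  invalid_indices = str.each_char.with_index
                       .reject { |char, _index| char.valid_encoding? }
                       .map { |_char, index| index }

  # Merge adjacent indices into ranges, e.g. [1, 2, 4] => [1..2, 4..4].
  runs = invalid_indices
         .slice_when { |left, right| right != left + 1 }
         .map { |run| run.first..run.last }

  { total: str.each_char.count, invalid: invalid_indices.size, runs: runs }
end
```

Applied to the CP950 element of `encoded_strings`, the `:invalid` and `:runs` entries would correspond to the counts quoted above, assuming the same Ruby version and the same way of counting.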



Is this a bug?



I would expect `String#encode` with `invalid: :replace, undef: :replace` not to create invalid byte sequences, but maybe I am misunderstanding these encodings and this is an unavoidable issue?



CC @duerst







--

https://bugs.ruby-lang.org/

 ______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/postorius/lists/ruby-core.ml.ruby-lang.org/
