[ruby-core:114558] [Ruby master Bug#18601] Invalid byte sequences in Big5 encodings
From: "jeremyevans0 (Jeremy Evans) via ruby-core" <ruby-core@...>
Date: 2023-08-25 17:49:38 UTC
List: ruby-core #114558
Issue #18601 has been updated by jeremyevans0 (Jeremy Evans).
@duerst ping.
----------------------------------------
Bug #18601: Invalid byte sequences in Big5 encodings
https://bugs.ruby-lang.org/issues/18601#change-104366
* Author: janosch-x (Janosch Müller)
* Status: Open
* Priority: Normal
* Assignee: duerst (Martin Dürst)
* ruby -v: any
* Backport: 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN
----------------------------------------
I encoded all Unicode codepoints in all encodings:
```
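# Every Unicode codepoint except the surrogates, as one UTF-8 string.
# The trailing `; 1` presumably just keeps irb from echoing the huge result.
full_string = ((0..0xD7FF).to_a + (0xE000..0x10FFFF).to_a).pack('U*'); 1

# Canonical encoding names only: drop aliases and the special names.
uniq_encodings =
  Encoding.name_list -
  Encoding.aliases.keys -
  %w[locale external filesystem internal]

# Encode into each encoding; an entry is nil where the conversion raised.
encoded_strings =
  uniq_encodings.map do |enc|
    full_string.encode(enc, invalid: :replace, undef: :replace, replace: '')
  rescue => e
    puts e
  end; 1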
```
This prints about 10 "converter not found" errors, such as `code converter not found (UTF-8 to UTF-7)`, but I guess this is expected.
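To see which encodings those are without scanning the printed messages, a minimal sketch along these lines should work. The sample string and the `missing_converters` name are illustrative assumptions, not part of the original report. A non-ASCII sample is used on purpose, since pure-ASCII input can skip the converter lookup for ASCII-compatible targets.

```ruby
# Short non-ASCII sample (an assumption; the report used every codepoint).
sample = "caf\u00e9"

candidates = Encoding.name_list -
             Encoding.aliases.keys -
             %w[locale external filesystem internal]

# Names of encodings for which String#encode finds no converter from UTF-8.
missing_converters = candidates.select do |name|
  sample.encode(name, invalid: :replace, undef: :replace, replace: '')
  false
rescue Encoding::ConverterNotFoundError
  true
end

puts "#{missing_converters.size} encodings without a converter from UTF-8:"
puts missing_converters.sort
```

The exact list depends on the Ruby build, so no expected output is given here.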
Some of the converters seem to output invalid strings, though:
```
encoded_strings.each do |str|
  str&.codepoints
rescue => e
  puts e
end; 1
```
This will print `invalid byte sequence in {Big5HKSCS,Big5-UAO,CP950,CP951}`.
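The same check can be done without rescuing exceptions by asking each string whether it is valid. The following is only a sketch: the helper name and the demo strings are made up for illustration, and the demo data is hand-built rather than produced by `String#encode`.

```ruby
# Return the names of the encodings whose strings contain invalid byte
# sequences. nil entries (conversions that raised) are skipped.
def encodings_with_invalid_strings(strings)
  strings.compact
         .reject(&:valid_encoding?)
         .map { |str| str.encoding.name }
end

# Hand-built demo data, not taken from the report.
demo = [
  'plain ASCII',                              # valid UTF-8
  nil,                                        # stands in for a failed conversion
  "\xFF".dup.force_encoding(Encoding::UTF_8), # 0xFF never occurs in UTF-8
  "\xA4".dup.force_encoding(Encoding::Big5)   # Big5 lead byte without a trail byte
]

p encodings_with_invalid_strings(demo) # => ["UTF-8", "Big5"]
```

With the `encoded_strings` array from above, `encodings_with_invalid_strings(encoded_strings)` would give the affected encoding names directly.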
Looking for example at the generated CP950 string, 8031 of its 25342 characters are invalid, spread across 2017 distinct ranges in the string. The invalid characters' codepoints are all in the range of 0x81..0xFE.
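Figures like these can be gathered roughly as follows. This is a sketch, not necessarily how the numbers above were produced: `invalid_char_runs` is a hypothetical helper, and the demo string is a small hand-built UTF-8 example rather than the CP950 output. It relies on `String#chars` yielding each invalid byte as its own one-byte character.

```ruby
# Half-open index ranges (counted in characters) of consecutive invalid
# characters in str.
def invalid_char_runs(str)
  chars = str.chars
  runs = []
  run_start = nil
  chars.each_with_index do |ch, i|
    if ch.valid_encoding?
      runs << (run_start...i) if run_start
      run_start = nil
    else
      run_start ||= i
    end
  end
  runs << (run_start...chars.size) if run_start
  runs
end

# Hand-built demo string: three invalid bytes in two separate runs.
demo = "ab\xFF\xFEcd\xFFe".dup.force_encoding(Encoding::UTF_8)
runs = invalid_char_runs(demo)
invalid_bytes = demo.chars.reject(&:valid_encoding?).flat_map(&:bytes)

puts "invalid characters: #{runs.sum(&:size)}"
puts "distinct runs:      #{runs.size}"
puts "byte range:         0x%02X..0x%02X" % invalid_bytes.minmax
```

For the demo string this reports 3 invalid characters in 2 runs, with bytes in 0xFE..0xFF.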
Is this a bug?
I would expect `String#encode` with `invalid: :replace, undef: :replace` not to create invalid byte sequences, but maybe I am misunderstanding these encodings and this is an unavoidable issue?
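To narrow this down, one option is to convert codepoints one at a time and report the first whose result is not valid in the target encoding. This is a sketch under assumptions: `first_invalid_codepoint` is a hypothetical helper, and it is not confirmed here that converting codepoints individually reproduces what happens with the full string.

```ruby
# Return the first codepoint in `codepoints` whose conversion to `enc_name`
# is not a valid string in that encoding, or nil if every result is valid.
def first_invalid_codepoint(enc_name, codepoints)
  codepoints.find do |cp|
    next false if (0xD800..0xDFFF).cover?(cp) # surrogates cannot be encoded

    encoded = [cp].pack('U').encode(enc_name, invalid: :replace,
                                              undef: :replace, replace: '')
    !encoded.valid_encoding?
  end
end

# Every byte is valid in a single-byte encoding, so this should print nil.
p first_invalid_codepoint('ISO-8859-1', 0..0x2FF)

# No expected result is given here, since it depends on the Ruby version.
p first_invalid_codepoint('CP950', 0..0xFFFF)
```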
CC @duerst
-- 
https://bugs.ruby-lang.org/
______________________________________________
ruby-core mailing list -- ruby-core@ml.ruby-lang.org
To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
ruby-core info -- https://ml.ruby-lang.org/mailman3/postorius/lists/ruby-core.ml.ruby-lang.org/