[ruby-core:70735] [Ruby trunk - Bug #11522] URI::decode returns incorrectly encoding strings

From: usa@...
Date: 2015-09-12 14:57:35 UTC
List: ruby-core #70735
Issue #11522 has been updated by Usaku NAKAMURA.


I agree with you, nobu.
But it should be ASCII-8BIT, not US-ASCII.

----------------------------------------
Bug #11522: URI::decode returns incorrectly encoding strings
https://bugs.ruby-lang.org/issues/11522#change-54114

* Author: Charlie Anderson
* Status: Open
* Priority: Normal
* Assignee: akira yamada
* ruby -v: ruby 2.2.2p95 (2015-04-13 revision 50295) [x86_64-linux]
* Backport: 2.0.0: UNKNOWN, 2.1: UNKNOWN, 2.2: UNKNOWN
----------------------------------------
When given Unicode characters to encode and decode, the URI module returns a string with an invalid encoding.

~~~
irb(main):026:0* unicode = 'œ´å∑®´ß∂†≈©ƒç˙©√∆˙∫˚∆~¬'
=> "œ´å∑®´ß∂†≈©ƒç˙©√∆˙∫˚∆~¬"
irb(main):027:0> unicode.encoding
=> #<Encoding:UTF-8>
irb(main):028:0> unicode.valid_encoding?
=> true
irb(main):029:0> encoded = URI::encode(unicode)
=> "%C5%93%C2%B4%C3%A5%E2%88%91%C2%AE%C2%B4%C3%9F%E2%88%82%E2%80%A0%E2%89%88%C2%A9%C6%92%C3%A7%CB%99%C2%A9%E2%88%9A%E2%88%86%CB%99%E2%88%AB%CB%9A%E2%88%86~%C2%AC"
irb(main):030:0> encoded.encoding
=> #<Encoding:US-ASCII>
irb(main):031:0> encoded.valid_encoding?
=> true
irb(main):032:0> decoded = URI::decode(encoded)
=> "\xC5\x93\xC2\xB4\xC3\xA5\xE2\x88\x91\xC2\xAE\xC2\xB4\xC3\x9F\xE2\x88\x82\xE2\x80\xA0\xE2\x89\x88\xC2\xA9\xC6\x92\xC3\xA7\xCB\x99\xC2\xA9\xE2\x88\x9A\xE2\x88\x86\xCB\x99\xE2\x88\xAB\xCB\x9A\xE2\x88\x86~\xC2\xAC"
irb(main):033:0> decoded.encoding
=> #<Encoding:US-ASCII>
irb(main):034:0> decoded.valid_encoding?
=> false
~~~

I would expect decoded to have a valid encoding, probably UTF-8?
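
For illustration, here is a minimal sketch of why the encoding tag matters for the decoded bytes. It does not call URI::decode itself; it only re-tags the raw bytes of the first character in the example ("œ", U+0153) under the three encodings discussed in this thread. The helper name `retag` is hypothetical and not part of the URI library. The last lines use URI.decode_www_form_component, which takes an encoding argument defaulting to UTF-8; note that it is meant for form data and also turns "+" into a space, so it is a possible workaround rather than a drop-in replacement.

```ruby
require "uri"

# Hypothetical helper for illustration (not part of the URI library):
# reinterpret the same bytes under another encoding, without transcoding.
def retag(bytes, encoding)
  bytes.dup.force_encoding(encoding)
end

# The UTF-8 bytes of U+0153, held as a binary string, much as a
# percent-decoder would produce them from "%C5%93".
raw = "\xC5\x93".b

puts retag(raw, Encoding::US_ASCII).valid_encoding?    # false: bytes above 0x7F are not ASCII
puts retag(raw, Encoding::ASCII_8BIT).valid_encoding?  # true: any byte sequence is valid binary

utf8 = retag(raw, Encoding::UTF_8)
puts utf8.valid_encoding?                              # true
puts utf8 == "\u0153"                                  # true

# Possible workaround: a decoder that tags its result as UTF-8 by default.
form = URI.decode_www_form_component("%C5%93")
puts form.encoding                                     # UTF-8
puts form == "\u0153"                                  # true
```

Tagging the result as ASCII-8BIT keeps it valid without guessing the original character set; tagging it as UTF-8 is more convenient when the input is known to have been UTF-8.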



-- 
https://bugs.ruby-lang.org/