From: naruse@...
Date: 2018-10-20T10:38:41+00:00
Subject: [ruby-core:89492] [Ruby trunk Bug#7156][Rejected] Invalid byte sequence in US-ASCII when using URI from std lib

Issue #7156 has been updated by naruse (Yui NARUSE).

Status changed from Feedback to Rejected

The argument to URI needs to be escaped. Ruby may support unescaped URIs once browsers' URL handling becomes well defined.

----------------------------------------
Bug #7156: Invalid byte sequence in US-ASCII when using URI from std lib
https://bugs.ruby-lang.org/issues/7156#change-74539

* Author: t0d0r (Todor Dragnev)
* Status: Rejected
* Priority: Normal
* Assignee: naruse (Yui NARUSE)
* Target version:
* ruby -v: 1.9.3
* Backport: 2.3: UNKNOWN, 2.4: UNKNOWN, 2.5: UNKNOWN
----------------------------------------
Invalid byte sequence in US-ASCII on ruby 1.9.3

I receive this error when trying to open a URL that contains Bulgarian text (UTF-8). The problem seems to be in uri/common.rb from the Ruby standard library. Adding str.force_encoding(Encoding::BINARY) to the following method fixes the problem:

    class URI::Parser
      def escape(str, unsafe = @regexp[:UNSAFE])
        unless unsafe.kind_of?(Regexp)
          # perhaps unsafe is String object
          unsafe = Regexp.new("[#{Regexp.quote(unsafe)}]", false)
        end
        str.force_encoding(Encoding::BINARY) # FIX
        str.gsub(unsafe) do
          us = $&
          tmp = ''
          us.each_byte do |uc|
            tmp << sprintf('%%%02X', uc)
          end
          tmp
        end.force_encoding(Encoding::US_ASCII)
      end
    end

One more suggestion: maybe US_ASCII should be replaced with Encoding::BINARY too?

---Files--------------------------------
bulgarian.rb (61 Bytes)

-- 
https://bugs.ruby-lang.org/
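
For reference, the escaping that the rejection asks the caller to do can be sketched as follows. This is only an illustration, not code from the issue or from the uri library: `percent_encode_segment` is a hypothetical helper name, `example.com` is a placeholder host, and the Bulgarian word is a stand-in for the reporter's text. It percent-encodes a binary copy of the string, so, unlike the proposed patch, it does not change the encoding of the caller's string.

```ruby
require 'uri'

# Hypothetical helper (not part of the uri library): percent-encode every
# byte outside the RFC 3986 "unreserved" set. It works on a binary copy
# (String#b), so the caller's string keeps its original encoding.
def percent_encode_segment(str)
  escaped = str.b.gsub(/[^A-Za-z0-9\-._~]/) { |byte| format('%%%02X', byte.ord) }
  escaped.force_encoding(Encoding::US_ASCII)
end

segment = "\u0434\u0430" # stand-in Bulgarian text, UTF-8
escaped = percent_encode_segment(segment)

# The escaped segment is plain ASCII, so URI.parse accepts it.
uri = URI.parse("http://example.com/#{escaped}")
puts uri.path
```

Escaping before calling URI keeps the library's input within ASCII, which is the contract the rejection note describes.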