From: Sam Quigley
Date: 2009-10-22T12:49:49+09:00
Subject: [ruby-core:26223] [Bug #2251] URI.parse accepts strings with invalid characters

Bug #2251: URI.parse accepts strings with invalid characters
http://redmine.ruby-lang.org/issues/show/2251

Author: Sam Quigley
Status: Open, Priority: Normal
Category: lib
ruby -v: ruby 1.9.1p243 (2009-07-16 revision 24175) [i386-darwin10.0.0]

The regexes used in URI::Parser's initialize_regexp use ^ and $ rather than \A and \Z:

    # for URI::split
    ret[:ABS_URI] = Regexp.new('^' + pattern[:X_ABS_URI] + '$', Regexp::EXTENDED)
    ret[:REL_URI] = Regexp.new('^' + pattern[:X_REL_URI] + '$', Regexp::EXTENDED)

The result is that URI.parse matches on any URI separated by newlines, rather than on its argument as a whole:

    irb(main):001:0> require 'uri'
    => true
    irb(main):002:0> URI.parse("blah\nhttp://www.foo.com/\nblahblah")
    => #

I think programmers would expect URI.parse to only successfully parse strings that *are* URIs, rather than any string that *contains* a URI surrounded by a particular kind of whitespace.

This issue has apparently caused at least one security vulnerability in the real world:
http://schmoil.blogspot.com/2009/10/mainlining-new-lines-feel-burn.html

Replacing the ^ and $ with \A and \Z should fix the issue, and is unlikely to break any existing code.

The Rubyspec project does not seem to have any tests for this behavior.

This behavior is present in at least versions 1.8.6, 1.8.7, and 1.9.1.

-sq

----------------------------------------
http://redmine.ruby-lang.org
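
To illustrate the anchor difference described in the report, here is a minimal sketch that does not use the real URI patterns. SIMPLE_URI, the three constants built from it, and the matches? helper are hypothetical names made up for this example, and SIMPLE_URI is a deliberately crude stand-in for pattern[:X_ABS_URI]. It only shows how ^/$ (line anchors) and \A/\Z (string anchors) treat the same input; it also includes \z as a stricter variant, since \Z still permits one trailing newline.

```ruby
# Hypothetical, simplified stand-in for the URI pattern; the real
# pattern[:X_ABS_URI] in the uri library is far more detailed.
SIMPLE_URI = 'https?://[^\s/]+(?:/\S*)?'

# ^ and $ anchor at the start and end of any LINE inside the string.
LINE_ANCHORED   = Regexp.new('^' + SIMPLE_URI + '$')
# \A anchors at the start of the string; \Z anchors at the end of the
# string, or just before a single trailing newline.
STRING_ANCHORED = Regexp.new('\A' + SIMPLE_URI + '\Z')
# \z anchors only at the very end of the string (assumed stricter variant,
# not what the report proposes).
STRICT_ANCHORED = Regexp.new('\A' + SIMPLE_URI + '\z')

# Returns true when the regexp matches somewhere in the input.
def matches?(regexp, input)
  !regexp.match(input).nil?
end

multiline = "blah\nhttp://www.foo.com/\nblahblah"
trailing  = "http://www.foo.com/\n"
plain     = "http://www.foo.com/"

puts matches?(LINE_ANCHORED, multiline)    # true: the middle line matches
puts matches?(STRING_ANCHORED, multiline)  # false: whole string is not a URI
puts matches?(STRING_ANCHORED, trailing)   # true: \Z tolerates the final "\n"
puts matches?(STRICT_ANCHORED, trailing)   # false: \z does not
puts matches?(STRICT_ANCHORED, plain)      # true
```

If a trailing newline should also be rejected, \z rather than \Z would be the anchor to consider, though that is a stricter change than the one proposed above.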