From: deshi xiao
Date: 2012-01-27T23:48:59+09:00
Subject: [ruby-core:42245] [ruby-trunk - Bug #5831] URI.extract not properly extracting URIs with trailing slash followed by single quote

Issue #5831 has been updated by deshi xiao.

I have been reading lib/uri/common.rb, and I found that URI.extract's behavior is to split URLs on whitespace, so I think what you reported is not a bug. Here is a clue; please have a look.

    # Constructs the default Hash of Regexp's
    def initialize_regexp(pattern)
      ret = {}

      # for URI::split
      ret[:ABS_URI] = Regexp.new('\A\s*' + pattern[:X_ABS_URI] + '\s*\z', Regexp::EXTENDED)
      ret[:REL_URI] = Regexp.new('\A\s*' + pattern[:X_REL_URI] + '\s*\z', Regexp::EXTENDED)

----------------------------------------
Bug #5831: URI.extract not properly extracting URIs with trailing slash followed by single quote
https://bugs.ruby-lang.org/issues/5831

Author: Brian Cardarella
Status: Open
Priority: Normal
Assignee:
Category: lib
Target version: 1.9.2
ruby -v: 1.9.2-p290

I have example failing test cases here: https://gist.github.com/1547904

Here is my use case. I am looking to extract URIs from emails. It has been recommended to use Nokogiri, and that is just fine if the email is in HTML. But if the email is in plain text, Nokogiri doesn't work.

IMO this is a bug with URI.extract's regexp. I have tested this against 1.8.7, 1.9.2, and 1.9.3 and it exists in all three.

--
http://bugs.ruby-lang.org/
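
For illustration only, here is a minimal sketch of how a regexp anchored like ret[:ABS_URI] in the excerpt differs from an unanchored scan over free text. SIMPLE_ABS_URI is a hypothetical, heavily simplified stand-in for pattern[:X_ABS_URI], and whole_string_uri? and scan_uris are helper names made up for this example. None of this is the actual code in lib/uri/common.rb, and it does not reproduce URI.extract's real output.

```ruby
# Hypothetical, heavily simplified stand-in for pattern[:X_ABS_URI].
# The real pattern in lib/uri/common.rb is far more detailed.
SIMPLE_ABS_URI = 'https?://[^\s]+'

# Anchored the same way as ret[:ABS_URI] in the excerpt: the whole string
# must be a single URI, optionally surrounded by whitespace.
ANCHORED = Regexp.new('\A\s*' + SIMPLE_ABS_URI + '\s*\z', Regexp::EXTENDED)

# Unanchored form of the same simplified pattern, usable for scanning text.
UNANCHORED = Regexp.new(SIMPLE_ABS_URI)

# True only when str consists of one URI plus optional surrounding whitespace.
def whole_string_uri?(str)
  !ANCHORED.match(str).nil?
end

# Returns every substring of text that matches the simplified pattern.
def scan_uris(text)
  text.scan(UNANCHORED)
end

puts whole_string_uri?("  http://example.com/  ")
puts whole_string_uri?("see http://example.com/ now")
p scan_uris("see http://example.com/' now")
```

With this simplified pattern, the scan stops only at whitespace, so a single quote that directly follows the trailing slash is kept as part of the match. Whether the real URI.extract pattern behaves the same way would need to be checked against the failing test cases linked in the report.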