Re: bug in mailread.rb, and: proposal for Mail#to_s

From: Wybo Dekker <wybo@...>
Date: 2005-12-05 23:16:08 UTC
List: ruby-core #6844
On Tue, 6 Dec 2005, Yukihiro Matsumoto wrote:

> In message "Re: bug in mailread.rb, and: proposal for Mail#to_s"
>     on Sun, 4 Dec 2005 21:46:44 +0900, Wybo Dekker <wybo@servalys.nl> writes:
> 
> |mailread separates mail messages looking for /^From /.
> |This is incorrect, because message bodies may contain lines beginning with
> |
> |From like this mail does. So the regexp should be, I think, something 
> |like: /^From .*? \w{3} \w{3} [\d ]{2} \d\d:\d\d:\d\d \d{4}/
> 
> Interesting.  But I have seen wide range of variety of From line format.
> Does this good enough for all of them?

mbox(5) says: A postmark line consists of the four characters "From", 
followed by a space character, followed by the message's envelope sender 
address, followed by whitespace, and followed by a time stamp.  The sender 
address is expected to be an addrspec as defined in appendix D of RFC 822.

The pine sources (ftp://ftp.cac.washington.edu/pine/pine-4.64-1.src.rpm)
include a FAQ (attached) that addresses this problem, especially
the time stamp format, which should be ctime's format. The regexp
above matches that format.
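
As a rough illustration (a sketch only, not part of the attached file), the strict pattern can be compared with the loose /^From / test on a few lines. The helper names postmark? and loose_postmark? are made up for this example, and the time stamp is produced with Ruby's Time#ctime, which pads the day of the month to two columns:

```ruby
# Strict test, using the pattern proposed in this thread.
def postmark?(line)
  !(line =~ /^From .*? \w{3} \w{3} [\d ]{2} \d\d:\d\d:\d\d \d{4}/).nil?
end

# Loose test: any line that starts with "From ".
def loose_postmark?(line)
  !(line =~ /^From /).nil?
end

# Time#ctime yields the fixed-width ctime layout (note the padded day).
stamp = Time.utc(2004, 9, 6, 13, 20, 51).ctime

samples = [
  "From wybo@servalys.nl #{stamp}",                # postmark in ctime format
  "From like this mail does.",                      # ordinary body text
  "From wybo@servalys.nl Mon Sep 6 13:20:51 2004"   # variable-width day of month
]

# Print strict result, loose result, then the line itself.
samples.each do |line|
  puts "#{postmark?(line)}\t#{loose_postmark?(line)}\t#{line}"
end
```

Only the first sample should pass the strict test, while all three pass the loose one. The third sample shows the cost of being strict: a writer that does not pad the day of the month is no longer recognised.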
 
> |I also propose a to_s method, which converts the Mail object to a 
> |(possibly edited) copy of the original mail message. 
> 
> I think to_s is not sufficient for string representation of whole mail
> body.  It's just too long.  I agreed to add a new method to do this
> work.  Any name suggestion?

Mail#assemble   ?
     reassemble ?
     rebuild    ?

I have attached a new version which 
- has assemble instead of to_s
- retains the original /^From / line instead of generating a new one
- uses line.chomp instead of line.chop
- uses $/ instead of "\n"
- has more comments (mostly from the Pickaxe)
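
To show the effect of keeping the original postmark and splitting only on strict matches, here is a small self-contained sketch. It does not use the attached Mail class; split_mbox and postmark? are hypothetical helpers written for this example, and the second sender address is invented:

```ruby
require 'stringio'

# Strict postmark test, using the pattern proposed in this thread.
def postmark?(line)
  !(line =~ /^From .*? \w{3} \w{3} [\d ]{2} \d\d:\d\d:\d\d \d{4}/).nil?
end

# Split an mbox stream into [postmark, lines] pairs.
# The original postmark line is kept (chomped) rather than regenerated.
def split_mbox(io)
  messages = []
  while line = io.gets
    if postmark?(line)
      messages << [line.chomp, []]
    elsif messages.last
      messages.last[1] << line
    end
  end
  messages
end

mbox_text = <<EOS
From wybo@servalys.nl Sun Sep 26 13:20:51 2004
Subject: first

From like this mail does.
From someone@example.org Mon Sep 27 08:00:00 2004
Subject: second

bye
EOS

messages = split_mbox(StringIO.new(mbox_text))
messages.each do |postmark, lines|
  puts "#{postmark} (#{lines.size} lines)"
end
```

With the loose /^From / test the body line "From like this mail does." would start a third, bogus message. Here it stays in the body of the first one.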

-- 
Wybo

Attachments (2)

faq (2.13 KB, text/plain)
From pine4.64/imap/docs/FAQ.html#6.12 in the linux sources of pine
(ftp://ftp.cac.washington.edu/pine/pine-4.64-1.src.rpm):

6.12 Why are you so fussy about the date/time format in the internal
  "From " line in traditional UNIX mailbox files? My other mail program
  just considers every line that starts with "From " to be the start of
  the message.

You just answered your own question. If any line that starts with "From
" is treated as the start of a message, then every message text line
which starts with "From " has to be quoted (typically by prefixing a
">" character). People complain about this -- "why did a > get stuck in
my message?"

So, good mail reading software only considers a line to be a "From "
line if it follows the actual specification for a "From " line. This
means, among other things, that the day of week is fixed-format: "May
14", but "May  7" (note the extra space) as opposed to "May 7". ctime()
format for the date is the most common, although POSIX also allows a
numeric timezone after the year. For compatibility with ancient
software, the seconds are optional, the timezone may appear before the
year, the old 3-letter timezones are also permitted, and "remote from
xxx" may appear after the whole thing.

Unfortunately, some software written by novices use other formats. The
most common error is to have a variable-width day of month, perhaps in
the erroneous belief that RFC 2822 (or RFC 822) defines the format of
the date/time in the "From " line (it doesn't; no RFC describes internal
formats). I've seen a few other goofs, such as a single-digit second,
but these are less common.

If you are writing your own software that writes mailbox files, and you
really aren't all that savvy with all the ins and outs and ancient
history, you should seriously consider using the c-client library (e.g.
routine mail_append()) instead of doing the file writes yourself. If you
must do it yourself, use ctime(), as in:

 fprintf (mbx,"From %s@%h %s",user,host,ctime (time (0)));

rather than try to figure out a good format yourself. ctime() is the
most traditional format and nobody will flame you for using it.

mailread.rb (2.6 KB, text/x-ruby)
#
# mailread.rb - basic parsing for mbox e-mail message files
#
# Class +Mail+ provides basic parsing for mbox e-mail messages. 
# It can read an individual message from a named file, or it can be
# called repeatedly to read messages from a stream on an opened mbox
# format file. Each +Mail+ object represents a single e-mail message,
# which is split into a header and a body. The body is an array of
# lines, and the header is a hash indexed by header field name. +Mail+
# correctly joins multiline headers.

class Mail
  @@from = ''

  # read a new mail message from an mbox mail file

  def initialize(mbox)
    unless defined? mbox.gets
      mbox = open(mbox, "r")
      opened = true
    end

    @header = {}
    @body = []
    @from = @@from # From-line stored from previous Mail#new call
    begin
      while line = mbox.gets()
        line.chomp!
        if /^From /=~line  # save From-line
          @from = line
          next
        end
        break if /^$/=~line  # end of header

        if /^(\S+?):\s*(.*)/=~line
          (attr = $1).capitalize!
          @header[attr] = $2
        elsif attr
          line.sub!(/^\s*/, '')
          @header[attr] += $/ + line
        end
      end
  
      return unless line

      while line = mbox.gets()
        # From wybo@servalys.nl Sun Sep 26 13:20:51 2004 +0200
        if /^From .*? \w{3} \w{3} [\d ]{2} \d\d:\d\d:\d\d \d{4}/=~line
          @@from = line.chomp # save From-line for next Mail#new call
          break
        end
        @body.push(line)
      end
    ensure
      mbox.close if opened
    end
  end

  # return the header as a hash with header field names as keys and
  # header field contents as values. The values for fields with
  # continuation lines contain multiple lines.

  def header
    return @header
  end

  # return the body as an array of lines

  def body
    return @body
  end

  # return a header field

  def [](field)
    @header[field.capitalize]
  end

  # Return the message as a string, ready to print, including its "From " line.
  # Header fields will appear sorted, but +Date+, +From+, +Subject+ and +To+ 
  # go in front; note that of fields that occurred multiple times in the
  # original (like +Received+), only the last one will be reproduced here

  def assemble
    prior = %w{Date From Subject To}
    s = @from + $/
    self.header.keys.sort { |a,b|
      if prior.index(a)
        prior.index(b) ? a <=> b : -1
      elsif prior.index(b)
        1
      else
        a <=> b
      end
    }.each { |k|
      s << "#{k}: #{self.header[k].gsub(/#{$/}/,$/+"\t")}" + $/
    }
    s << $/
    s << self.body.join
  end

end
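
The header ordering and continuation-line folding done by assemble can also be tried in isolation. The sketch below is an assumption-laden restatement, not the attached code: ordered_fields and fold are hypothetical helper names, it uses sort_by instead of the comparison block, it assumes "\n" as the separator rather than $/, and the header values are invented:

```ruby
# Hypothetical helper: Date, From, Subject and To first, the rest
# alphabetically, which is the order Mail#assemble aims for.
def ordered_fields(names)
  priority = %w[Date From Subject To]
  names.sort_by { |name| [priority.include?(name) ? 0 : 1, name] }
end

# Hypothetical helper: indent each continuation line of a header
# value with a tab so the field survives a round trip.
def fold(value, sep = "\n")
  value.gsub(sep, sep + "\t")
end

# Invented header values, for illustration only.
header = {
  "Received" => "from a.example\nby b.example",
  "To"       => "ruby-core",
  "Date"     => "Mon, 5 Dec 2005 23:16:08 +0000",
  "Subject"  => "mailread.rb"
}

ordered_fields(header.keys).each do |name|
  puts "#{name}: #{fold(header[name])}"
end
```

This prints Date, Subject and To before Received, with the second line of Received indented by a tab.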
