[ruby-talk:02201] Re: Scripting and OO -- thought question

From: "David Douthitt" <DDouthitt@...>
Date: 2000-03-27 23:29:43 UTC
List: ruby-talk #2201
| <schneik@us.ibm.com> wrote:
| 
| David Douthitt writes:
| > I have a set of applications that were written
| > in Perl 4, which scan the UNIX system logs and generate color-coded
| > HTML pages for them.

| > All they do is scan the log (41,000 lines plus) and generate HTML
| > files based on them.
| 
| Yes, but HOW do they do this?

Well, the Ruby version used to do:

   LogFile.open.readlines.each { }

Now it does

   LogFile.open { |f| f.readlines.each { } }

Then it checks every line to see if the system name has changed,
and changes FONT color if it has.
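
For illustration, here is a minimal sketch of that per-line check. It is not the actual script: the `colorize` name, the colour table, and the assumption that every line looks like "Mon DD HH:MM:SS host message" are all made up for the example. It streams the log with `each_line` rather than `readlines`, so the whole file is never held in memory, and it skips HTML escaping.

```ruby
require "stringio"

# Hypothetical colour map; host names and colour values are placeholders.
COLORS = { "sys0" => "#008000", "sys1" => "#0000FF" }

# Streams the log one line at a time and emits a FONT tag only when the
# host field differs from the one seen on the previous line.
def colorize(io, out, colors = COLORS)
  old_sys = nil
  io.each_line do |line|
    line = line.chomp
    # Assumed syslog layout: "Mon DD HH:MM:SS host message"
    sys = line[/\A.{6} .{8} ([^ ]*)/, 1]
    if sys != old_sys
      out.print %(<FONT COLOR="#{colors.fetch(sys, "#000000")}">)
      old_sys = sys
    end
    out.puts line
  end
end

# Small in-memory log so the sketch runs without /var/log/messages.
log = StringIO.new(<<~LOG)
  Mar 27 23:29:41 sys0 kernel: one
  Mar 27 23:29:42 sys0 kernel: two
  Mar 27 23:29:43 sys1 kernel: three
LOG
colorize(log, $stdout)
```

Passing an open `File` instead of the `StringIO` should work the same way, since both respond to `each_line`.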

| > At one time I had them (Ruby version, Perl
| > version) generating separate files for each system in the log;
| > when I switched to using ksh and grep, the speed increase was incredible.
| 
| Again, HOW was this done? (Also recall that you already know that Ruby can
| invoke grep too.) And what is incredible here? Was the speedup 50%? 100%?
| 500%?

It went from something like 3-6 minutes to about 0.3 seconds.  I might note
that I used ksh ("grep foo file > file.out") inside a for-loop instead of
my Perl/Ruby variants.


| > I'm still stuck though, since scanning for one particular host
| > (with 41,000 lines!) can take over 3 minutes.
| 
| That doesn't seem (wild guess here) large enough to cause problems if you
| are running on a moderately fast machine. Is the run time pretty much a
| straight linear function of the number of lines of input?

I haven't taken statistical samples and done regression analysis :-)

| The basic problem here is that if you don't show people your (suitably
| sanitized if necessary) code (or at least representative critical pieces of
| your code), the answers you get to such questions will be based largely on
| people's imagination, which may or may not have anything to do with the
| most relevant factors in this case. (Maybe you should have been using Perl
| 5 with compiled regular expressions instead of Perl 4. Maybe you were
| somehow doing unnecessary or unbuffered I/O without realizing it. Maybe
| you were building some sort of table or index for your HTML output that
| inadvertently did something in an O(n**2) fashion. Maybe you stashed
| everything in memory on a machine with insufficient RAM. Maybe any number
| of other things....)

Behave!  Be nice!  :-)  You get out of bed on the wrong side this morning?
Remember to SMILE when you say that!  :-)

You think I want to RELEARN Perl from the ground up - corrupting my
Perl and OOP knowledge at the same time?  Yuck!

I noticed that HP-UX 11 STILL comes with Perl 4, not Perl 5.

Here is some code:

The Ruby code shown further down was replaced entirely by this (ksh):

   for sys in $*
   do
      egrep ":.. [^ ]* (in\.|)$sys" $MESSAGES > ${sys}
   done

(MESSAGES=/var/log/messages)
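
As a rough sketch only, the same split could be done in Ruby in a single pass over the log instead of one scan per name. The method names `patterns_for` and `partition_log` are hypothetical, the pattern is carried over unchanged from the egrep line, and `Regexp.escape` is an addition (the shell loop interpolates `$sys` raw). Note that this keeps all matching lines in memory, which the ksh loop does not.

```ruby
# Builds one pattern per name, mirroring the egrep expression above.
# Regexp.escape is an addition; the shell version interpolates $sys as-is.
def patterns_for(names)
  names.to_h { |sys| [sys, /:.. [^ ]* (in\.|)#{Regexp.escape(sys)}/] }
end

# Reads the log once and collects the matching lines for each name,
# instead of rescanning the whole file for every name.
def partition_log(io, names)
  patterns = patterns_for(names)
  # Every name gets an entry, even with no matches (ksh would still
  # create an empty output file for it).
  matches = names.to_h { |sys| [sys, []] }
  io.each_line do |line|
    patterns.each { |sys, re| matches[sys] << line if re.match?(line) }
  end
  matches
end

# Hypothetical usage: write one file per name, as the ksh loop does.
# File.open("/var/log/messages") do |f|
#   partition_log(f, ARGV).each { |sys, lines| File.write(sys, lines.join) }
# end
```

Whether this gets anywhere near grep's speed on a 100MHz machine is a separate question; the sketch only shows the single-pass structure.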

And it FLIES!  Here is the ruby code:

#!/usr/bin/ruby

# Interesting problem reached here...
#
# ARGV is [ "arg1", "arg2", "arg3" ]
# results from a scan are [ [ "str1" ] ]
#
# Thus, ARGV.each is "arg1" ... "arg2" ... "arg3" ...
# scanXX.each is [ "str1" ] ... [ "str2" ] ...
#
# Thus ARGV[0] is "arg1" ; scanXX[0] is [ "str1" ] ;
# and scanXX[0][0] is "str1"
#
# This explains a lot.

systems = Array.new

class String
   def systemName
      self.scan(/^... .. ..:..:.. ([^ ]*)/)
   end
end

File.open("/var/log/messages").each { |line|
   line.chomp!
#  sys = line.scan("^... .. ..:..:.. ([^ ]*)")
   sys = line.systemName

   systems = systems | sys

   if not ARGV.include?(sys[0][0])
      print("    ", line, "\n")
   end
   }

# (systems.sort!).each { |sys|
#    print("   ", sys, "\n")
#    }

[.......end.......]

I thought about posting Perl code, but..... this is a RUBY mailing list...

#!/usr/bin/env ruby

#----------------------------------
#  CLASSES
#----------------------------------

require("Html.rb")
require("getopts")

# class FixNum
#    def format_color
#       format("#%04X", self)
#    end
# end

class Logs
   at_exit { Logs.end_body }

   def Logs.header (str = nil)
      Html.header {
         Html.title str
         }

   end

   def Logs.color_table (colors)
      Html.table {
         Html.table_row {
            colors.each { |machine, color|
               Html.table_data(color) {
                  print machine
                  }
               }
            }
         }
   end

   def Logs.date_heading(date)
      Html.named_anchor(date)
      Html.table(Colors::BREAKLINES, "100%") {
         Html.table_row {
            Html.table_data {
               print "&nbsp;&nbsp;"
                  Html.em {
                     Html.strong {
                        print date
                     }
                  }
               }
            Html.table_data(Colors::BREAKLINES, "RIGHT") {
               Html.anchor("Top", "#HTMLTop")
               }
            }
         }
   end
end

class OutputFile < File
   TMPDIR = "/tmp/log2html.rb"

   def OutputFile.open (sys)
      File.open(OutputFile::TMPDIR + "/" + sys + ".html", "w")
   end
end

class LogFile < File
   def LogFile.open
      if ($*[0] == nil)
         super("/var/log/messages")
      else
         super($*[0])
      end
   end

   def LogFile.unlink
      if ($*[0] != nil)
         super($*[0])
      end
   end

   def LogFile.copy
      f = File.expand_path($*[0] || "/var/log/messages")   # same default as LogFile.open
      `/bin/cat #{f}`   # -- this is the speed up
   end
end

class String
   def log_fields
      self.chomp!

#  Interesting pattern: subsystem field (such as "in.identd[27062]:")
#     is not guaranteed to be present.  So on occasion, $5 == nil.

      self =~ /(.{6,6}) (.{8,8}) ([^ ]*) (([^[]+).*: (.*)|.*)$/
      [ $1, $2, $3, $4, $5 ]
   end
end

#----------------------------------
#  MAIN PROGRAMME
#----------------------------------

getopts("1s")

machines = {
           "sys0" => Colors.green,
           "sys1" => Colors.blue,
           "sys2" => Colors.purple,
           "sys3" => Colors.red,
           "sys4" => Colors.yellow,
           "sys5" => Colors.blue_green
           }

sysfiles = Hash.new

Html.html {
   Logs.header "System Logs"

   Html.body(Colors.buff) {
      Html.target "main"
      Html.named_anchor "HTMLTop"

      Html.heading "messages"

      print "This page was generated on "
      print Time.now.ctime

      Logs.color_table machines
      Html.paragraph

      if ($OPT_1)                # -- this is new
         Html.pre {              # -- this is new
            print LogFile.copy   # -- this is new
            }                    # -- this is new
      else
         old_date = ""
         old_sys = ""

         LogFile.open { |f|
            f.readlines.each { |line|
               date, time, sys, entry, subsys = line.log_fields

               if date != old_date
                  Html.end_pre

                  Logs.date_heading date

                  Html.begin_pre
                  Html.font(machines[sys])   # not always needed.... but is at first
               else
                  if sys != old_sys
                     Html.font(machines[sys])
                  end
               end

               print line, "\n"

               old_date = date
               old_sys = sys
               }
            }
      end
      }
   }
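
To show what the log_fields pattern above picks out, here is a small standalone sketch. It uses the same expression, with the "[" escaped inside the character class and named groups in place of $1..$5. The constant `LOG_LINE`, the method `parse_log_line`, and the sample lines in the usage note are hypothetical.

```ruby
# Same pattern as log_fields, with "[" escaped in the character class and
# named groups instead of numbered ones. "subsys" and "msg" come out nil
# when the line has no "subsystem: " part.
LOG_LINE = /(?<date>.{6}) (?<time>.{8}) (?<sys>[^ ]*) (?<entry>(?<subsys>[^\[]+).*: (?<msg>.*)|.*)$/

# Returns a hash of field name => text, or nil if the line does not match.
# Unlike log_fields, this does not modify the string it is given.
def parse_log_line(line)
  m = LOG_LINE.match(line.chomp)
  m && m.named_captures
end
```

For a line such as "Mar 27 23:29:43 sys0 in.identd[27062]: started", `parse_log_line(line)["subsys"]` should be "in.identd"; for "Mar 27 23:29:43 sys0 last message repeated 3 times" it should be nil, which is the case the comment in log_fields warns about.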

The box says this:

# uname -a
Linux mysys.nowhere.nope.nyet.nada.zip.zilch.nicht 2.2.9-27mdk #1 Mon Jun 14 16:44:05 CEST 1999 i586 unknown

It's a Compaq Prolinea 5100e - 100MHz Pentium.  32Megs of memory, 600M disk
(yes, the disk fills up routinely :-)

Now aren't you glad you asked for code?  :-)

