[ruby-talk:13199] Re: Reading a file backwards

From: Bob Kline <bkline@...>
Date: 2001-03-25 16:53:31 UTC
List: ruby-talk #13199
On Sun, 25 Mar 2001, Daniel Berger wrote:

> Ah, crud - I forgot to mention that I *do not* want to read the
> entire file into memory, which the readlines method appears to do
> (unless I'm mistaken). I also would like to avoid tricks like
> creating temporary files.

Here's the logic (but not the ruby code - I'm not that familiar with the
language) for a two-pass approach:

   Scan the file, building an array of file positions and lengths for
      each line (an array of tuples)
   Start at the back of the array and move toward the front
   For each position in the array:
      Seek to the file position
      Read the line
      Process the line
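
A minimal Ruby sketch of the two-pass logic above, assuming LF-terminated lines. The method name `each_line_reversed_indexed` is made up for illustration and is not part of Ruby:

```ruby
# Two-pass sketch: index every line, then walk the index back to front.
# `each_line_reversed_indexed` is a hypothetical name, not a Ruby built-in.
def each_line_reversed_indexed(path)
  File.open(path, "rb") do |f|            # binary mode keeps byte offsets exact
    index = []                            # one [offset, length] pair per line
    until f.eof?
      offset = f.pos
      line = f.gets                       # splits on "\n" by default
      index << [offset, line.bytesize]
    end
    index.reverse_each do |offset, length|
      f.seek(offset, IO::SEEK_SET)        # seek to the file position
      yield f.read(length)                # read the line, terminator included
    end
  end
end
```

Only the index stays in memory, so the cost is one pair of integers per line rather than the file contents.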

Here's the logic for a somewhat more complicated, but possibly more
efficient one-pass approach:

   Seek to the end of the file
   Decide on a block size you can afford to keep in memory (you may
      have to keep more in memory if you run into strings longer
      than your blocksize or which straddle blocks)
   Back up the distance equal to this block size
   Read a block
   Scan backwards from the end of the block, skipping past the line-end
      character (or pair, depending on your platform) at the end of the
      block
   For each line [see note below]:
      Extract the substring for the line
      Process the line
   Save any leftover partial line at the beginning of the block
   While more data:
      Back up to the position of the prior block (or BOF if partial block)
      Read the block
      Append any leftover from earlier processing
      Scan backwards from the end of the block
      For each line in the block:
         Extract the substring for the line
         Process the line
      Save any leftover partial line at the beginning of the block
   If any leftover:
      process the leftover (which represents the first line in the file)

Recognition of a line in a block when you are using the second method
occurs when you encounter the line-termination character (or pair,
depending on your platform).  Extract the data following the
line-termination character(s) as the substring for the line.
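
The one-pass logic might look roughly like this in Ruby. This is a sketch under assumptions, not a tuned implementation: it assumes LF line ends (a CR+LF file would leave a trailing "\r" on each line), and the names `each_line_reversed_blocks` and `block_size` are illustrative only:

```ruby
# One-pass sketch: read fixed-size blocks from the end of the file toward
# the start. Lines are yielded without their "\n" terminator.
# `each_line_reversed_blocks` and `block_size` are hypothetical names.
def each_line_reversed_blocks(path, block_size = 4096)
  File.open(path, "rb") do |f|
    pos = f.size
    leftover = "".b                       # partial line carried over from the later block
    first = true
    while pos > 0
      size = [block_size, pos].min        # the block nearest BOF may be short
      pos -= size
      f.seek(pos, IO::SEEK_SET)
      chunk = f.read(size) + leftover     # append the leftover from earlier processing
      chunk.chomp!("\n") if first         # skip the line end that closes the file
      first = false
      lines = chunk.split("\n", -1)       # -1 keeps empty lines
      leftover = lines.shift || "".b      # first piece may be incomplete: save it
      lines.reverse_each { |line| yield line }
    end
    yield leftover unless f.size.zero?    # the first line in the file
  end
end
```

A line longer than `block_size` simply accumulates in `leftover` until its start is found, which is the extra memory cost mentioned in the steps above.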

For both algorithms, you need to open the file in binary mode, to avoid
getting bogus results when you ask for or specify file positions.
Handling all the line-end combinations can be tricky, because different
platforms use different conventions, and the real world will hand you
files that present some anomalies.  For example, Macintosh uses CR
(ASCII 13) as the line-end character.  UNIX uses LF (ASCII 10).  VAX,
CP/M, DOS, and Windows all use CR followed by LF.  One too-common anomaly
is the presence of extra CR characters preceding the CR+LF pair.  Until
you see the LF you'll be tempted to think you're dealing with a Mac
file.  One way of dealing with this is to read from the beginning of the
file until you have seen at least one CR and/or LF character followed by
at least one non-CR/LF.  If only CR was seen, set an eol variable to
"\r"; if only LF was seen, set the variable to "\n"; if a CR was
followed by LF, set the variable to "\r\n"; if both were seen but the
order was LF followed by CR, you might set the variable to "\n\r"
(though this isn't a legitimate combo on any platform I know of). Then
back up and use your variable to recognize line-ends canonical to the
platform you've decided you're handling, treating the anomalous strays
as part of the line itself.  For this peek-ahead in which you attempt to
determine which platform you're dealing with, you also have to handle
the border condition of a file which consists only of CR and LF
characters.
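
The peek-ahead described above could be sketched in Ruby as follows. `detect_eol` is a hypothetical helper name; it assumes the first run of CR/LF characters is representative of the whole file and returns nil when the file contains no CR or LF at all:

```ruby
# Guess the line-end convention from the first run of CR/LF bytes.
# `detect_eol` is a hypothetical helper, not part of Ruby.
def detect_eol(path)
  run = []                                # bytes of the first CR/LF run
  File.open(path, "rb") do |f|
    f.each_byte do |b|
      if b == 13 || b == 10               # CR or LF
        run << b
      elsif !run.empty?
        break                             # first non-CR/LF after the run: stop
      end
    end
  end
  cr = run.index(13)
  lf = run.index(10)
  if cr && lf
    cr < lf ? "\r\n" : "\n\r"             # stray extra CRs before CR+LF still give "\r\n"
  elsif cr
    "\r"
  elsif lf
    "\n"
  end                                     # falls through to nil if neither was seen
end
```

A file consisting only of CR and LF characters is classified from whatever run was seen before end of file, which is one possible way to settle that border condition.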

There's a third approach which is a variation on the first, and which
avoids all the hand-wringing over end-of-line conventions by letting the
underlying I/O libraries handle the problem:

   Open the file in text mode
   Set pos variable to zero
   While more lines:
      Read and discard the line
      Append the value of pos to the array
      Store the current position of the file in pos
   Process the lines using the position array as in the first option

In this case you're paying the price of a small amount of additional
overhead (making a copy of each line in the first pass) for a measurable
increase in simplicity, robustness, and portability.  And you don't need
to store line lengths in this version, either.  This third approach is
the one I recommend.
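
A short Ruby sketch of this third approach, assuming that positions reported by `IO#pos` on a file opened in text mode can be passed back to `IO#seek` on the same platform. The method name `each_line_reversed` is made up for illustration:

```ruby
# Third-approach sketch: let the I/O library find the line ends and
# remember only where each line starts.
# `each_line_reversed` is a hypothetical name, not a Ruby built-in.
def each_line_reversed(path)
  File.open(path, "r") do |f|             # text mode
    positions = []
    pos = 0
    while f.gets                          # read and discard the line
      positions << pos                    # where that line began
      pos = f.pos                         # where the next line will begin
    end
    positions.reverse_each do |start|
      f.seek(start, IO::SEEK_SET)
      yield f.gets                        # re-read the line, terminator included
    end
  end
end
```

Usage might look like `each_line_reversed("big.log") { |line| puts line }`, where `big.log` is a placeholder file name.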

Hope this helps.

[PS: I dropped ruby-talk@netlab.co.jp from the address list; I don't
know what the conventions are for the ruby mailing list, but I didn't
want to fall into inadvertent cross-posting, which usually invites
flames.]

-- 
Bob Kline
mailto:bkline@rksystems.com
http://www.rksystems.com