[#6660] Ruby on Neko ? — Nicolas Cannasse <ncannasse@...>

Hi folks,

14 messages 2005/11/19

[#6672] testing for hardlink with "test(?-, ...)" flawed on Windows — noreply@...

Bugs item #2858, was opened at 2005-11-20 16:35

13 messages 2005/11/20

[#6684] semenatics of if/unless/while statement modifiers — Stefan Kaes <skaes@...>

Hi all,

81 messages 2005/11/21
[#6685] Re: semenatics of if/unless/while statement modifiers — Mauricio Fern疣dez <mfp@...> 2005/11/22

On Tue, Nov 22, 2005 at 08:22:59AM +0900, Stefan Kaes wrote:

[#6686] Re: semenatics of if/unless/while statement modifiers — Stefan Kaes <skaes@...> 2005/11/22

Mauricio Fern疣dez wrote:

[#6687] Re: semenatics of if/unless/while statement modifiers — Eric Hodel <drbrain@...7.net> 2005/11/22

On Nov 21, 2005, at 4:37 PM, Stefan Kaes wrote:

[#6689] Re: semenatics of if/unless/while statement modifiers — Stefan Kaes <skaes@...> 2005/11/22

Eric Hodel wrote:

[#6693] Re: semenatics of if/unless/while statement modifiers — Yukihiro Matsumoto <matz@...> 2005/11/22

Hi,

[#6695] Re: semenatics of if/unless/while statement modifiers — Stefan Kaes <skaes@...> 2005/11/22

Yukihiro Matsumoto wrote:

[#6718] Re: semenatics of if/unless/while statement modifiers — mathew <meta@...> 2005/11/22

[#6722] Re: semenatics of if/unless/while statement modifiers — Stefan Kaes <skaes@...> 2005/11/22

mathew wrote:

[#6707] Re: semenatics of if/unless/while statement modifiers — "David A. Black" <dblack@...> 2005/11/22

Hi --

[#6708] Re: semenatics of if/unless/while statement modifiers — Stefan Kaes <skaes@...> 2005/11/22

David A. Black wrote:

[#6714] Re: semenatics of if/unless/while statement modifiers — "David A. Black" <dblack@...> 2005/11/22

Hi --

[#6717] Re: semenatics of if/unless/while statement modifiers — Stefan Kaes <skaes@...> 2005/11/22

David A. Black wrote:

[#6798] ruby 1.8.4 preview2 — Yukihiro Matsumoto <matz@...>

Hi,

37 messages 2005/11/30

Re: [RFC] Method of feeding input to regexp matching

From: "David A. Black" <dblack@...>
Date: 2005-11-04 12:02:44 UTC
List: ruby-core #6568
Hi --

On Fri, 4 Nov 2005, Nikolai Weibull wrote:

> David A. Black wrote:
>
>> On Fri, 4 Nov 2005, Nikolai Weibull wrote:
>
>>> David A. Black wrote:
>
>>>> I'm thinking of cases like this:
>>>>
>>>>   re = /abc.*def/
>>>>
>>>> The first chunk out of the file might match this -- but then you'd
>>>> have to keep going, really until EOF, to get the greedy match if
>>>> it's there.  Then you'd have to go back.
>
>>> Well, think of it like this instead.  The Regexp simply reads from
>>> the input source when it needs more data.  The Regexp will
>>> concatenate the new data with the old and continue on its matching
>>> routine.  We build the input as we go along, i.e., we’re in a sense
>>> dealing with implementing lazy Strings.  This won’t cause any issues
>>> with backtracking, as the data will still be there.
>
>> In the /abc.*def/ case, though, you'd always have to take all the
>> input (at least up to the third-to-last character in the file), even
>> if you had an intermediate match.  So "needs more data" would not be
>> something the regex could tell you.  It would say, "Yes, there's a
>> match", but you would have to know that the "yes" didn't mean you
>> could stop.
>
> .* needs more data until there is no more data (#read returns nil), then
> it fails as it hasn’t been able to match 'def' and backtracks until that
> part of the regex does.  Then it has a match.  (This ignores newline
> conventions, but let’s ignore them for now.)  You have the same problem
> when doing this on a regular string.

I understand what .* means :-)  My point is that you would have to
have some mechanism for telling the file stream that the regex was
doing a greedy .* and therefore needed more.  I'm not saying it's
impossible -- just that it changes the whole concept of the state of a
regex.

>> But if the regex were /abc.*?def/, then as soon as there was a "yes",
>> you could stop.
>
>> There's also a question of: if the first 4096 bytes started with "abc"
>> and ended with "de", then you'd add the next 4096 -- but you'd have to
>> perform the match again.  Or else you'd have to know to rewind by
>> exactly two characters.  But if you're changing where you start the
>> match, that could affect how anchors worked.
>
> Why?  If the first character in the next 4096 bytes is a "f" we’d have a
> match.

Again, I understand that def matches /def/ :-)  But I'm
troubleshooting the process by which you would determine that you have
a match.  Illustrating with chunks of eight bytes:

   str = "abc123de"
   /abc.*?def/.match(str)  # false
   str = "abc123def1234567"
   /abc.*?def/.match(str)  # true

My point is that you've done the match twice from the beginning.  This
could get very inefficient, if you have to re-match constantly.

You'd probably have to arrange to store where the match failed (in the
regex, not in the string) and resume from there.  I'm not sure whether
that's possible in general, but I'm sure the answer is known.

> If we’re using .*? we’re are done.  If we’re using .* we
> wouldn’t have begun matching the "de" against the /de/.  What would be
> an issue would be how to treat MatchData#post_match.  It’d have to be
> the remaining data that wasn’t matched at the time of a match, not all
> the possible data that may come from the source.

That makes it a lot less transparent: post_match would be, in the def
example, 4095 bytes from a file.  That seems very "implementation
dependent".


David

-- 
David A. Black
dblack@wobblini.net

In This Thread