[ruby-talk:11792] Re: building n-grams

From: David Alan Black <dblack@...>
Date: 2001-02-28 17:50:16 UTC
List: ruby-talk #11792

On Thu, 1 Mar 2001, Arno Erpenbeck wrote:

> Greetings everybody,
> 
> maybe somebody can help me with this: How can I collect n-grams (i.e.
> tuples of characters/words/whatever) from plain text? I tried something
> like this:
> 
> while line = gets
>   line.gsub(/[a-zA-Z\s]{3,3}/) {|p| print "#{p},"}
> end
> 
> However, this makes "too big" steps because the regexp matches one
> triple and then the next one behind it, but no overlaps. There must be a
> simple solution I guess.
> 
> Example:
> Input "The man sees the boy with the telescope."
> Output "The, ma,n s,ees, th,e b,oy ,wit,h t,he ,tel,esc,ope,"
> Desired output "The,he ,e m, ma,man,..."

Quick first try (I know nothing about any ngram theory that may
exist, so I don't know whether certain things are right, such
as end-boundary behavior):

  class String
    def ngrams(len=1)
      ngrams = []
      (0..size - len).each do |n|
        ng = self[n...n+len]
        ngrams.push(ng)
        yield ng if block_given?
      end
      ngrams
    end
  end

  str = "I am a string."
  p str.ngrams(5)
  str.ngrams(3) do |s| print "(%s)" % s end

=>

["I am ", " am a", "am a ", "m a s", " a st",
"a str", " stri", "strin", "tring", "ring."]

(I a)( am)(am )(m a)( a )(a s)( st)(str)(tri)(rin)(ing)(ng.)
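
For comparison, here is a rough sketch of two other ways to get overlapping tuples. It is not part of the original reply. It assumes a Ruby recent enough to have Enumerable#each_cons and zero-width lookahead in regexps, and the helper names (char_ngrams, word_ngrams, regex_ngrams) are made up for illustration. The regexp variant stays closest to the gsub attempt in the question: wrapping the pattern in a lookahead lets the scanner advance one character at a time rather than jumping past each match.

```ruby
# Sketch only: these helper names are hypothetical, chosen for illustration.

# Character n-grams via a sliding window over the characters.
def char_ngrams(text, len)
  text.each_char.each_cons(len).map(&:join)
end

# Word n-grams: split on whitespace, then slide the same window over words.
def word_ngrams(text, len)
  text.split.each_cons(len).map { |words| words.join(" ") }
end

# Regexp variant: the lookahead (?=...) consumes nothing, so matches overlap.
# Windows containing anything other than letters or whitespace are skipped,
# mirroring the character class used in the question.
def regex_ngrams(text, len)
  text.scan(/(?=([a-zA-Z\s]{#{len}}))/).flatten
end

p char_ngrams("The man", 3)                # => ["The", "he ", "e m", " ma", "man"]
p word_ngrams("The man sees the boy", 2)   # => ["The man", "man sees", "sees the", "the boy"]
p regex_ngrams("The man", 3)               # => ["The", "he ", "e m", " ma", "man"]
```

All three return an empty array when the input is shorter than the requested length, which matches the end-boundary behavior of the String#ngrams version above.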


> BTW: If this list is not intended for questions of this kind, please let
> me know, and I will go and look somewhere else.

We mainly use it to discuss the weather, but occasional interesting
questions related to the Ruby programming language are tolerated :-)


David

-- 
David Alan Black
home: dblack@candle.superlink.net
work: blackdav@shu.edu
Web:  http://pirate.shu.edu/~blackdav
