[#70464] ljust, rjust... — "Chris Pine" <nemo@...>

Just thought I would run these ideas by everyone:

11 messages 2003/05/01

[#70502] temporary redirection of stdout — Andrew Walrond <andrew@...>

I'm new to ruby, so forgive any obvious stupididity, but can anyone

20 messages 2003/05/02

[#70535] SWIG on Solaris problem — Jim Freeze <jim@...>

Hi folks.

14 messages 2003/05/02

[#70594] Why is PHP so popular? What can we learn from the PHP camp? — ptkwt@...1.aracnet.com (Phil Tomson)

....and what can we learn from PHP's rapid rise to success?

99 messages 2003/05/05
[#70641] Re: Why is PHP so popular? What can we learn from the PHP camp? — Andreas Schwarz <usenet@...> 2003/05/05

Aredridel wrote:

[#70652] A wishlist for a "Ruby Standard Library" — Aredridel <aredridel@...> 2003/05/05

A wishlsit for a "Ruby Standard Library":

[#70655] Re: A wishlist for a "Ruby Standard Library" — Brian Candler <B.Candler@...> 2003/05/05

On Tue, May 06, 2003 at 07:39:54AM +0900, Aredridel wrote:

[#70673] Re: A wishlist for a "Ruby Standard Library" — Mark Wilson <mwilson13@...> 2003/05/06

[snipped many wonderful things.]

[#70759] Testing for a class existence — "Gennady" <gfb@...>

Does anybody know an easy way to test for a class/module existence in =

15 messages 2003/05/06

[#70770] capture output — "Simon Strandgaard" <0bz63fz3m1qt3001@...>

I have seen much talking about this topic, but no working code!

68 messages 2003/05/06
[#70929] Re: IO.pipe + thread = hangs (was: Re: capture output) — "Robert Klemme" <bob.news@...> 2003/05/08

[#71741] Named Pipes — Mark Firestone <nedry@...> 2003/05/19

What is the recommended procedure for using named pipes in Ruby. Does one

[#71745] Re: Named Pipes — Brian Candler <B.Candler@...> 2003/05/19

On Mon, May 19, 2003 at 06:33:17PM +0900, Mark Firestone wrote:

[#70842] Symbiosis offer: trade Ruby for German :-) — Mauricio Fern疣dez <batsman.geo@...>

17 messages 2003/05/07

[#70865] access a variables name? — "meinrad.recheis" <my.name.here@...>

is it possible to access the variable-name of an object?

14 messages 2003/05/07

[#70891] Syck 0.25 + YAML.rb -- Objects in plain-text — why the lucky stiff <ruby-talk@...>

..my faithful friends..

20 messages 2003/05/07

[#70919] petition for raa-install to be included in 1.8 — ptkwt@...1.aracnet.com (Phil Tomson)

Similar to the YamlInRuby petition which has now closed.

14 messages 2003/05/08
[#70920] Re: petition for raa-install to be included in 1.8 — Sam Roberts <sroberts@...> 2003/05/08

I just looked again, and remember why I don't know anything about

[#70921] Re: petition for raa-install to be included in 1.8 — why the lucky stiff <ruby-talk@...> 2003/05/08

You can find a tutorial on using raa-install (as well as its API) at:

[#70985] Can a global be a constant? — Jim Freeze <jim@...>

Hi

36 messages 2003/05/08
[#71001] Re: Can a global be a constant? — "Hal E. Fulton" <hal9000@...> 2003/05/08

----- Original Message -----

[#71003] Re: Can a global be a constant? — Jim Freeze <jim@...> 2003/05/08

On Friday, 9 May 2003 at 8:23:52 +0900, Hal E. Fulton wrote:

[#71007] Re: Can a global be a constant? — dblack@... 2003/05/08

Hi --

[#71036] Re: Regexp: why does (re)* return only last repetition? — "Robert Klemme" <bob.news@...>

21 messages 2003/05/09
[#71209] Re: Regexp: why does (re)* return only last repetition? — "Robert Klemme" <bob.news@...> 2003/05/12

[#71225] Re: Regexp: why does (re)* return only last repetition? — Austin Ziegler <austin@...> 2003/05/12

On Mon, 12 May 2003 17:39:19 +0900, Robert Klemme wrote:

[#71229] Re: Regexp: why does (re)* return only last repetition? — Brian Candler <B.Candler@...> 2003/05/12

On Mon, May 12, 2003 at 10:18:00PM +0900, Austin Ziegler wrote:

[#71266] Re: Regexp: why does (re)* return only last repetition? — Austin Ziegler <austin@...> 2003/05/12

On Mon, 12 May 2003 23:51:44 +0900, Brian Candler wrote:

[#71042] TCP Sockets — Dominik Werder <dwerder@...>

Hi there,

28 messages 2003/05/09
[#71089] Re: TCP Sockets — Tom Felker <tcfelker@...> 2003/05/09

On Fri, 2003-05-09 at 05:40, Dominik Werder wrote:

[#71543] Re: TCP Sockets — Dominik Werder <dwerder@...> 2003/05/16

>> How can I tell how many bytes can be read from an IO object without

[#71547] Re: TCP Sockets — Brian Candler <B.Candler@...> 2003/05/16

On Fri, May 16, 2003 at 05:14:17PM +0900, Dominik Werder wrote:

[#71550] Re: TCP Sockets — Dominik Werder <dwerder@...> 2003/05/16

my problem is not the http protocol itself (not at this time :) but the IO-

[#71551] Re: TCP Sockets — Brian Candler <B.Candler@...> 2003/05/16

On Fri, May 16, 2003 at 07:20:30PM +0900, Dominik Werder wrote:

[#71553] Re: TCP Sockets — Dominik Werder <dwerder@...> 2003/05/16

> Maybe, but threads are really the "ruby way" to solve this problem.

[#71557] Re: TCP Sockets — Brian Candler <B.Candler@...> 2003/05/16

On Fri, May 16, 2003 at 07:53:39PM +0900, Dominik Werder wrote:

[#71562] Re: TCP Sockets — Dominik Werder <dwerder@...> 2003/05/16

> That would mean mixing the binary streams in a non-deterministic way,

[#71107] RCR for child execution — Brian Candler <B.Candler@...>

Looking on RubyGarden it seems that the RCR process there is "resting", so

99 messages 2003/05/10
[#71122] Re: RCR for child execution — "Simon Strandgaard" <0bz63fz3m1qt3001@...> 2003/05/10

On Sun, 11 May 2003 01:50:49 +0900, Brian Candler wrote:

[#71126] Re: RCR for child execution — Brian Candler <B.Candler@...> 2003/05/10

On Sun, May 11, 2003 at 01:27:31AM +0900, Simon Strandgaard wrote:

[#71364] Re: RCR for child execution — "Simon Strandgaard" <0bz63fz3m1qt3001@...> 2003/05/13

On Tue, 13 May 2003 21:11:08 +0000, ahoward wrote:

[#71385] Re: RCR for child execution — matz@... (Yukihiro Matsumoto) 2003/05/14

Hi,

[#71152] Is Rubygarden's wiki restricted to English? — Mauricio Fern疣dez <batsman.geo@...>

20 messages 2003/05/11
[#71160] Re: Is Rubygarden's wiki restricted to English? — "Hal E. Fulton" <hal9000@...> 2003/05/11

----- Original Message -----

[#71165] Re: Is Rubygarden's wiki restricted to English? — Mauricio Fern疣dez <batsman.geo@...> 2003/05/11

On Mon, May 12, 2003 at 12:40:26AM +0900, Hal E. Fulton wrote:

[#71189] efficiency advice needed — "meinrad.recheis" <my.name.here@...>

hi,

12 messages 2003/05/11

[#71297] State Pattern Implementation — "Robert Klemme" <bob.news@...>

22 messages 2003/05/13

[#71361] Objects VS Datastructures — Simon Vandemoortele <deliriousREMOVEUPPERCASETEXTTOREPLY@...>

19 messages 2003/05/13

[#71447] Embedding/GC/heap corruption problem — "Jan Bernhardt" <j.bernhardt@...>

Hi,

22 messages 2003/05/14

[#71488] Test::Unit sequencing — Brian Candler <B.Candler@...>

A question for more experienced Test::Unit users.

23 messages 2003/05/15
[#71492] Re: Test::Unit sequencing — Anders Bengtsson <ndrsbngtssn@...> 2003/05/15

--- Brian Candler <B.Candler@pobox.com> wrote:

[#71508] Re: Test::Unit sequencing — ahoward <ahoward@...> 2003/05/15

On Thu, 15 May 2003, [iso-8859-1] Anders Bengtsson wrote:

[#71510] RCR: $INCLUDED global var — martindemello@... (Martin DeMello)

$INCLUDED = (__FILE__ != $0)

25 messages 2003/05/15
[#71515] Re: RCR: $INCLUDED global var — matz@... (Yukihiro Matsumoto) 2003/05/15

Hi,

[#71525] Re: RCR: $INCLUDED global var — ahoward <ahoward@...> 2003/05/15

On Fri, 16 May 2003, Yukihiro Matsumoto wrote:

[#71520] public/protected/private syntax — Guillaume Marcais <guslist@...>

I tend to find the public/protected/private keywords in Ruby a little odd.

27 messages 2003/05/15
[#71540] Re: public/protected/private syntax — "Robert Klemme" <bob.news@...> 2003/05/16

[#71573] Re: public/protected/private syntax — Guillaume Marcais <guillaume.marcais@...> 2003/05/16

On Friday 16 May 2003 03:38 am, you wrote:

[#71595] Re: public/protected/private syntax — Austin Ziegler <austin@...> 2003/05/16

On Fri, 16 May 2003 23:33:21 +0900, Guillaume Marcais wrote:

[#71560] gzip cgi compression — Dominik Werder <dwerder@...>

Is zlib compatible with HTTP-gzip-output-compression?

14 messages 2003/05/16

[#71636] select strange behavier — "Simon Strandgaard" <0bz63fz3m1qt3001@...>

'select' is suppose to watch some file-descriptors and when an event

22 messages 2003/05/17

[#71673] An Object Going Out Of Scope — "vinita Papur" <gkapur@...>

A quick question. How can one discern when an object goes out of scope?

46 messages 2003/05/18
[#71678] Re: An Object Going Out Of Scope — "MikkelFJ" <mikkelfj-anti-spam@...> 2003/05/18

[#71680] Re: An Object Going Out Of Scope — Mauricio Fern疣dez <batsman.geo@...> 2003/05/18

On Sun, May 18, 2003 at 06:08:43PM +0900, MikkelFJ wrote:

[#71681] ruby garbage collection — "Gaffer" <gaffer@...> 2003/05/18

i need this for a realtime game application which has embedded ruby -- after

[#71683] Re: ruby garbage collection — Mauricio Fern疣dez <batsman.geo@...> 2003/05/18

On Sun, May 18, 2003 at 08:35:11PM +0900, Gaffer wrote:

[#71685] Re: ruby garbage collection — "Gaffer" <gaffer@...> 2003/05/18

strange, i found the rb_gc call on my own and called that to good effect

[#71688] Re: ruby garbage collection — "Simon Strandgaard" <0bz63fz3m1qt3001@...> 2003/05/18

On Sun, 18 May 2003 22:10:18 +0900, Gaffer wrote:

[#71689] Re: ruby garbage collection — "Gaffer" <gaffer@...> 2003/05/18

i think its actually the GC cleaning up matrix and vector classes (my own

[#71691] Re: ruby garbage collection — "Simon Strandgaard" <0bz63fz3m1qt3001@...> 2003/05/18

On Sun, 18 May 2003 22:39:17 +0900, Gaffer wrote:

[#71692] Re: ruby garbage collection — "Gaffer" <gaffer@...> 2003/05/18

i'm pretty sure i've tracked down the cause, this is my first time embedding

[#71695] Re: ruby garbage collection — "Simon Strandgaard" <0bz63fz3m1qt3001@...> 2003/05/18

On Sun, 18 May 2003 23:48:28 +0900, Gaffer wrote:

[#71948] How I'd like method-wrapping to work... — "Hal E. Fulton" <hal9000@...>

OK, I read Matz's blog entries as well as I could.

16 messages 2003/05/21

[#72030] why is "does" missing from this sub!-stitution? — Dave Oshel <dcoshel@...>

[~/Desktop] dave$ cat foobar.rb ; foobar.rb

19 messages 2003/05/22
[#72037] Re: why is "does" missing from this sub!-stitution? — Dave Oshel <dcoshel@...> 2003/05/22

In article <20030522202818.GA24497@student.ei.uni-stuttgart.de>,

[#72056] Naive CGI question — "Hal E. Fulton" <hal9000@...>

I'm betting this is either impossible

15 messages 2003/05/23

[#72134] Problem compiling extension on Solaris — "Tim Hunter" <cyclists@...>

I have an user who is trying to build RMagick on Solaris with Ruby 1.6.8.

22 messages 2003/05/25
[#72262] Re: Problem compiling extension on Solaris — Daniel Berger <djberge@...> 2003/05/27

[#72150] Binary Tree vs. Hash — Xiangrong Fang <xrfang@...>

Hi ruby fans,

47 messages 2003/05/26

[#72184] Project Directory Structure — Jim Freeze <jim@...>

Hi:

47 messages 2003/05/26
[#72218] Re: Project Directory Structure — "MikkelFJ" <mikkelfj-anti-spam@...> 2003/05/26

[#72222] Re: Project Directory Structure — Jim Freeze <jim@...> 2003/05/26

Thanks everyone for your input so far.

[#72244] Re: Project Directory Structure — Robert Feldt <feldt@...> 2003/05/27

On Tue, 27 May 2003, Jim Freeze wrote:

[#72260] Re: Project Directory Structure — Jim Freeze <jim@...> 2003/05/27

On Tuesday, 27 May 2003 at 18:26:53 +0900, Robert Feldt wrote:

[#72265] Re: Project Directory Structure — Jim Freeze <jim@...> 2003/05/27

Thanks for all the input. A description of the Project

[#72269] Re: Project Directory Structure — Robert Feldt <feldt@...> 2003/05/27

On Wed, 28 May 2003, Jim Freeze wrote:

[#72274] RCR: unpack/pack Bignum — Robert Feldt <feldt@...>

I'm sure this has been discussed before and maybe there are good reasons

27 messages 2003/05/27
[#72375] Re: RCR: unpack/pack Bignum — Robert Feldt <feldt@...> 2003/05/28

No one seems to be interested in this issue so I'll have to reply to

[#72381] Re: RCR: unpack/pack Bignum — nobu.nokada@... 2003/05/28

Hi,

[#72394] Re: RCR: unpack/pack Bignum — Robert Feldt <feldt@...> 2003/05/29

On Thu, 29 May 2003 nobu.nokada@softhome.net wrote:

[#72403] Re: RCR: unpack/pack Bignum — nobu.nokada@... 2003/05/29

Hi,

[#72600] What is BER compression? (was RCR: unpack/pack Bignum) — Sam Roberts <sroberts@...> 2003/05/31

Is it documented anywhere, what this 'w' template is useful for?

[#72371] Windows Installer for Ruby 1.8.0 (CVS) — Andrew Hunt <andy@...>

Hi all,

16 messages 2003/05/28

[#72388] Array.extend versus instance.extend — "Simon Strandgaard" <0bz63fz3m1qt3001@...>

I want to install 'shift_until_kind_of' in the global Array class

18 messages 2003/05/29

[#72420] Metakit for Ruby - Would you want it? — bobx@... (Bob)

I have a gentleman in England who I have been talking with who is

23 messages 2003/05/29

[#72439] Iteration - last detection — "Orion Hunter" <orion2480@...>

Is there any built in functionality for iteration that will allow me to

41 messages 2003/05/29
[#72510] Re: Iteration - last detection — Carlos <angus@...> 2003/05/30

> Is there any built in functionality for iteration that will allow me to

[#72577] IF statement in ruby 1.8.0 (2003-05-26) [i386-mswin32] — "Shashank Date" <sdate@...>

Just when I thought that I had perfectly understood the IF statement in

14 messages 2003/05/31

Re: Binary Tree vs. Hash

From: "MikkelFJ" <mikkelfj-anti-spam@...>
Date: 2003-05-27 21:42:17 UTC
List: ruby-talk #72282
"Xiangrong Fang" <xrfang@hotmail.com> wrote in message
news:20030527092309.73C4.XRFANG@hotmail.com...
> Hi Mikkel,
>
> Thanks for the detailed explanations. I have some qusestions regarding
> your explain:
>
> 1. You mentioned B-Tree and binary tree, seems that they are different?

In principle they are identical, but differ vastly in implementation.

A binary tree has two children, the left has smaller values, the right has
larger values.
A B-Tree has an array of children such that all values in the leftmost
subtree
is smaller than all values in the next subtree, etc.
In effect a B-Tree is a compressed binary tree.
In both cases you have logarithmic search. But if you assume the time is
consumed
by visiting new nodes and not by visiting values within a single B-Tree
node,
you get a much faster operation. This assumption is true if you read data
from
disk. There is a limit of B-Tree node sizes beyond which it takes
too long time to visit each key within each node. For disk based
B-Trees you usually have about 1000 keys (M=1000).
You can also use B-Trees in-memory, but then the node size must
reflect the size of the CPU cache. When data is in cache, a very
fast scan over the array of keys is more effecient than visiting
new nodes.
B-Trees are very complicated because a node can have between M / 2 and M
keys in a node. If you go above or below you must reorginize the tree,
This is efficient, but the implementation has lots of special cases.

A B-Tree guarantees that you must at most visit K nodes.
K is a small limited value. If K is 3 and you have 1000 keys per
node (M=1000), you have K^M = 1G possible keys.
That is about all you can address in a 32bit systems.
For in-memory operation (M=10), K is significantly larger,
but if you consider the realistic size of your dataset, it is still
small. The trick you can do to make in memory performance better
is to organize you tree so allocation of nodes happen close to
related nodes. This gives you benefits in lower level cache and
virtual memory. Effectively you a split a larger (M=1000) into
several smaller (M=10) nodes. This gives you both good in-memory
and on-disk performance. In praxis it is not that easy to manage,
but you can do something to make it work reasonably well.

Binary trees on the other hand do not have a small limit on the
number of nodes that you must visit. Therefore they truly have
logarithmic behavior. It's not too bad, it is just not the most
efficient way to deal with large datasets.

A binary tree requires you to visit a new node for each comparison.
Binary trees must be balanced otherwise all data could end up in one
subtree (becoming just a linked list).
There are several ways to balance binary trees (e.g. red-black trees and
splay trees).
In any case it solves the problem.


> 2. I didn't used any of my own hash algorithm, I just used Ruby's hash,
> and used Ruby's PStore to dump it to disk. The memory usage is an
> estimation from monitoring the task list in win2k. Do you have any
> comments about the efficiency of using Ruby's hash and pstore?

My best guess is that PStore would not save more than necessary, so you
should
expect in-memory to take up more space as I already suggested,
but I don't know what it does.
Your way of measuring is very crude, but probably fair because this is how
the
system is actually stressed no matter the theory.

> 3. My application is related to using N-GRAM method to cut Chinese text
> into words (a Chinese bigram is 4-byte string). Do you have any comments
> on which algorithm is better?

I'd like to here more about this, in private mail if you prefer.

It it appears that you are counting "word" or symbol frequencies and can do
this
by indexing a pair of symbols as a single 32bit word.
This is ideal for the hashtable I have implemented as it does not need to
visit any
external keys. However, it is not available in Ruby. I can send you the
C-Source.
I already suggested that this source be made available to Ruby but I don't
know how
well it fits with Ruby as is. It would be specialized for 32 bit words, not
general hashes.

I have been working a lot with B-Trees and I think I would choose B-Trees
over
hash tables because they scale better. B-Trees have a good trade-off. They
are easy to have
small and easy to have very very large and if they are not the fastest
solution, they are
always seem to be among the best performers.

With B-Trees you can have your data on disk and do efficient lookups without
needing
to load everything into memory.
It is possible to have on-disk hash tables, but in the end you need to do
some clever
memory mananagement that is going to make it look an awful lot like B-Trees.

Also, because the key is 32bit the B-Tree is efficient because it does not
need to
visit external keys. As soon as you visit external keys the cache benefit of
B-Trees
goes away. However, I do not recommend that you start implementing B-Trees.
They are
difficult to implement and difficult to make sure that they work.

You may be able to use the datastructures in the Berkeley DB database. I
don't know how
efficient it would be in you case (it's fast but it is a database after
all).
Berkeley DB does handle hashes and B-Trees. It can operate in in-memory mode
only
and I think there is a Ruby interface.
Note that Berkeley is free only for non-commerical usage.

> I am very interested in your comments
> about SkipList, can you expand more?

I don't recall exactly why skip-lists are better than binary trees, but they
are
in any case easy to implement (except efficient node allocation).

I can send you an implementation in C if you like.

This is only a rough outline according to memory:

A skiplist is a sorted linked list of all values.
Since it is slow to find a value this way, some nodes both point to the next
node
as well as to one or more nodes further down the list.
In fact a skiplist node has an array of next pointers.
All nodes have on next pointer, half the nodes have at least two next
pointers.
Only one nodes has log(N) next pointers (the head of the list).
Two nodes has log(N-1) next pointers, etc.
The search happens by looking at the longest jumping link first. If that was
too far,
you try the next shorter jumping node.
Eventually you find a node that has a key that matches or is shorter.
You now repeat the process jumping from that node. This goes on until you
have a match
(or conclude failure).
This is a bit simplified as there are a few more cases to consider, but this
is generally
the idea.
In reality the skiplist does not have a perfect distribution of nodes. The
number of nodes
with a certain number of next links may not be optimal, and the location of
these nodes may
also not be optimal. The distribution happens statistically, but works
pretty well in praxis.
A skiplist node thus has an array of pointers. This is not unlike B-Tree
nodes, except the
concept is radically different. There is no complex balancing or splitting
of nodes and
the performance is decent. However, you are jumping back and forth between a
lot of nodes,
so it is not friendly to your CPU cache. Since nodes have vastly different
sizes it is
difficult to make a fast node allocation scheme (if you choose malloc / new,
you have lost the
speed competition).


> Thanks a lot!

You are welcome.

Mikkel



In This Thread