[#83322] Saving and restoring with YAML — Ben Giddings <bg-rubytalk@...>
Hi all,
Ben Giddings wrote:
Ok, silly question.
[#83328] tcltklib and not init'ing tk — aakhter@... (Aamer Akhter)
Hello,
[#83329] Ruby 1.8.0 rpm? — Hal Fulton <hal9000@...>
I want to install on a box where I don't have root access.
[#83337] Include CONFIG::Config['rubydocdir'] in rbconfig.rb — Gavin Sinclair <gsinclair@...>
Hi folks,
Hi,
[#83391] mixing in class methods — "Mark J. Reed" <markjreed@...>
Okay, probably a dumb question, but: is there any way to define
On Thu, 2 Oct 2003 06:02:32 +0900
On Thursday, October 2, 2003, 7:08:00 AM, Ryan wrote:
On Thu, Oct 02, 2003 at 07:37:25AM +0900, Gavin Sinclair wrote:
Mark J. Reed [mailto:markjreed@mail.com] wrote:
> On Thu, Oct 02, 2003 at 07:37:25AM +0900, Gavin Sinclair wrote:
On Thu, 2 Oct 2003, Gavin Sinclair wrote:
>> It sometimes makes me wonder why Ruby differentiates between instance
Hi --
The asymmetry between class/instance variables and class/instance
>>>>> "M" == Mark J Reed <markjreed@mail.com> writes:
[#83408] Getting a list of the files in a directory — revision17@... (Revision17)
Hi, I'm just starting out with ruby and I'm writing a script to rename
[#83411] Absolute class name? — "Mark J. Reed" <markjreed@...>
If I do
Hi,
MJR = me
>>>>> "M" == Mark J Reed <markjreed@mail.com> writes:
On Thu, Oct 02, 2003 at 11:11:59PM +0900, ts wrote:
On Thu, Oct 02, 2003 at 02:20:07PM +0000, Mark J. Reed wrote:
[#83416] C or C++? — "Joe Cheng" <code@...>
I'd like to start writing Ruby extensions. Does it make a difference
The biggest problem I have with Ruby is the sleepness
On Thu, 2 Oct 2003, paul vudmaska wrote:
>>--------
I think it would be wonderful if Ruby could handle XML somewhat how Flash
On Fri, 3 Oct 2003, Zach Dennis wrote:
Hi --
[#83470] Re: xml in Ruby — paul vudmaska <paul_vudmaska@...>
>>>
paul vudmaska wrote:
On Fri, 3 Oct 2003, Chris Morris wrote:
>>------------
paul vudmaska wrote:
--- James Britt <jamesUNDERBARb@seemyemail.com> wrote:
[#83481] newbie question: function overloading — Dimitrios Galanakis <galanaki@...>
I need to define a method that performs differently when operated on objects
On Fri, 3 Oct 2003, Dimitrios Galanakis wrote:
[#83533] FreeRide — Carl Youngblood <carl@...>
Is it just my faulty perception or does the momentum behind FreeRIDE
I presented FreeRIDE at OSCON in July, but have not done much on it
[#83551] xml + ruby — paul vudmaska <paul_vudmaska@...>
>>---------
On Fri, 3 Oct 2003 16:11:46 +0900, paul vudmaska wrote:
Zach Dennis wrote:
James,
On Friday 03 October 2003 02:20 pm, paul vudmaska wrote:
[#83554] hash of hashes — Paul Argentoff <argentoff@...>
Hi all.
On Friday 03 October 2003 14:04, Paul Argentoff wrote:
Paul Argentoff wrote:
[#83608] webrick, threads, and i/o — "Ara.T.Howard" <ahoward@...>
[#83627] Ruby/Extensions 0.2.0 — Gavin Sinclair <gsinclair@...>
Hi -talk,
[#83675] fox-tool - interactive gui builder for fxruby — henon <user@...>
hi fellows,
il Sun, 05 Oct 2003 16:17:16 GMT, henon <user@example.net> ha
gabriele renzi wrote:
Hi.
[#83727] map/collect iterating over multiple arrays/arguments — zoranlazarevic@... (Zoran Lazarevic)
Can I iterate over multiple arrays/collections?
[#83730] Re: Enumerable#inject is surprising me... — "Weirich, James" <James.Weirich@...>
> Does it surprise you?
Hi,
Hi,
Hi --
On Thu, 9 Oct 2003 dblack@superlink.net wrote:
>>>>> "d" == dblack <dblack@superlink.net> writes:
[#83741] Thread + fork warning — Ariff Abdullah <skywizard@...>
# ruby -e 'a = Thread.new { fork {} }; a.join'
[#83756] GC and the stack — "Thomas Sondergaard" <thomas@...>
Hello,
[#83758] usage of Regexp::EXTENDED — "Simon Strandgaard" <none@...>
How does it work?
On Wed, 08 Oct 2003 21:58:42 +0900, Jim Weirich wrote:
[#83771] Re: GC and the stack — "Weirich, James" <James.Weirich@...>
> Okay. What if, in an extension, I have an integer on the
[#83783] shorthand notation for multiline in regexps? — Carl Youngblood <carl@...>
Is there a way to declare a multiline or ignorecase regexp without using
[#83795] Standard Queue Implementation and Thread Safety — Pete Kazmier <pete-temp-ruby-usenet-10082003@...>
First the disclaimer: I'm a newbie to ruby :-)
[#83801] Extension Language for a Text Editor — Nikolai Weibull <ruby-talk@...>
OK. So I'm going to write a text editor for my master's thesis. The
You may want to look at VIM's use of Ruby for writing extensions.
On Thu, 9 Oct 2003 05:06:32 +0900
* Ryan Pavlik <rpav@mephle.com> [Oct, 08 2003 22:30]:
On Thu, 9 Oct 2003 06:09:29 +0900
* Ryan Pavlik <rpav@mephle.com> [Oct, 09 2003 09:10]:
On Fri, 10 Oct 2003 02:36:25 +0900
* Ryan Pavlik <rpav@mephle.com> [Oct, 10 2003 16:49]:
On Oct 11, Nikolai Weibull wrote:
* Brett H. Williams <brett_williams@agilent.com> [Oct, 10 2003 20:50]:
On Wed, 08 Oct 2003 22:39:13 +0000, gabriele renzi wrote:
[#83802] Ruby Patriotism: Python+XML v. Ruby+YAML — why the lucky stiff <ruby-talk@...>
We've got a good old-fashioned derby going on in blogoland. Perhaps
Has anyone benchmarked Python+YAML? You should account for all the variables.
[#83822] TUI library — "Imobach =?iso-8859-15?q?Gonz=E1lez=20Sosa?=" <imobachgs@...>
[#83843] case where regex range should raise — "Simon Strandgaard" <none@...>
irb(main):001:0> re = /bx{,2}c/
[#83850] Reply: Re: SEPARATOR doesn't work — Robert.Koepferl@...
[#83985] Perl 6 style regular expressions — mark <msparshatt@...>
I was wondering if anyone has done any work on implementing Perl 6 style
[#83987] Project suggestion: Ruby code indenter — Gavin Sinclair <gsinclair@...>
From the thread "Extension Language for a Text Editor":
* Gavin Sinclair <gsinclair@soyabean.com.au> [Oct, 10 2003 18:20]:
[#84041] mysql_num_rows equivalent for DBI? — Ben Giddings <bg-rubytalk@...>
Is there a database-independent way of finding out how many rows were
paul vudmaska wrote:
On Sun, Oct 12, 2003 at 05:17:19AM +0900, Ben Giddings wrote:
[#84049] splitting a line by columns — "Mike Campbell" <michael_s_campbell@...>
I have a line of text output in columnar form; what's the best way to split it
[#84056] Newbie Class variable question — Elias Athanasopoulos <elathan@...>
Hello!
[#84060] RDoc and i18n — KUBO Takehiro <kubo@...>
Hi,
KUBO Takehiro <kubo@jiubao.org> writes:
On Sun, 19 Oct 2003 23:27:41 +0900, Dave Thomas wrote:
[#84070] XPath and HTML — David Corbin <dcorbin@...>
Is there a library out there that lets me parse HTML and use XPath
On Mon, 13 Oct 2003, David Corbin wrote:
On Sunday 12 October 2003 17:36, Chad Fowler wrote:
On Mon, 13 Oct 2003, David Corbin wrote:
[#84092] Resurrecting German mailing list? — "Josef 'Jupp' SCHUGT" <jupp@...>
Hi!
[#84145] Parentheses — Nikolai Weibull <ruby-talk@...>
Hi,
[#84159] Rubygarden oddness — "Berger, Daniel" <djberge@...>
All,
[#84165] Re: Parentheses — Michael Campbell <michael_s_campbell@...>
Yukihiro Matsumoto wrote:
[#84169] General Ruby Programming questions — Simon Kitching <simon@...>
Simon Kitching wrote:
Hi Florian..
Simon Kitching (simon@ecnetwork.co.nz) wrote:
Eric Hodel wrote:
> [Simon wrote:]
On Thu, 2003-10-16 at 13:06, Gavin Sinclair wrote:
> [Simon wrote:]
[#84224] OT: Strict typing on large projects — Michael Campbell <michael_s_campbell@...>
I don't necessarily mean to stir a pot here, but was reading an
On Sat, Oct 18, 2003 at 05:41:03AM +0900, Michael Campbell quipped:
[#84235] POLS ANT file pattern in Ruby — "Robert Dawson" <robert@...>
Hi,
[#84236] rubylucene - new & improved — Erik Hatcher <erik@...>
I had the pleasure of working with Rich Kilmer for a bit last weekend
[#84248] Outdated page(s) on ruby-lang.org? — Hal Fulton <hal9000@...>
A guy I (barely) know just tried to download Ruby
Hi!
Josef 'Jupp' SCHUGT wrote:
[#84251] ANN: rjava — Hans Jörg Hessmann <hessmann@...>
RJava enables you to use Java classes from ruby using ruby-like syntax. For
[#84253] Email Harvesting — Nikolai Weibull <ruby-talk@...>
I've been receiving a lot of Swen emails to my ruby-talk address lately.
Hi,
[#84283] Any shift/reduce experts out there? — Jim Freeze <jim@...>
Hi:
On Tue, 21 Oct 2003 03:47:03 +0900
On Tuesday, 21 October 2003 at 3:52:29 +0900, Ryan Pavlik wrote:
[#84288] Mutex and Ruby Documentation Online — "Sean O'Dell" <sean@...>
I'm running into that mutex problem, where I need the same process to be able
[#84299] Re: Outdated page(s) on ruby-lang.org? — "Pe, Botp" <botp@...>
sir matz@ruby-lang.org [mailto:matz@ruby-lang.org] humbly replied:
[#84305] Time: safe way to go to next day? — Emmanuel Touzery <emmanuel.touzery@...>
Hello,
[#84311] Formal Language Semantics — "Christopher C.Aycock" <christopher.aycock@...>
Does anyone know where I can get the formal language semantics for Ruby
[#84331] Re: Email Harvesting — Greg Vaughn <gvaughn@...>
Ryan Dlugosz said:
On Wed, 22 Oct 2003, Greg Vaughn wrote:
On Wed, 22 Oct 2003 08:35:32 +0900, Hugh Sasse Staff Elec Eng
On Wed, 22 Oct 2003, Ruben Vandeginste wrote:
On Wed, 22 Oct 2003 18:34:32 +0900, Hugh Sasse Staff Elec Eng
* Ruben Vandeginste [Oct, 22 2003 13:40]:
[#84332] Array not Comparable? — "Warren Brown" <wkb@...>
In the past I have sorted arrays of arrays and so I knew that Array
Warren Brown wrote:
>>>>> "E" == Emmanuel Touzery <emmanuel.touzery@wanadoo.fr> writes:
On Wednesday, October 22, 2003, 11:49:17 PM, ts wrote:
[#84341] Ruby-oriented Linux distro? — Hal Fulton <hal9000@...>
There's been some talk of something like this in the past.
On Wednesday, October 22, 2003, 6:01:16 PM, Hal wrote:
On Wednesday 22 Oct 2003 11:02 am, Gavin Sinclair wrote:
On Wed, Oct 22, 2003 at 08:03:19PM +0900, Andrew Walrond wrote:
On Wednesday 22 Oct 2003 2:48 pm, Michael Garriss wrote:
On Wed, Oct 22, 2003 at 10:55:15PM +0900, Andrew Walrond wrote:
Michael Garriss wrote:
[#84350] ML <-> NG gateway is not working — Gavin Sinclair <gsinclair@...>
Folks,
[#84400] RubyGarden Wiki error — "Dmitry V. Sabanin" <sdmitry@...>
I got this today while trying to edit my wiki-page at
[#84420] Struggling with variable arguments to block — "Gavin Sinclair" <gsinclair@...>
Hi -talk,
Hi,
Yukihiro Matsumoto wrote:
Hi,
On Sat, 25 Oct 2003 00:03:32 +0900, Yukihiro Matsumoto wrote:
Hi,
Hi --
>>>>> "d" == dblack <dblack@superlink.net> writes:
[#84462] Suggestion for an XML and ZLIB library? — Daniel Carrera <dcarrera@...>
Greetings all,
[#84467] Rubyx logo idea — Andrew Walrond <andrew@...>
I've been thinking about a logo for Rubyx, my ruby based linux distro.
[#84480] How to include zip in a program. — Daniel Carrera <dcarrera@...>
Hello all,
[#84485] Win32OLE issue in 1.8.0 — Steve Tuckner <STUCKNER@...>
[#84501] File class doesn't work! — Daniel Carrera <dcarrera@...>
Something is severely broken with my installation:
[#84514] Formatting (ANSI) highlighted strings — Gavin Sinclair <gsinclair@...>
Hi folks,
[#84529] Win32OLE again — Steve Tuckner <STUCKNER@...>
>>>>> "S" == Steve Tuckner <STUCKNER@MULTITECH.COM> writes:
[#84530] Crash in ruby 1.8.0 — "Brett H. Williams" <brett_williams@...>
This doesn't look right...
[#84531] OOoExtract v0.1 — Daniel Carrera <dcarrera@...>
Greetings,
[#84534] Fatal recycling of SystemStackErrors — Florian Gross <flgr@...>
Moin!
[#84543] Ruby and XUL? — Daniel Carrera <dcarrera@...>
Hi all,
[#84554] getoption long question — Daniel Bretoi <lists@...>
opts = GetoptLong.new(_
[#84555] system() isn't safe on win32 — Florian Gross <flgr@...>
Moin!
[#84574] Problem with seeking in existing files. — <agemoagemo@...>
I'm trying to write a program that will be writing
Hi,
[#84577] ruby 1.8.1 preview1 — matz@... (Yukihiro Matsumoto)
It's out.
On Thu, 2003-10-30 at 04:41, Yukihiro Matsumoto wrote:
[#84585] Re: [ANN] win32-file 0.1.0 — "Berger, Daniel" <djberge@...>
[#84603] 1.8.1 failure — Daniel Berger <djberge@...>
Solaris 9
[#84604] ruby-dev summary 21637-21729 — Takaaki Tateishi <ttate@...>
Hello,
On Fri, Oct 31, 2003 at 07:01:28AM +0900, Takaaki Tateishi wrote:
Hi,
On Thu, Nov 06, 2003 at 11:17:59PM +0900, Yukihiro Matsumoto wrote:
Hi,
On Fri, Nov 07, 2003 at 12:36:23AM +0900, Yukihiro Matsumoto wrote:
[#84611] 64-bit Ruby on Solaris? — Daniel Berger <djberge@...>
Hi all,
[#84626] Since today is October 31... — Hal Fulton <hal9000@...>
srand 0
Re: Is Ruby slower?
"gabriele renzi" <surrender_it@remove.yahoo.it> wrote in message news:9ed46v0r2nn4r7ehchcl6s4ci8t7ccl23h@4ax.com... > il Sun, 2 Mar 2003 17:12:28 +0100, "MikkelFJ" > <mikkelfj-anti-spam@bigfoot.com> ha scritto:: > >This is because a cache-miss is expensive (many hundred > >instruction cycles). An extra memory access in a small hash table is likely > >to happen in memory that is already conveniently in cache. > > I don't grok what you mean talking about the cache :( I'll try to explain although I'm not sure what part you are missing. You are probably aware of most that I cover: 1) how expensive memory access can be 2) when memory access happens. 1) I'm no expert on cache systems, but there are several layers of cache starting from virtual on-disk memory through TLB hits/misses (the virtual memory addressing system) to 2nd level cache, 1st level cache and actual CPU registers. The further down the pipeline, the more costly the access. I believe an in-memory cache miss can range from hundreds to thousands of missed instructions and paged memory is out of the scale. In comparioson, a major cost of process context switching seems to be the flush of the TLB cache. Memory in 1st level cache may be as fast as CPU registers. 2) There are two major ways of avoiding cache penalties: a) keeping you data smaller such that more fits into a cache closer to the CPU. b) locality of reference - i.e. the next access is cheap of close to a previous access and therefore will be loaded into fast cache along with the first access. There are differnt hash table implementation strategies, but a hash works by spreading data all over the place. First a bucket is located in a bucket table. This is a once per lookup operation. If the bucket table is small and there are many lookups, there is a good chance that the next lookup will find a bucket in fast cache and avoid many hundres worth of instructions missed on failed cache. Once the bucket is located, there is a list of possible matches, all with same hash key. The better the hash, the shorter the list. The list is typically searched from head to tail. Each visit risks a cache miss. The good thing is that all entries can be stored close to each other in an array, so the risk of a cache miss is proportional to the actual data stored. This is contrary to the bucket table where the risk is proportional to the total number of buckets. Fortunately the bucket table can be as small as a single pointer. It therefore makes sense to use about as much memory on buckets as on stored hash entries. If the entry table is allocated in one block, it will on average be 50%-75% depending on expansion scheme. Thus an easy option is to set the bucket table to 50% of entry table, but this can only be tuned by careful performance measures as there are many factors. A bucket size the same as the average load of the entry table will on average give 1 entry per bucket and have equal load on the cache as the entry table - a larger bucket table may just consume too much memory and reduce cache effiency - but this also depends on the quality of the hash key. The next cache problem, is the number of key comparison operations. If the bucket size is chosen as mentioned above, we will only expect slightly more than one entry per bucket on average, although collisions will occur - especially with a bad hash. For the purpose of illustration, assume a collision list on average is 4 entries long, then on average 2 keys must be examined for each lookup. 
If the key is small enough to be stored in the hash entry itself, this comparison is cheap. If the key is stored in external data, each key visit may potentially be a cache miss as well. An external key may also be slow to compare for reasons other than a potential cache miss: long string comparisons, calls through a comparison function, etc., so collisions should be avoided.

One technique to avoid accessing external keys more often than necessary is to compare the full hash key before visiting the key. Typically a hash key is 32 bits and is then crunched down to the bucket table size, either using modulo a prime or, in the case of Jenkins, the much faster modulo a power of two. By storing the full hash key in the hash entry, the collision list is effectively made much shorter, because the risk of collisions in the full hash key is much smaller than the risk of collisions in the reduced bucket index. It is very cheap to compare the full hash if it is the original output of the hash operation and fits in a machine word (or two). We may still need to visit all entries in the collision list, but we probably only compare a single external key. Storing the hash key in the hash entry makes the entry larger and therefore increases the risk of cache misses. Storing the hash key has another benefit: growing the hash table can avoid recalculating the hash keys.

A typical hash table entry would therefore be:

  <hash-key, key or link to key, data, next-collision link>

or the minimum:

  <link to key and data, next-collision link>

Or to put a long story short: the CPU is much faster than memory, so any performance optimization should first and foremost minimize the number of memory accesses that are far apart. If the typical case is access to a single location, or access to data in the same cache, a complex algorithm that reduces memory accesses may be too expensive. Notably, a tight loop is faster than most things. Code is also subject to cache misses in the instruction pipeline, so an algorithm can get too complex.

> It seems to me that 1-at-a-time didn't depend on accessing memory more
> than jenkins.

This is true. As soon as the first bit of the key is peeked at, the entire key is probably already in fast cache. This is an example of locality of reference. Therefore it probably doesn't really matter that much. You are looping more frequently, but then there is predictive branching, and Jenkins has other overhead. Loop unrolling is difficult on variable-length keys, but the compiler might unroll one-at-a-time to handle an entire word per loop. Thus the result is blurred. As long as the end result is a high-quality hash and the hash isn't too slow, it may be more important how many collisions can be avoided. You simply have to performance test.

> Using a big hash would stress the caching system, but we won't be
> measuring the hash performance.

Not in a major way - perhaps you are missing the point? I'm talking about using the 32-bit (machine word) value that typically results from a hash key calculation, not a full MD5 signature (although in some cases that might be relevant). It is true that storing the hash key takes up more memory and it may not necessarily pay off, as discussed above.

> Probably I misunderstood something, would you please explain me what
> problem this algorithm could have with the cache stuff?

The quality of the hash key affects the number of memory accesses and therefore the risk of cache misses.
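Since the thread keeps referring to it, here is the commonly published one-at-a-time hash transcribed to Ruby, purely as an illustrative sketch - the implementations actually being compared are C, where the 32-bit wrap-around comes for free; here it is masked by hand.

  # One-at-a-time hash, Ruby transcription for illustration only.
  def one_at_a_time(key)
    mask = 0xffffffff
    h = 0
    key.each_byte do |byte|      # one byte of the key per loop iteration
      h = (h + byte) & mask
      h = (h + (h << 10)) & mask
      h ^= h >> 6
    end
    h = (h + (h << 3)) & mask
    h ^= h >> 11
    (h + (h << 15)) & mask       # final 32-bit hash value
  end

  # one_at_a_time("ruby")  # => 32-bit value, later reduced to a bucket index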
> I agree that maybe jenkins could exploit modern CPU superscalar arch,
> but we miss the ability to inline the code reducing the number of
> operations from 9n to 5n.

Perhaps - but you could specialize the hash for short keys, as I did for 32 bit. A lot of keywords are short. Jenkins did create the core Mix operation as a macro. But your argument may be valid, and performance tests are, as usual, in place. Indeed I have had a very fast data structure lose 50% or more of its performance on a lost inline opportunity in the lookup method (new processors may change this - IA64 has fast function calls using a rolling stack).

One thing that a typical performance test will not capture is the number of cache misses in real-life operation, because the test is constantly banging against the same data structure and therefore downplays the risk of cache misses. This will give a worse but faster hash key an advantage, but it is equally likely to overestimate the need for large hash tables, where the quality probably matters the most.

> And, (I didn't pass my CPU-related exam so well, so maybe I'm
> wrong) having the work on 1 char per time won't damage the
> parallelism, the cpu could work on more than one char at a time
> anyway.

I'm not up to speed on the latest CPU architectures, so I can only produce a qualified guess. I believe your point is not an unrealistic assumption for modern architectures. Some older processors perform much worse on unaligned memory access (and some do not allow it at all, forcing the compiler to insert shift and mask instructions in the code). And some operate only on 8 bits anyway (microcontrollers, still going strong). The faster the processor, the less important the parallelism is (and hash function performance generally), because it is going to wait on the memory.

By the way, speaking of architectures, someone made the point that DDR RAM sometimes results in poor performance because it supposedly isn't very effective at meeting random access requests. That is, streamlined access as in video streaming is fast, but hash table lookups may be poor.

Also, don't forget to add the cost of taking modulo prime on the final hash result in the non-Jenkins case. The modulo prime can be expensive. If the search key is short and the lookup hits perfect cache conditions, the modulo may take significant time - but then the latest processors may already have changed that.

> It seems that 1-at-a-time scales well, and that it is actually better
> than what we have in current ruby, at least it got less collision.

I thought Jenkins carefully chose his function to produce fewer collisions, but that may be a tradeoff against avoiding the use of modulo prime.

> And well, the perl guys won't put it in 5.8 if it wasn't someway
> better than the old :)

Again, the hash key calculation cost is probably only a significant factor for small, frequently used hash tables.
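As a small aside on the modulo-prime point above, here are the two reduction schemes side by side - a sketch with made-up table sizes and a made-up hash value, for illustration only.

  h          = 0x9e3779b9          # some 32-bit hash value
  prime_size = 251                 # prime-sized bucket table
  pow2_size  = 256                 # power-of-two bucket table

  bucket_a = h % prime_size        # integer division: comparatively expensive,
                                   # but uses all bits of the hash
  bucket_b = h & (pow2_size - 1)   # single AND: very cheap, but keeps only the
                                   # low bits, so it needs a high-quality hash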
In conclusion, there are so many different operating conditions that it is difficult to predict which hash function will be best, or even whether a hash table is the best choice overall. I have been working a lot with variants of B-Trees, because they are more than adequately fast in most scenarios and scale pretty well when cache constraints start to be significant, even (or especially) on-disk if allocation is done appropriately. A lot of people claim that the much simpler skip-list data structure performs better than B-Trees. It doesn't. Someone made the counter-argument that B-Trees are faster due to the poor cache behavior of skip-lists and backed it up with tests. I reproduced the tests by implementing a skip-list and comparing it to a B-Tree implementation I had already made. The scan through the buffer of a small-fanned B-Tree node is fast both in terms of CPU and memory cache. Therefore I have great respect for cache behavior in data structures.

Mikkel