From: Rodrigo Rosenfeld Rosas
Date: 2013-02-07T02:06:52+09:00
Subject: [ruby-core:51936] Re: [ruby-trunk - Feature #7792] Make symbols and strings the same thing

On 06-02-2013 13:25, Yorick Peterse wrote:
> I don't think I'm following you, can you explain what's supposedly
> ironic about it? Using Hashie only "slows" things down based on whether
> you use Symbols, Strings or object attributes. Unless you use it *all*
> over the place the performance impact is small.

What I'm trying to say is that the main reason why symbols exist in Ruby in the first place is performance, from what I've been told. But then people don't want to worry about whether hashes are indexed by strings or symbols, so they end up using some kind of HashWithIndifferentAccess or a similar technique. And since the normal Hash class doesn't behave this way, you have to loop through every hash in an object returned by JSON.parse to make it behave like HashWithIndifferentAccess, which has a huge performance cost compared to the small gains symbols could add.

> I personally don't fully agree with what Hashie does because I believe
> people should be competent enough to realize that when they take in
> external data it's going to be String instances (for keys that is).

It is not a matter of being competent or not. You can't know in advance whether a hash returned by some external code is indexed by strings or by symbols. You have to test it yourself or check the documentation. Or you could just use a HashWithIndifferentAccess class and stop worrying about it. This has a big impact on coding speed and software maintenance, which is the big problem in my opinion.

> Having said that, I think fundamentally changing the way Ruby works when
> it comes to handling Strings and Symbols because developers can't be
> bothered fixing the root cause of the problem is flawed.

People reading a Ruby book will notice that it is not particularly designed with performance in mind; it is designed mostly for programmer happiness. If that is the case, then worrying about bothered programmers makes sense for a language like Ruby, in my opinion.

> If you're worried about a ddos

DDoS is a separate beast that can't be easily prevented no matter what language/framework you use. I'm just talking about DoS exploits through memory exhaustion due to symbols not being garbage collected. Anyway, that is a separate issue from this one and would be better discussed in its own thread.

> stop converting everything to Symbols.

I'm not converting anything to symbols. Did you read the use case in the feature description? I'm just creating regular hashes using the new sexy hash syntax, which happens to create symbols instead of strings. Then, when I serialize my object to JSON for storing in Redis for caching purposes, I get back a hash indexed by strings instead of symbols. That means that when I access my hash I have to worry about whether it was just generated or was loaded from Redis in order to decide whether to use strings or symbols to get the hash values. There are many more similar situations where this difference between symbols and strings will cause confusion. And I don't see much benefit in keeping them separate things either.

> If you're worried about not remember what key type to use, use a
> custom object or
> document it so that people can easily know.

This isn't possible when you're serializing/deserializing with a library like JSON or any other. You don't control how hashes are created by such libraries.
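To make that concrete, here is a minimal sketch of the mismatch (the hash contents and the symbolize_keys helper are made up for illustration):

  require 'json'

  settings = { host: 'localhost', port: 6379 }   # sexy hash syntax => symbol keys

  # Round-tripping through JSON (as happens when caching) gives back string keys.
  restored = JSON.parse(JSON.generate(settings))

  settings[:host]   # => "localhost"
  restored[:host]   # => nil
  restored['host']  # => "localhost"

  # To get indifferent access you have to walk every nested hash yourself,
  # which is what HashWithIndifferentAccess-style wrappers end up doing.
  def symbolize_keys(value)
    case value
    when Hash  then value.each_with_object({}) { |(k, v), h| h[k.to_sym] = symbolize_keys(v) }
    when Array then value.map { |v| symbolize_keys(v) }
    else value
    end
  end

  symbolize_keys(restored)[:host]  # => "localhost"

That conversion pass over every deserialized object is exactly the kind of overhead that dwarfs whatever symbols save.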
> While Ruby is all about making the lifes easier I really don't want it
> to become a language that spoon feeds programmers because they're too
> lazy to type 1 extra character *or* convert the output manually. Or
> better: use a custom object as mention above.

Again, see the ticket description first before assuming things.

> The benchmark you posted is flawed because it does much, much more than
> benchmarking the time required to create a new Symbol or String
> instance. Lets take a look at the most basic benchmark of these two data
> types:
>
> require 'benchmark'
>
> amount = 50000000
>
> Benchmark.bmbm(40) do |run|
>   run.report 'Symbols' do
>     amount.times do
>       :foobar
>     end
>   end
>
>   run.report 'Strings' do
>     amount.times do
>       'foobar'
>     end
>   end
> end
>
> On the laptop I'm currently using this results in the following output:
>
> Rehearsal ----------------------------------------------------------
> Symbols    2.310000   0.000000   2.310000 (  2.311325)
> Strings    5.710000   0.000000   5.710000 (  5.725365)
> ------------------------------------------------ total: 8.020000sec
>
>                 user     system      total        real
> Symbols     2.670000   0.000000   2.670000 (  2.680489)
> Strings     6.560000   0.010000   6.570000 (  6.584651)
>
> This shows that the use of Strings is roughly 2,5 times slower than
> Symbols. Now execution time isn't the biggest concern in this case, it's
> memory usage.

Exactly: no real-world software consists mostly of creating strings/symbols. Even in a simplistic context like my example, it is hard to notice any impact on the overall code caused by string allocation taking more time than symbols. Once we get to more complete code, we'll notice that it really doesn't make any difference whether we use symbols or strings all over our code... Also, any improvements in threading and parallelism support are likely to yield much bigger performance boosts than any micro-optimization of using symbols instead of strings.

> For this I used the following basic benchmark:
>
> def get_memory
>   return `ps -o rss= #{Process.pid}`.strip.to_f
> end
>
> def benchmark_memory
>   before = get_memory
>
>   yield
>
>   return get_memory - before
> end
>
> amount = 50000000
>
> puts "Start memory: #{get_memory} KB"
>
> symbols = benchmark_memory do
>   amount.times do
>     :foobar
>   end
> end
>
> strings = benchmark_memory do
>   amount.times do
>     'foobar'
>   end
> end
>
> puts "Symbols used #{symbols} KB"
> puts "Strings used #{strings} KB"
>
> This results in the following:
>
> Start memory: 4876.0 KB
> Symbols used 0.0 KB
> Strings used 112.0 KB
>
> Now I wouldn't be too surprised if there's some optimization going on
> because I'm re-creating the same values over and over again but it
> already shows a big difference between the two.

112KB certainly isn't a big difference in my opinion, unless you're designing some embedded application. I've worked with embedded devices in the past, and although I see some attempts to make a lighter Ruby subset (like mRuby) for that use case, I'd certainly use C or C++ for my embedded apps these days.

Did you know that Java was initially supposed to be used on embedded devices, from what I've been told? Then it tried to convince people to use it to create multi-platform desktop apps. After that, its footprint was so big that it wasn't a good idea to try it on embedded devices in most cases. Then they tried to make it work in browsers through applets. Now it seems people want to use Java mostly for web servers (HTTP and other protocols).
The result was a big mess, in my opinion. I don't think Ruby (the full specification) should be concerned with embedded devices. C is already a good fit for devices with tight memory constraints. When you consider using Ruby, you likely have more CPU and memory resources than a typical small device would have, so 112KB wouldn't make much difference. For embedded devices it is also recommended to run some RTOS instead of plain Linux; if you want to stick with Linux, an option would be to apply the Xenomai patch, for instance. But in that case any real-time task would be implemented in C, not in Ruby or any other language subject to garbage collection, like Java.

So, if we keep the focus on applications running on normal computers, 112KB won't really make any difference, don't you agree?

> To cut a long story short: I can understand what you're trying to get
> at, both with the two data types being merged and the ddos issue.
> However, I feel neither of these issues are an issue directly related to
> Ruby itself. If Ruby were to automatically convert things to Symbols for
> you then yes, but in this case frameworks such as Rails are the cause of
> the problem.

Rails is not related at all to the use case I pointed out in this ticket's description. It happens with regular Ruby classes (JSON, Hash) and with the "redis" gem, which is independent from Rails (see the sketch at the end of this message).

> Merging the two datatypes would most likely make such a
> huge different usage/code wise that it would probably be something for
> Ruby 5.0 (in other words, not in the near future).

Ruby 3.0 won't happen in the near future either. "Next Major" means Ruby 3.0, if I understand it correctly.
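Just to show that Rails really isn't involved, here is roughly what the ticket's use case looks like with only JSON and the "redis" gem (a sketch, assuming a Redis server on the default port; the key and hash contents are made up):

  require 'json'
  require 'redis'

  redis = Redis.new  # plain redis gem, no Rails anywhere

  report = { title: 'Monthly report', total: 42 }  # sexy hash syntax => symbol keys
  redis.set('report-cache', JSON.generate(report))

  # Loading the cached value back gives a hash indexed by strings.
  cached = JSON.parse(redis.get('report-cache'))

  report[:title]   # => "Monthly report"
  cached[:title]   # => nil
  cached['title']  # => "Monthly report"

Same data, but which key works depends on where the hash came from, and that is the confusion I'd like to see gone.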