From: Rodrigo Rosenfeld Rosas
Date: 2013-02-07T02:06:52+09:00
Subject: [ruby-core:51936] Re: [ruby-trunk - Feature #7792] Make symbols and strings the same thing

On 06-02-2013 13:25, Yorick Peterse wrote:
> I don't think I'm following you, can you explain what's supposedly
> ironic about it? Using Hashie only "slows" things down based on whether
> you use Symbols, Strings or object attributes. Unless you use it *all*
> over the place the performance impact is small.

What I'm trying to say is that the main reason why symbols exist in Ruby in the first place is performance, from what I've been told. But then people don't want to worry about whether hashes are indexed by strings or symbols, so they end up using some kind of HashWithIndifferentAccess or a similar technique. And since the normal Hash class doesn't behave this way, you have to loop through every hash in an object returned by JSON.parse to make it behave like HashWithIndifferentAccess, which has a huge performance cost compared to the small gains symbols could add.

> I personally don't fully agree with what Hashie does because I believe
> people should be competent enough to realize that when they take in
> external data it's going to be String instances (for keys that is).

It is not a matter of being competent or not. You can't know in advance whether a hash returned by some external code is indexed by strings or by symbols. You have to test it yourself or check the documentation. Or you could just use a HashWithIndifferentAccess class and stop worrying about it. This has a big impact on coding speed and software maintenance, which is the big problem in my opinion.

> Having said that, I think fundamentally changing the way Ruby works when
> it comes to handling Strings and Symbols because developers can't be
> bothered fixing the root cause of the problem is flawed.

People reading a Ruby book will notice that it is not particularly designed with performance in mind; it is designed mostly for programmer happiness. If that is the case, then worrying about bothered programmers makes sense for a language like Ruby, in my opinion.

> If you're worried about a ddos

DDoS is a separate beast that can't be easily prevented no matter what language/framework you use. I'm just talking about DoS exploits through memory exhaustion due to symbols not being garbage collected. Anyway, that is a separate issue from this one and would be better discussed in its own thread.

> stop converting everything to Symbols.

I'm not converting anything to symbols. Did you read the use case in the feature description? I'm just creating regular hashes using the new sexy hash syntax, which happens to create symbols instead of strings. Then, when I serialize my object to JSON for storing in Redis for caching purposes, I get back a hash indexed by strings instead of symbols. That means that when I access my hash I have to worry about whether it was just generated or was loaded from Redis in order to decide whether to use strings or symbols to get the hash values. There are many more similar situations where this difference between symbols and strings will cause confusion. And I don't see much benefit in keeping them separate things either.

> If you're worried about not remember what key type to use, use a
> custom object or
> document it so that people can easily know.

This isn't possible when you're serializing/deserializing with a library like JSON or any other. You don't control how hashes are created by such libraries.
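To make that concrete, here is a minimal sketch of the mismatch (the hash contents and the symbolize_keys helper are made up for illustration):

  require 'json'

  settings = { host: 'localhost', port: 6379 }   # sexy hash syntax => symbol keys

  # Round-tripping through JSON (as happens when caching) gives back string keys.
  restored = JSON.parse(JSON.generate(settings))

  settings[:host]   # => "localhost"
  restored[:host]   # => nil
  restored['host']  # => "localhost"

  # To get indifferent access you have to walk every nested hash yourself,
  # which is what HashWithIndifferentAccess-style wrappers end up doing.
  def symbolize_keys(value)
    case value
    when Hash  then value.each_with_object({}) { |(k, v), h| h[k.to_sym] = symbolize_keys(v) }
    when Array then value.map { |v| symbolize_keys(v) }
    else value
    end
  end

  symbolize_keys(restored)[:host]  # => "localhost"

That conversion pass over every deserialized object is exactly the kind of overhead that dwarfs whatever symbols save.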
> While Ruby is all about making the lifes easier I really don't want it
> to become a language that spoon feeds programmers because they're too
> lazy to type 1 extra character *or* convert the output manually. Or
> better: use a custom object as mention above.

Again, see the ticket description first before assuming things.

> The benchmark you posted is flawed because it does much, much more than
> benchmarking the time required to create a new Symbol or String
> instance. Lets take a look at the most basic benchmark of these two data
> types:
>
> require 'benchmark'
>
> amount = 50000000
>
> Benchmark.bmbm(40) do |run|
>   run.report 'Symbols' do
>     amount.times do
>       :foobar
>     end
>   end
>
>   run.report 'Strings' do
>     amount.times do
>       'foobar'
>     end
>   end
> end
>
> On the laptop I'm currently using this results in the following output:
>
> Rehearsal ----------------------------------------------------------
> Symbols    2.310000   0.000000   2.310000 (  2.311325)
> Strings    5.710000   0.000000   5.710000 (  5.725365)
> ------------------------------------------------ total: 8.020000sec
>
>                 user     system      total        real
> Symbols     2.670000   0.000000   2.670000 (  2.680489)
> Strings     6.560000   0.010000   6.570000 (  6.584651)
>
> This shows that the use of Strings is roughly 2,5 times slower than
> Symbols. Now execution time isn't the biggest concern in this case, it's
> memory usage.

Exactly: no real-world software consists mostly of creating strings/symbols. Even in a simplistic context like my example, it is hard to notice any impact on the overall code caused by string allocation taking more time than symbols. Once we get to more complete code, we'll notice that it really doesn't make any difference whether we use symbols or strings all over our code... Also, any improvements in threading and parallelism support are likely to yield much bigger performance boosts than any micro-optimization of using symbols instead of strings.

> For this I used the following basic benchmark:
>
> def get_memory
>   return `ps -o rss= #{Process.pid}`.strip.to_f
> end
>
> def benchmark_memory
>   before = get_memory
>
>   yield
>
>   return get_memory - before
> end
>
> amount = 50000000
>
> puts "Start memory: #{get_memory} KB"
>
> symbols = benchmark_memory do
>   amount.times do
>     :foobar
>   end
> end
>
> strings = benchmark_memory do
>   amount.times do
>     'foobar'
>   end
> end
>
> puts "Symbols used #{symbols} KB"
> puts "Strings used #{strings} KB"
>
> This results in the following:
>
> Start memory: 4876.0 KB
> Symbols used 0.0 KB
> Strings used 112.0 KB
>
> Now I wouldn't be too surprised if there's some optimization going on
> because I'm re-creating the same values over and over again but it
> already shows a big difference between the two.

112KB certainly isn't a big difference in my opinion, unless you're designing some embedded application. I've worked with embedded devices in the past, and although I see some attempts to make a lighter Ruby subset (like mRuby) for that use case, I'd certainly use C or C++ for my embedded apps these days.

Did you know that Java was initially supposed to be used on embedded devices, from what I've been told? Then it tried to convince people to use it to create multi-platform desktop apps. After that, its footprint was so big that it wasn't a good idea to try it on embedded devices in most cases. Then they tried to make it work in browsers through applets. Now it seems people want to use Java mostly for web servers (HTTP and other protocols).
The result was a big mess, in my opinion. I don't think Ruby (the full specification) should be concerned with embedded devices. C is already a good fit for devices with tight memory constraints. When you consider using Ruby, you likely have more CPU and memory resources than a typical small device would have, so 112KB wouldn't make much difference. For embedded devices it is also recommended to run some RTOS instead of plain Linux; if you want to stick with Linux, an option would be to apply the Xenomai patch, for instance. But in that case any real-time task would be implemented in C, not in Ruby or any other language subject to garbage collection, like Java.

So, if we keep the focus on applications running on normal computers, 112KB won't really make any difference, don't you agree?

> To cut a long story short: I can understand what you're trying to get
> at, both with the two data types being merged and the ddos issue.
> However, I feel neither of these issues are an issue directly related to
> Ruby itself. If Ruby were to automatically convert things to Symbols for
> you then yes, but in this case frameworks such as Rails are the cause of
> the problem.

Rails is not related at all to the use case I pointed out in this ticket's description. It happens with regular Ruby classes (JSON, Hash) and with the "redis" gem, which is independent from Rails (see the sketch at the end of this message).

> Merging the two datatypes would most likely make such a
> huge different usage/code wise that it would probably be something for
> Ruby 5.0 (in other words, not in the near future).

Ruby 3.0 won't happen in the near future either. "Next Major" means Ruby 3.0, if I understand it correctly.
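Just to show that Rails really isn't involved, here is roughly what the ticket's use case looks like with only JSON and the "redis" gem (a sketch, assuming a Redis server on the default port; the key and hash contents are made up):

  require 'json'
  require 'redis'

  redis = Redis.new  # plain redis gem, no Rails anywhere

  report = { title: 'Monthly report', total: 42 }  # sexy hash syntax => symbol keys
  redis.set('report-cache', JSON.generate(report))

  # Loading the cached value back gives a hash indexed by strings.
  cached = JSON.parse(redis.get('report-cache'))

  report[:title]   # => "Monthly report"
  cached[:title]   # => nil
  cached['title']  # => "Monthly report"

Same data, but which key works depends on where the hash came from, and that is the confusion I'd like to see gone.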