From: Cezary
Date: 2011-05-31T23:06:33+09:00
Subject: [ruby-core:36631] Re: [Ruby 1.9 - Feature #4801][Open] Shorthand Hash Syntax for Strings

On Tue, May 31, 2011 at 05:55:39AM +0900, Piotr Szotkowski wrote:
> Cezary:

First of all, thanks Piotr for taking the time to discuss this.

My original ideas for solving the problem, or my descriptions of them, sucked, but I left your comments in because they still apply or provide good examples.

I'm trying to get an idea of how the implementation decisions behind hashes affect the general use of hashes in Ruby, and whether something could be slightly changed to improve the user's experience with the language without too much sacrifice in other areas.

I believe Hash was designed with efficiency and speed in mind, and the recent Hash syntax changes suggest that the ways people use Hash in Ruby today are well beyond the scope of the original concept.

Refinements may minimize the need for changes here, but even so, I think this is a good time to consider what Hash is used for and how syntax changes can help users better express their ideas, instead of being able to choose only between an array, a very, very general associative array, or 3rd party gems that have no syntax support.

I hope I am not going overboard with this topic. I have serious doubts that the slight changes in Hash behavior presented won't cause problems, but I cannot think of any serious downsides, especially if only a warning is emitted. And with such a usability upside, I must be missing a big flaw in the idea or a big gain from the current behavior.

If this topic does not contribute to Ruby from the user's perspective, I am ready to drop the subject entirely.

> > I thought exactly the same thing, until I realized
> > that having keys of different types in a Hash
> > isn't really part of the general Hash concept.
>
> Why? [citation needed]

My wording isn't correct.
First, a Hash in Ruby is an associative array, which I read about here:

  http://en.wikipedia.org/wiki/Associative_array

And from this:

  "From the perspective of a computer programmer, an associative array can be viewed as a generalization of an array. While a regular array maps an integer key (index) to a value of arbitrary data type, an associative array's keys can also be arbitrarily typed. In some programming languages, such as Python, the keys of an associative array do not even need to be of the same type."

The type of the key can be anything. Keys can even be of different types within a single instance.

The latter is not a requirement of every possible associative array implementation, and this is what I meant. It can be implementation specific: for example, an rbtree requires ordering of keys. In that specific case, you cannot have a symbol and a string in the same associative array, because you cannot compare them.

But since Hash uses a hash table, it is possible to have a wider range of key types, including both symbol and string together. The implementation allows it, but my question is: is it *that* useful in the real world? Or does it cause more harm than good?

> > { nil => 0, :foo => 1, 'foo' => 2 }
>
> > Conceptually, people expect Hash keys to be of the same type,
> > except maybe for "hacks" like that nil above that can simplify code.
>
> Well, they either do or don't, then. :)

Right. What I wrote isn't correct. I think people expect hash keys to match a given domain in order to consider them valid. Just like every variable should have a value within bounds or raise at the first possible opportunity, unless the cause of a problem is otherwise trivial to find and fix.

I don't recommend the example with nil above. Better alternatives, IMHO:

  { :'' => 0, :foo => 1 }[ some_key || :'' ]

or

  { :foo => 1 }[some_key] || 0

or set the default in the Hash:

  Hash.new(0).merge( :foo => 1 )[some_key]

That is why I called it a hack - using a Hash key to get default values.
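
The three alternatives above are not quite interchangeable, and a small sketch may make the difference visible. The method names below are made up for illustration; only the expressions inside them come from the text above.

```ruby
# Hypothetical helper names; the bodies are the three expressions above.

# 1. Empty-symbol sentinel: only a nil key is mapped to the default.
def via_sentinel(some_key)
  { :'' => 0, :foo => 1 }[some_key || :'']
end

# 2. Fallback after the lookup: nil and unknown keys both give 0.
def via_or(some_key)
  { :foo => 1 }[some_key] || 0
end

# 3. Default stored in the Hash itself (Hash#merge keeps the default).
def via_default(some_key)
  Hash.new(0).merge(:foo => 1)[some_key]
end

[nil, :foo, :bar].each do |key|
  puts [key.inspect, via_sentinel(key).inspect,
        via_or(key).inspect, via_default(key).inspect].join("  ")
end
```

For a nil key all three return 0, and for :foo all three return 1. They differ for an unknown key such as :bar: the sentinel version returns nil, while the other two return 0.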
> Hm, IMHO 'any object can be a key, just as any object can be
> a value' is the general case, and 'I want my Strings and Symbols
> to be treated the same when they're similar, oh, and maybe with
> the nil handled separately for convenience' is the specialised case.

Exactly. The specialized case is obviously bad. But the general case turned out not to be too great either.

I am thinking about a third solution: generic, but within a specified domain - ideally one where the differences between string and symbol stop them from unintentionally ending up in the same Hash, without being too specialized. And without subclassing.

Even with just a warning emitted when a Hash becomes unsortable, we are not breaking the associative array concept, while *still* supporting 99% or more of actual real world use cases. And without making any of the type-specific assumptions you presented.

As a side effect, if a user writes {'foo': 123}.merge('foo' => 456), they will get a warning instead of just a hash with two pairs.

Such a warning will most likely help find design flaws and make difficult-to-debug errors less frequent when refactoring. And hopefully it will encourage a better design, or at least a little more thought about the current one.

> > In Ruby "foo" + 123 raises a TypeError. Adding a string
> > key to a symbol-keyed Hash doesn't even show a warning.
>
> I don't see why it should - as long as it still
> responds to #hash and #eql?, it's a valid Hash key.

Both methods are specific to the internals of Ruby's associative array, which uses a hash table. Users generally care only about their string->symbol problems, until they realize that using strings for keys is generally not a good thing because of the problems and debugging time involved.

Implementation-wise I think Hash is great. However, the flexibility, along with symbol/string similarities and more ingenious uses of Hash, will probably only cause more problems over time.

Example: Python doesn't have symbols and has named arguments.
In Ruby we use a symbol-keyed Hash to simulate the latter, which is great, but if the hash is not symbol-keyed, there is no quick, standard way to handle that. Sure, you can ignore or raise or convert, but why handle something you should be able to prevent? Ignoring keys you don't know seems like a good idea, but the result is not very helpful when debugging obscure error messages.

And let's face it: most of the Ruby code people work on is not their own. The only people who don't need to care are the experts who already have the right habits and understanding that allow them to avoid problems without too much thought. The rest have to learn the hard way.

> Hashes in Ruby serve a lot of purposes (they even maintain insertion
> order); if you want to limit their functionality, feel free to subclass.

Why do I have to subclass Hash to get a useful named arguments equivalent in Ruby? Why would I want object instances for argument names? Why can't I choose *not* to have them in a simple way?

The overhead and effort required to maintain and use a subclass becomes a good enough reason to give up on writing robust code. Which is probably what most Rubyists do.

We have RBTree and HashWithIndifferentAccess. Neither really helps in creating good APIs, for several reasons:

  - HWIA is for Rails-specific cases but is usually abused to avoid costly string/symbol mistakes

  - RBTree is a gem most people don't know about, so they stick with Hash anyway. It adds an ordering requirement, but that seems like a side effect. It was proposed for inclusion in Ruby 1.9, but I don't remember why it ultimately wasn't added

  - the {} notation is too convenient to lose in the case of subclassing, especially when Hash is used for method parameters

  - in practice, you can only use the subclass in your own code

> There's nothing preventing you from subclassing Hash to
> create StringKeyHash, SymbolKeyHash or even MonoKeyHash
> that would limit the keys' class to the first one defined.
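
For reference, a minimal sketch of what such a MonoKeyHash subclass might look like. The class body is an assumption made for illustration, not code from this thread. It guards only #[]= and #store, so a literal, #merge or #update would still bypass the check, and it rejects subclasses of the first key's class, which is an arbitrary choice.

```ruby
# Hypothetical sketch: a Hash subclass that locks its key class to the
# class of the first key stored through #[]= or #store.
class MonoKeyHash < Hash
  def []=(key, value)
    # Remember the class of the first key ever stored this way.
    @key_class ||= key.class
    unless key.instance_of?(@key_class)
      raise TypeError,
            "expected #{@key_class} key, got #{key.class} (#{key.inspect})"
    end
    super
  end

  # Hash#store is a separate entry point, so route it through the check too.
  alias_method :store, :[]=
end

h = MonoKeyHash.new
h[:foo] = 1
begin
  h['foo'] = 2          # a String key after a Symbol key
rescue TypeError => e
  puts "rejected: #{e.message}"
end
```

Every hash in the program that should behave this way has to be built as MonoKeyHash.new instead of {}.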
I thought about exactly that in order to avoid subclassing: having an alternative to the current Hash available as a standard Ruby collection. But now I think the idea is too limiting to be practical.

From the user's perspective, having Hash restrict its behavior the way RBTree does would save people a lot of grief. If Hash changed its behavior in the way described, most of the existing code would work as usual.

Manually replacing {} with a subclass in a large project is a waste of time. Hashes are used too often to even consider subclassing.

Consider regular expressions: you can specify options to a regexp, defining its behavior. Having the same for hashes could be cool:

  {'a' => 3, :a => 3}/so   # s = strict, o = ordered

As examples, we could also have:

  r = uses RBTree for the Hash (and so implies 's')

  i = indifferent access, but not recommended
      (actually, I personally wouldn't want this as an option)

> How would you treat subclasses? Let's say I have a Hash with
> keys being instances of People, Employees and Volunteers (with
> Employees and Volunteers being subclasses of People). Should
> they all be allowed as keys in a single MonoKeyHash or not?

Good example of using a Hash to associate values with (even random) objects!

Since having orderable keys already answers the part about what is allowed into the Hash, I'll concentrate on the case where items are of different types.

How about an array of objects and a hash of object ids instead?

  [ person1, person2, ...]
  { person1.object_id => some_value, ... }

Or just use the results of #hash as the keys if it is about object contents. This makes your intention more explicit.

  { person1.hash => some_value, ...
  }

If you really need different types as a way of associating values with random objects, you could create a Hash of types, where each type has its own hash of object instances:

  {
    Fixnum => { 1 => "one", 2 => "two" },
    String => { "1" => "one", "2" => "two" },
  }

Then you can use

  hash[some_key.class][some_key]

for access if you *really* need the current behavior. Not much harder to handle, but you have much more control over the hash contents. You probably need to know about the types used in the structure anyway in order to handle its contents (domain).

> What about String-only keys, but with different
> keys having their own different singleton methods?
>
> (For discussion's sake: what about if a couple of the Strings
> had redefined #hash and #eql? methods, on an instance level?)

That's relying heavily on implementation-specific details - like counting on Ruby hashes preserving order or not. That changed actually, yes. I don't really remember what the main reason was, though.

#hash and #eql? are called by Hash internally - if there is a good reason for redefining these, there is probably a good way to do it without relying on Hash internals.

If for some fictional reason Ruby used an rbtree internally for Hash, #<=> would be used instead of #hash + #eql?. Everything else would be the same except for the allowed key values.

> > I think the meaning of symbols and hashes are too similar for such
> > different types to be allowed as keys in the same Hash instance.
>
> But that would introduce a huge exception in the current
> very simple model. Ruby is complicated enough; IMHO we
> should strive to make it less complicated, not more.

Novice users find symbols, strings and Hashes complicated and confusing. Changing this is my focus here. A complex model that is easily discoverable is probably better than a simple model that requires complex solutions from the users to do a great job.
I know it takes hard work and countless hours to keep Ruby the fun and great language it is, and I think it pays off nevertheless.

If the goal were to create a simple language with a simple implementation, we might have had another Java instead.

Even if it results in an overly complex parser and implementation, I think only good will come from going out of one's way to make Ruby users' lives easier.

Which is why I really appreciate your input, and your giving me the motivation to understand the topic and Ruby internals better.

Thanks!

--
Cezary Baginski