From: "trans (Thomas Sawyer)" <transfire@...>
Date: 2013-02-06T05:05:56+09:00
Subject: [ruby-core:51882] [ruby-trunk - Feature #7788][Open] YAML Tag Schema Support


Issue #7788 has been reported by trans (Thomas Sawyer).

----------------------------------------
Feature #7788: YAML Tag Schema Support
https://bugs.ruby-lang.org/issues/7788

Author: trans (Thomas Sawyer)
Status: Open
Priority: Normal
Assignee: 
Category: lib
Target version: next minor


=begin
I have endeavoured to add proper Schema support to Psych (see ((<YAML Spec|URL:http://www.yaml.org/spec/1.2/spec.html#Schema>)) on Schemas). The primary reasons for supporting schemas are twofold: security and global tag conflicts. The first is well known b/c of recent events. The second is less widely recognized, but it is the same problem as using global variables: different apps define different tags, so two identical local tags may have different meanings and thus conflict.

The API works like this:

    class Foo
    end

    foo_schema = YAML::Schema.new do |s|
      s.tag '!foo', Foo
    end

    YAML.load(File.read('foo.yml'), :schema=>foo_schema)

This code would allow only the failsafe and JSON schema tags (the core defaults), plus the specifically defined !foo tag.
The %TAG prefix is also supported:

    foo_schema = YAML::Schema.new(:prefix=>{'!'=>'tag:foo.org/'}) do |s|
      s.tag '!foo', Foo
    end

This will add the tag `tag:foo.org/foo` instead of the local `!foo` tag.
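
To make the prefix behaviour concrete, here is a minimal sketch of a tag registry that expands a handle into its prefix at registration time. `SketchSchema`, `resolve` and `expand` are hypothetical stand-ins written for illustration; they are not the code in the branch and not part of Psych.

```ruby
# Hypothetical stand-in for the proposed YAML::Schema; not Psych's real API.
class SketchSchema
  attr_reader :tags

  def initialize(options = {})
    @prefix = options[:prefix] || {}
    @tags = {}
    yield self if block_given?
  end

  # Register a tag, expanding a matching handle such as '!' into its prefix.
  def tag(name, klass)
    @tags[expand(name)] = klass
  end

  # Return the class registered for a tag, or nil if the schema lacks it.
  def resolve(name)
    @tags[expand(name)]
  end

  private

  def expand(name)
    # Try the longest handle first so that '!!' would win over '!'.
    handle = @prefix.keys.sort_by { |h| -h.length }.find { |h| name.start_with?(h) }
    handle ? @prefix[handle] + name[handle.length..-1] : name
  end
end

class Foo; end

local  = SketchSchema.new { |s| s.tag '!foo', Foo }
global = SketchSchema.new(:prefix => { '!' => 'tag:foo.org/' }) { |s| s.tag '!foo', Foo }

p local.tags.keys   # prints ["!foo"]
p global.tags.keys  # prints ["tag:foo.org/foo"]
```

A tag that was never registered resolves to nil here, which is the point where a schema-aware loader could refuse to instantiate anything.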

To properly support schemas, objects must store the tag with which they were loaded in order to ensure correct round-tripping. For this there is a `tag_uri` attribute.
(Note: I am not sure whether it is best to store it as an instance variable, as it currently is, or in a global table. I need feedback.)
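
The two storage options can be compared with a small sketch. Only the `tag_uri` name comes from the description above; `TAG_TABLE` and `tag_for` are hypothetical, and the identity-keyed Hash is just one possible shape for a global table (it holds strong references, so a real implementation might prefer something weaker).

```ruby
class Foo
  attr_accessor :tag_uri
end

# Option A: the tag lives on the object as an instance variable.
a = Foo.new
a.tag_uri = 'tag:foo.org/foo'

# Option B: the tag lives in a side table keyed by object identity
# (hypothetical; works even when the object itself is frozen).
TAG_TABLE = {}.compare_by_identity
b = Foo.new.freeze
TAG_TABLE[b] = 'tag:foo.org/foo'

# A frozen object cannot take the instance variable, which is one
# argument for the table. FrozenError is a RuntimeError subclass.
frozen_error = begin
  b.tag_uri = 'tag:foo.org/foo'
  nil
rescue RuntimeError => e
  e.class
end

# A dumper could check the object first and fall back to the table.
def tag_for(obj)
  (obj.respond_to?(:tag_uri) && obj.tag_uri) || TAG_TABLE[obj]
end

p tag_for(a)  # prints "tag:foo.org/foo"
p tag_for(b)  # prints "tag:foo.org/foo"
```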

In the process of adding schema support I was able to clean up and generalize the loading code. For immutable types and class factories, a class can define (({ClassName.new_with(coder)})), which is then used to instantiate it.
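
As a rough illustration of why a class-level hook helps immutable types, the sketch below lets a loader prefer `new_with` when a class defines it. `SketchCoder`, `instantiate` and the fallback path are assumptions made for this example; the real Coder and loader in the branch may look quite different.

```ruby
# Stand-in for the data a loader hands over; Psych's real Coder is richer.
SketchCoder = Struct.new(:tag, :map)

# An immutable value type: it cannot be allocated first and filled in later.
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
    freeze
  end

  # Build the instance from the coder in a single step.
  def self.new_with(coder)
    new(coder.map['x'], coder.map['y'])
  end
end

# An ordinary mutable class without the hook.
class Plain
  attr_reader :name
end

# Hypothetical loader step: use new_with if present, otherwise allocate
# the object and set instance variables from the mapping.
def instantiate(klass, coder)
  if klass.respond_to?(:new_with)
    klass.new_with(coder)
  else
    obj = klass.allocate
    coder.map.each { |key, value| obj.instance_variable_set("@#{key}", value) }
    obj
  end
end

point = instantiate(Point, SketchCoder.new('!point', { 'x' => 1, 'y' => 2 }))
plain = instantiate(Plain, SketchCoder.new('!plain', { 'name' => 'n' }))

p [point.x, point.y, point.frozen?]  # prints [1, 2, true]
p plain.name                         # prints "n"
```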

The implementation is close to complete. I believe this is all that remains:

  1. ScalarScanner needs to respect schema (basically if failsafe and/or json schemas are not used).
  2. Dumping needs to take :schema option to limit it to schema tags.
  3. Dumping needs to look to tag_uri for tag by default.
  4. There is one bug I have yet to figure out (test_spec_builtin_map).
  5. I have questions about Coder, b/c it seems more complex than it needs to be.

I am also considering refactoring schemas as modules that can be included in other schemas. Currently they are classes/objects that can be subclassed or merged via `+`, e.g.

    LEGACY_SCHEMA = CORE_SCHEMA + RUBY_SCHEMA + OBJECT_SCHEMA + SYCK_SCHEMA
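
Merging via `+` could be sketched as follows. `MergeableSchema` is a hypothetical class, and the rule that the right-hand operand wins when both sides define the same tag is an assumption; the post does not say how conflicts are handled.

```ruby
# Hypothetical schema object that only holds a tag-to-class table.
class MergeableSchema
  attr_reader :tags

  def initialize(tags = {})
    @tags = tags.dup
  end

  # Return a new schema with both tag tables; neither operand is modified.
  # Assumption: on a duplicate tag, the right-hand schema wins.
  def +(other)
    MergeableSchema.new(tags.merge(other.tags))
  end
end

core   = MergeableSchema.new('tag:yaml.org,2002:str' => String,
                             'tag:yaml.org,2002:int' => Integer)
ruby   = MergeableSchema.new('!ruby/symbol' => Symbol)
legacy = core + ruby

p legacy.tags.keys.sort
# prints ["!ruby/symbol", "tag:yaml.org,2002:int", "tag:yaml.org,2002:str"]
p core.tags.size  # prints 2
```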

Of course, as with any new code, there are sure to be corner cases to work out. Having others pound on it for a while would be very helpful. Oh, and I should also mention that I am documenting as much of the code as I can.

Feel free to ask me any questions for more details about the code. You can find the branch here: https://github.com/trans/psych/tree/isotag
=end


-- 
http://bugs.ruby-lang.org/