Re: YAML problem, possibly?
From: why the lucky stiff <ruby-core@...>
Date: 2004-08-10 17:43:26 UTC
List: ruby-core #3274
Hugh Sasse Staff Elec Eng wrote:
> I obtained a largish lump of shallow XML, successfully read it with
> REXML, and then tried to YAML dump it.
...
> E:\Downloads\AnnaAIML-.7.0\anna_brain>xml_to_yaml.rb
> <aiml> ... </>
> c:/ruby/lib/ruby/1.8/yaml/rubytypes.rb:287:in `is_complex_yaml?':
> stack level too deep (SystemStackError)
With a complex tree like a REXML::Document, dumping to YAML can be
pretty complicated. There's definitely a problem here that needs to be
solved in the emitter. The emitter is due for a rewrite and I've just
finished a batch of fixes to the parser, so I think I have time to work
on this.
In the short term, however, I really don't think you want YAML documents
which contain REXML::Document dumps. If you really want to serialize
the REXML::Document, I'd strongly suggest that you use Marshal. You
won't end up with a readable structure in YAML, so there's no point
using it.
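A minimal sketch of the Marshal round trip, for illustration only. The nested Hash with parent back-references is a stand-in for a document tree (an assumption made to keep the example self-contained); it is not a REXML::Document, and the names root, child, data and copy are made up here.

```ruby
# A small tree with parent back-references, standing in for a parsed document.
root  = { 'name' => 'aiml', 'parent' => nil, 'children' => [] }
child = { 'name' => 'category', 'parent' => root, 'children' => [] }
root['children'] << child

data = Marshal.dump( root )   # binary String, not meant to be read by people
copy = Marshal.load( data )   # an independent copy of the whole tree

puts copy['children'][0]['name']
# The back-reference points at the copied root, not at the original one.
puts copy['children'][0]['parent'].equal?( copy )
```

The dump is an opaque byte string, which is fine when the goal is only to save the tree and load it back later.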
--- however, ---
If you want to filter the AIML documents into a readable YAML format, I
have a script that can help you. I've used this script to convert XML
formats into YAML and just haven't found much of an inclination to turn
it into a serious project. But it works and I was able to convert all
of the valid XML documents in the anna_brain directory into readable YAML.
The attached script uses a basic schema diagram, written in YAML, to
perform the conversion. These AIML files are kinda sketchy, some aren't
valid, some don't follow a very good schema. But here's the barebones:
--- %YAML:1.0
aiml:
  xmlns: skip
  version: skip
  category*:
    pattern:
    template: html
    that: ~
    think: ~
  topic*:
Place the attached script in the directory that contains anna_brain
(that is, one level above it) and run: ruby aiml2yaml.rb. It will
create 1.yaml, 2.yaml, 3.yaml, etc. side-by-side with the AIML files.
The schema above is contained in the aiml2yaml.rb file. The schema uses
a bit of notation to help the xml2yaml method translate the file. Each
mapping key represents an element tag. Each key which ends with '*'
allows multiple elements with that tag, which are collected into an
array. When a mapping key has a value of '~', then that element is
optional. I use the 'html' value to indicate that the content of a node
should be preserved as it was encoded in the XML. The 'skip' value
ignores those nodes completely.
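The notation above can be sketched as a small lookup. The describe_entry helper is hypothetical and not part of aiml2yaml.rb; it only labels each schema entry the way the description reads. The %YAML:1.0 header is left off here on the assumption that a current YAML parser may not accept it, and the nesting of the schema is assumed to be category* and topic* directly under aiml.

```ruby
require "yaml"

# Hypothetical helper: report how one schema entry would be treated.
def describe_entry( key, value )
  {
    'name'     => key.sub( /\*\z/, '' ),
    'repeated' => key.end_with?( '*' ),     # 'category*' collects into an array
    'handling' => case value
                  when Hash   then 'nested' # child elements have their own schema
                  when 'skip' then 'skip'   # node is ignored
                  when 'html' then 'html'   # inner markup kept as encoded
                  else             'text'   # '~' and an empty value both load as nil
                  end
  }
end

schema = YAML.load( <<EOY )
aiml:
  xmlns: skip
  version: skip
  category*:
    pattern:
    template: html
    that: ~
    think: ~
  topic*:
EOY

schema['aiml'].each do |key, value|
  p describe_entry( key, value )
end
```

Note that '~' and an empty value both come back from the YAML loader as nil, so in this sketch the two are treated the same way.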
Since D.aiml, test.aiml, and topics.aiml aren't parseable, I can't test
those files, but the rest convert fine. Alright, well, that's it. Enjoy.
_why
Attachments (1)
aiml2yaml.rb
(2.87 KB, text/x-ruby)
require "date"
require "rexml/document"; include REXML
require "yaml"
anna_schema = YAML::load <<EOY
--- %YAML:1.0
aiml:
  xmlns: skip
  version: skip
  category*:
    pattern:
    template: html
    that: ~
    think: ~
  topic*:
EOY
class SchemaError < Exception; end
class AnnaDictionary
  attr_accessor :categories, :topics
  def to_yaml_type
    "!annabot.sf.net,2004/dictionary"
  end
end
YAML.add_domain_type( 'annabot.sf.net,2004', 'dictionary' ) do |type, val|
  # xml2yaml stores the topic* elements under the 'topic' key
  YAML.object_maker( AnnaDictionary, {'categories' => val['category'], 'topics' => val['topic']} )
end
def xmltext2yamltext( content, instruct )
  if instruct == 'timestamp'
    content = YAML::load( "--- #{content}" )
  elsif instruct == 'int'
    content = content.to_i
  else
    if content =~ /^(\t+)/
      tabs = $1
      content.gsub!( /^#{ tabs }/, '' )
    end
    content.strip!
    content.gsub!( /\t/, ' ' )
  end
  content
end
def xml2yaml( doc, schema, path = '/' )
  yaml = {}
  ele_log = {}
  doc.elements.each do |ele|
    ele_name = ele.name
    unless schema.has_key? ele_name
      ele_name += "*"
      unless schema.has_key? ele_name
        raise SchemaError, "No schema at #{path} for element `#{ele.name}'"
      end
    end
    ele_schema = schema[ele_name]
    # Elements with multiple entries
    ele_idx = 0
    check_mult = ele_name.match( /(\w+)\*$/ )
    if check_mult
      ele_name = check_mult[1]
      if ele_log.has_key? ele_name
        ele_log[ele_name] += 1
      else
        ele_log[ele_name] = 0
      end
      ele_idx = ele_log[ele_name]
    end
    ele_hash = {}
    content = nil
    if [Hash, Array].include? ele_schema.class
      content = xml2yaml( ele, ele_schema, "#{path}#{ele.name}/" )
    else
      content = ele.children.collect do |child|
        child.to_s
      end.join( '' )
      content = xmltext2yamltext( content, ele_schema )
    end
    unless ele_schema == 'skip'
      ele_hash = content
    end
    if [Hash, Array].include? ele_schema.class
      ele.attributes.each do |name, val|
        unless ele_schema.has_key? name
          raise SchemaError, "No schema at #{path} for attribute `#{ele.name}.#{name}'"
        end
        unless [Hash, Array].include? ele_hash.class
          raise SchemaError, "String at #{path}: #{ele_hash.inspect}, cannot add attribute `#{ele.name}.#{name}'"
        end
        next if ele_schema[name] == 'skip'
        ele_hash[name] = xmltext2yamltext( val, ele_schema[name] )
      end
    end
    if check_mult
      yaml[ele_name] ||= []
      yaml[ele_name][ele_idx] = ele_hash
    else
      yaml[ele_name] = ele_hash
    end
  end
  yaml
end
if __FILE__ == $0
  Dir['anna_brain/*.aiml'].each do |file|
    puts "*** Processing #{ file }"
    doc = Document.new File.new( file )
    ymap = xml2yaml( doc.root, anna_schema['aiml'] )
    yanna = YAML::object_maker( AnnaDictionary, ymap )
    File.open( file.gsub( /aiml/, 'yaml' ), 'w' ) do |yout|
      yout.puts yanna.to_yaml( :UseBlock => true )
    end
  end
end
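
The plain-text cleanup done by xmltext2yamltext can be tried on its own. This is a rough sketch: clean_text is a hypothetical name, and unlike the script it returns a new string rather than editing its argument in place. The sample string is made up.

```ruby
# Hypothetical standalone version of the plain-text branch of xmltext2yamltext.
def clean_text( content )
  text = content
  if text =~ /^(\t+)/
    tabs = $1
    text = text.gsub( /^#{ tabs }/, '' )   # drop the leading tabs found on the first tabbed line
  end
  text.strip.gsub( /\t/, ' ' )             # trim the ends, then turn leftover tabs into spaces
end

sample = "\n\t\tHello there.\n\t\t\tHow are you?\n\t"
puts clean_text( sample )
```

Lines indented deeper than the first tabbed line keep their extra indentation, with each remaining tab turned into a space.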