From: Greg.mpls@...
Date: 2017-11-22T20:36:30+00:00
Subject: [ruby-core:83864] [Ruby trunk Bug#14126] Recent parse.y (Ripper) changes - lexing, tokenizing

Issue #14126 has been reported by MSP-Greg (Greg L).

----------------------------------------
Bug #14126: Recent parse.y (Ripper) changes - lexing, tokenizing
https://bugs.ruby-lang.org/issues/14126

* Author: MSP-Greg (Greg L)
* Status: Open
* Priority: Normal
* Assignee:
* Target version:
* ruby -v: ruby 2.5.0dev (2017-11-22 trunk 60878) [x64-mingw32]
* Backport: 2.3: UNKNOWN, 2.4: UNKNOWN
----------------------------------------
First of all, I'd like to thank @yui-knk for all the work on `parse.y`. I assume some of it is due to the move of `RDoc` from 'seattlerb' to 'ruby', along with `RDoc` now using Ripper instead of its own parser.

I'm a `YARD` user. Recent commits have broken some of `YARD`'s parsing code, although many of the commits actually fixed odd behavior in `Ripper`.

I did find one thing that seems odd. It centers on whether `Ripper.tokenize(src).join('') == src` or `Ripper.tokenize(src).join('').length == src.length` should be true.

I believe the actual issue for YARD is the following constraint, where joining the text of every lexed token should reproduce the source:

```
src == Ripper.lex(src).map { |t| t[2] }.join
```

With the code listed below, svn 60863 shows true for every source string, but 60878 shows false. The extra white-space content that appeared in the `:on_tstring_content` members with 60863 has been (understandably) removed in 60878, but it has not been accounted for in the `:on_words_sep` (or `:on_qwords_beg`) members.
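To narrow down where the text goes missing, the same comparison can report the first offset at which the rejoined tokens differ from the source. This is only a rough sketch: `first_divergence` is a hypothetical helper name of my own, not part of Ripper, and it relies only on the token text being the third element of each `Ripper.lex` entry.

```ruby
require 'ripper'

# Hypothetical helper (not part of Ripper): rejoins the token text
# returned by Ripper.lex and returns the first character offset where
# the result differs from the source, or nil when the two are equal.
def first_divergence(src)
  combined = Ripper.lex(src).map { |t| t[2] }.join
  return nil if combined == src

  limit = [src.length, combined.length].min
  # If one string is a prefix of the other, they diverge at the shorter length.
  (0...limit).find { |i| src[i] != combined[i] } || limit
end

src = "%w(\n AA\n BB\n)"
idx = first_divergence(src)
if idx.nil?
  puts 'round-trips'
else
  puts "diverges at offset #{idx}: #{src[idx, 10].inspect}"
end
```

Whether the sample string diverges depends on the revision being run, so I have not listed an expected output.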
```ruby
# frozen_string_literal: true

require 'ripper'
require 'pp'

module RipperPercent
  def self.run
    output "%w(\n AA\n BB\n CC\n DD\n)"
    output "%w(\n\nAA\n\nBB\n\nCC\n\nDD\n)"
    output "%w(\n AA BB CC DD\n)"
  end

  def self.output(s)
    combined = ''.dup
    Ripper.lex(s).each { |t| combined << t[2] }
    puts
    puts "src #{s.gsub("\n", "\\n")}"
    puts "lexed #{combined.gsub("\n", "\\n")}"
    puts "src == lexed is #{s == combined}"
    # puts ; pp Ripper.lex(s)
    # puts Ripper.tokenize(s).inspect
    # pp Ripper.sexp_raw(s)
  end
end

RipperPercent.run
```

As mentioned previously, I'm not much of a C type, and much of `Ripper` is not doc'd very well. Hence, I don't think I can fix this, if indeed it's an issue. I'm also aware of the complication that sometimes "\n" is equivalent to a space, and other times it's equivalent to ';'.

Finally, given all the changes that have occurred, when they seem stable/complete, might the version of Ripper be incremented?

Thanks, Greg

-- 
https://bugs.ruby-lang.org/