From: "Eregon (Benoit Daloze) via ruby-core"
Date: 2025-01-08T10:24:30+00:00
Subject: [ruby-core:120555] [Ruby master Feature#21005] Update the source location method to include line start/stop and column start/stop details

Issue #21005 has been updated by Eregon (Benoit Daloze).

It's very important that this new feature does not expect users to use `RubyVM::InstructionSequence` or anything under `RubyVM` since `RubyVM` is CRuby-only.
The feature itself is possible on any Ruby implementation.
So something like `Prism.node_for(Proc|Method|UnboundMethod)` is good, and `Prism.ast_for(RubyVM::InstructionSequence)` is not.
Internally Prism can of course use `RubyVM::InstructionSequence.of(Proc|Method|UnboundMethod).node_id` on CRuby, and something else on other Ruby implementations.

Note that if it's enough to locate a node by its start/end line/column, we might not need `node_id` at all, and then just providing start/end line/column to `source_location` would be enough to find the right node with Prism.
Are there cases where this would be a problem, i.e. where 2 Prism AST nodes would have the same start/end line/column?
Actually since we are only talking about `Proc|Method|UnboundMethod` here it would need to be two nodes which define a proc/lambda/method with the same start/end line/column. I think that's not possible.

If that holds, then the original proposal to provide start/end line/column is enough, and we can add a convenience method in Prism using those.
That would work on all Ruby implementations, without needing a low-level implementation-specific concept of `node_id`:

```ruby
module Prism
  def self.node_for(callable)
    # `source_location(true)` is the proposed extended form, not an existing API.
    start_line, end_line, start_column, end_column = callable.source_location(true)
    # `ast` is not defined in this sketch; presumably it is the Prism parse
    # result for the file that defines `callable`.
    ast.value.breadth_first_search { |node|
      loc = node.location
      loc.start_line == start_line and loc.end_line == end_line and
        loc.start_column == start_column and loc.end_column == end_column
    }
  end
end
```

Maybe CRuby does not currently preserve the information of end line and start/end column for procs and methods?
For `def` it would be trivial to preserve it but I guess for blocks and `define_method` it might be trickier.
For such cases `source_location` could internally use the `node_id` stuff if that's easier or deemed a better trade-off on CRuby.

In summary:
* I think we can build `Prism.node_for(Proc|Method|UnboundMethod)` on `(Proc|Method|UnboundMethod)#source_location` with start/end line/column.
* Those would all be public APIs working on all Ruby implementations.
* Users don't need to know about low-level implementation-specific (i.e. CRuby-only) concepts like `node_id`.

----------------------------------------
Feature #21005: Update the source location method to include line start/stop and column start/stop details
https://bugs.ruby-lang.org/issues/21005#change-111366

* Author: bkuhlmann (Brooke Kuhlmann)
* Status: Open
----------------------------------------
## Why

Hello.
After discussing with Kevin Newton and Benoit Daloze in [Feature 20999](https://bugs.ruby-lang.org/issues/20999), I'd like to propose adding line start/stop and column start/stop information to the `#source_location` method for the following objects:

- [Binding](https://docs.ruby-lang.org/en/master/Binding.html)
- [Proc](https://docs.ruby-lang.org/en/master/Proc.html)
- [Method](https://docs.ruby-lang.org/en/master/Method.html)
- [UnboundMethod](https://docs.ruby-lang.org/en/master/UnboundMethod.html)

At the moment, when using `#source_location`, you only get the following information:

``` ruby
def demo = "A demonstration."

# From disk.
method(:demo).source_location  # ["/Users/bkuhlmann/Engineering/Misc/demo", 15]

# From memory.
method(:demo).source_location  # ["(irb)", 3]
```

Notice, when asking for the source location, we only get the path/location as the first element and the line number as the second element but I'd like to obtain a much richer set of data which includes line start/stop and column start/stop so I can avoid leaning on the `RubyVM` for this information. Example:

``` ruby
def demo = "A demonstration."

# From disk.
instructions = RubyVM::InstructionSequence.of method(:demo)
puts [instructions.absolute_path, *instructions.to_a.dig(4, :code_location)]

[
  "/Users/bkuhlmann/Engineering/Misc/demo",  # Source path.
  15,                                        # Line start.
  0,                                         # Column start.
  15,                                        # Line stop.
  29                                         # Column stop.
]

# From memory.
instructions = RubyVM::InstructionSequence.of method(:demo)
puts instructions.script_lines

[
  "def demo = \"A demonstration.\"\n",
  ""
]
```

By having access to the path (or lack thereof in case of IRB), line start/stop, and column start/stop, this means we could avoid using the RubyVM to obtain raw source code for any of these objects. This would not only enhance debugging situations but also improve Domain Specific Languages that wish to leverage this information for introducing new features and/or new debugging capabilities to the language.
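To illustrate how line and column ranges alone could be enough to recover raw source without `RubyVM`, here is a minimal sketch. The `extract_source` helper and its keyword names are hypothetical and not part of Ruby. It assumes 1-based line numbers, 0-based columns, and an exclusive stop column, which matches the `15, 0, 15, 29` values shown above. Columns are treated as character offsets here, which is only equivalent to byte offsets for ASCII source.

``` ruby
# Hypothetical helper (not an existing Ruby API): slice the source text that a
# start/stop line and column range describes.
# `lines` is an array of source lines, e.g. File.readlines(path).
# Assumptions: lines are 1-based, columns are 0-based, column_stop is exclusive.
def extract_source(lines, line_start:, column_start:, line_stop:, column_stop:)
  selected = lines[(line_start - 1)..(line_stop - 1)]
  return "" if selected.nil? || selected.empty?

  selected = selected.map(&:dup)
  # Trim the tail first so offsets stay valid when the range is a single line.
  selected[-1] = selected[-1][0...column_stop]
  selected[0] = selected[0][column_start..] || ""
  selected.join
end

lines = ["def demo = \"A demonstration.\"\n", "\n"]
puts extract_source(lines, line_start: 1, column_start: 0, line_stop: 1, column_stop: 29)
# Prints: def demo = "A demonstration."
```

For source that only exists in memory (such as IRB), the same helper would need the lines to come from somewhere other than a file, which is the gap the `script_lines` output above currently fills.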
## How

Building upon the examples provided above, I'd like to see `Binding`, `Proc`, `Method`, and `UnboundMethod` respond to `#source_location` as follows:

``` ruby
[
  "/Users/bkuhlmann/Engineering/Misc/demo",  # Source path.
  15,                                        # Line start.
  15,                                        # Line stop.
  0,                                         # Column start.
  29                                         # Column stop.
]
```

Notice, for data grouping purposes, I changed the array structure to always start with the path as the first element, followed by line information, and ending with column information. Alternatively, it might be nice to improve upon the above by answering a hash each time, instead, for a more self-describing data structure. Example:

``` ruby
{
  path: "/Users/bkuhlmann/Engineering/Misc/demo",
  line_start: 15,
  line_stop: 15,
  column_start: 0,
  column_stop: 29
}
```

For in-memory situations like IRB, it would be nice to answer the equivalent of `RubyVM::InstructionSequence#script_lines` which would always be an `Array` with no line or column information since only the source code is necessary. Example:

``` ruby
[
  "def demo = \"A demonstration.\"\n",
  ""
]
```

From a pattern matching perspective, this could provide the best of both worlds especially if information is answered as either a `Hash` or an `Array`. Example:

``` ruby
def demo = "A demonstration."

case method(:demo).source_location
  in Hash then puts "Source information obtained from disk."
  in Array then puts "Source obtained from memory."
  else fail TypeError, "Unrecognized source location type."
end
```

The above is only a simple example but there's a lot we could do with this information if the above pattern match was enhanced to deal with the extraction and formatting of the actual source code!
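As a rough sketch of what such an enhanced pattern match might look like, the following destructures the two proposed shapes instead of only checking their class. The `describe_location` method is hypothetical, and the `Hash` and `Array` values are hand-written stand-ins for what `#source_location` might answer under this proposal. No Ruby release returns these shapes today.

``` ruby
# Hypothetical: `location` stands in for the proposed #source_location result.
# A Hash is assumed to mean "source on disk"; an Array of strings is assumed to
# mean "source held in memory" (e.g. IRB).
def describe_location(location)
  case location
  in {path: String => path, line_start: Integer => line_start, line_stop: Integer => line_stop,
      column_start: Integer => column_start, column_stop: Integer => column_stop}
    "#{path}:#{line_start}:#{column_start}-#{line_stop}:#{column_stop}"
  in [*lines] if lines.all?(String)
    "in-memory source (#{lines.size} lines)"
  else
    raise TypeError, "Unrecognized source location type."
  end
end

from_disk = {path: "/Users/bkuhlmann/Engineering/Misc/demo",
             line_start: 15, line_stop: 15, column_start: 0, column_stop: 29}
from_memory = ["def demo = \"A demonstration.\"\n", ""]

puts describe_location(from_disk)
puts describe_location(from_memory)
```

Destructuring by key keeps the caller independent of element order, which is one argument for the hash form over the positional array form.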
## Notes

This feature request is related to the following discussions in case more context is of help:

- [Feature 6012](https://bugs.ruby-lang.org/issues/6012)
- [Feature 20999](https://bugs.ruby-lang.org/issues/20999)

--
https://bugs.ruby-lang.org/
______________________________________________
ruby-core mailing list -- ruby-core@ml.ruby-lang.org
To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/