From: "Eregon (Benoit Daloze) via ruby-core"
Date: 2025-12-15T18:29:39+00:00
Subject: [ruby-core:124216] [Ruby Bug#21783] {Method,UnboundMethod,Proc}#source_location returns columns in bytes and not in characters

Issue #21783 has been updated by Eregon (Benoit Daloze).

Updating the docs is one solution, so at least it's consistent between docs and behavior.

I think as a Ruby-facing API it's weird that it operates in terms of bytes (and `source_location` does not have a `byte` prefix to indicate that).
I think most programmers, when they hear `line 4 column 6`, expect the 6th character on the 4th line, not the character starting at the 6th byte. A byte column is actually hard to find in an editor: most editors don't show "byte columns", and it's not even possible to place the cursor at some byte positions. Every programmer thinks in characters when looking at source code.

For example, one might expect that highlighting with `^` based on the return values from `source_location` works, but it doesn't:
```ruby
def underline(callable)
  file, start_line, start_column, end_line, end_column = callable.source_location
  raise unless start_line == end_line
  source = File.readlines(file)[start_line-1]
  puts source
  puts ' '*start_column + '^'*(end_column-start_column)
end

my_proc = proc { ascii-only }
underline my_proc

my_proc = proc { il était une fois un été }
underline my_proc
```
gives
```
$ ruby underline.rb
my_proc = proc { ascii-only }
          ^^^^^^^^^^^^^^^^^^^
my_proc = proc { il était une fois un été }
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
```

> Either both parsers/compilers would have to do this work initially, as that's where the numbers come from, or the source_location function would have to re-parse the source, which is not possible in some cases.

This is a good point, I didn't realize that.
I think it would still be worth it to change the parsers/compilers to compute the proper character column for literal lambdas, blocks and methods. It probably wouldn't be very expensive, given that most source files are ASCII-only. The parsers could potentially even use the knowledge that a given line is ASCII-only, so it would still be as fast even if the file contains a few non-ASCII characters.

If columns were to appear e.g. in error messages, I think everyone would expect them to be character columns, not byte columns.
For example gcc shows character columns, as one would expect:
```c
int main() {
    /* été */ notexist
}
```
```
gcc test.c
test.c: In function ‘main’:
test.c:2:15: error: ‘notexist’ undeclared (first use in this function)
    2 |     /* été */ notexist
      |               ^~~~~~~~
```
Note it's 2:15 (i.e. character columns), not 2:17 (byte columns).
The highlighting also needs to use character columns of course.

----------------------------------------
Bug #21783: {Method,UnboundMethod,Proc}#source_location returns columns in bytes and not in characters
https://bugs.ruby-lang.org/issues/21783#change-115699

* Author: Eregon (Benoit Daloze)
* Status: Open
* ruby -v: ruby 4.0.0dev (2025-12-14T07:11:02Z master 711d14992e) +PRISM [x86_64-linux]
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN
----------------------------------------
The documentation says:
```
= Proc.source_location

(from ruby core)
------------------------------------------------------------------------
  prc.source_location  -> [String, Integer, Integer, Integer, Integer]

------------------------------------------------------------------------

Returns the location where the Proc was defined.
The returned Array contains:
  (1) the Ruby source filename
  (2) the line number where the definition starts
  (3) the column number where the definition starts
  (4) the line number where the definition ends
  (5) the column number where the definitions ends

This method will return nil if the Proc was not defined in Ruby (i.e. native).
```

So it talks about column numbers, so it should be a number of characters and not of bytes.
But currently it's a number of bytes:
```
$ ruby --parser=prism -ve 'def été; end; p method(:été).source_location'
ruby 4.0.0dev (2025-12-14T07:11:02Z master 711d14992e) +PRISM [x86_64-linux]
["-e", 1, 0, 1, 14]

$ ruby --parser=parse.y -ve 'def été; end; p method(:été).source_location'
ruby 4.0.0dev (2025-12-14T07:11:02Z master 711d14992e) [x86_64-linux]
["-e", 1, 0, 1, 14]
```

The last number should be 12 because `"def été; end".size` is 12 characters.
This is a Ruby-level API so I would never expect "byte columns" here, I think it's clear it should be a number of "editor columns" i.e. a number of characters.
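
To make the difference concrete, here is a minimal sketch of how a caller could convert the byte columns reported today into character columns after the fact. It assumes the source line is available as a UTF-8 String and that the byte column falls on a character boundary. `char_column` is a hypothetical helper name, not an existing Ruby API, and the byte columns are hard-coded from the examples in this thread rather than read from `source_location`.

```ruby
# Hypothetical helper (not part of Ruby): convert a 0-based byte column
# into a 0-based character column for one source line.
# Assumes `line` is valid UTF-8 and `byte_column` sits on a character boundary.
def char_column(line, byte_column)
  # byteslice keeps the string's encoding, so #length counts characters.
  line.byteslice(0, byte_column).length
end

# The method definition from the issue: byte column 14 is character column 12.
p char_column("def été; end", 14) # => 12

# The proc from the underline example: it spans bytes 10...46 of its line.
line = "my_proc = proc { il était une fois un été }"
start_col = char_column(line, 10) # => 10
end_col   = char_column(line, 46) # => 43

puts line
puts ' ' * start_col + '^' * (end_col - start_col)
```

With the converted columns the carets cover exactly the 33 characters of the proc literal. This only works when the source line can be read again, which is the limitation raised in the quoted comment, so it does not replace computing character columns in the parsers/compilers.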