From: duerst@...
Date: 2019-04-13T09:51:35+00:00
Subject: [ruby-core:92275] [Ruby trunk Bug#15764] Whitespace and control characters should not be permitted in tokens

Issue #15764 has been updated by duerst (Martin Dürst).

Backport set to 2.4: UNKNOWN, 2.5: UNKNOWN, 2.6: UNKNOWN
Assignee set to matz (Yukihiro Matsumoto)
Tracker changed from Feature to Bug

I also think this is a bug. I have changed the category accordingly.

I think we should restrict the characters usable in identifiers to some reasonable ranges. I agree that we mainly want to focus on ASCII programs, but we should do at least a sanity check for the rest of Unicode, and that's clearly not happening now.

As a base for this, it's best to look at Unicode Standard Annex #31, Unicode Identifier And Pattern Syntax (http://www.unicode.org/reports/tr31/). A regular expression for the identifier syntax defined in UAX #31 is easily available in Ruby: `/\p{id_start}\p{id_continue}*/`. The character ranges covered by these properties can be checked in enc/unicode/12.1.0/name2ctype.h, from lines 15267 and 15881 (the file is too large for the Web interface to svn).

The only additions we seem to need are '_' in initial position, sigils for the different kinds of identifiers, and final '!', '?', and '=' for method names.

I suspect that it may take @nobu just a few hours to actually implement this, and that the backwards-compatibility issues (existing Ruby programs ceasing to work) are extremely minimal and limited to examples that show the problem.

I have added this to the list of issues to be discussed at next week's developers' meeting, but I will not be at the meeting itself. If needed, I can join the discussion on the first day of RubyKaigi itself.

I have assigned this issue to Matz because I'd like him to give it a sanity check.
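
To make the proposal above concrete, here is a rough sketch of the UAX #31 identifier pattern extended with an initial '_', sigils, and the final '!', '?', and '=' for method names. This is only an illustration and not how the parser does or would implement it; the names `ID_BODY`, `VARIABLE`, `METHOD_NAME`, `plausible_variable?`, and `plausible_method_name?` are hypothetical, and the sigil set (`$`, `@`, `@@`) is an assumption about which sigils are meant.

```ruby
# Sketch only: all constant and method names here are hypothetical and are
# not taken from the Ruby parser.

# UAX #31 identifier body, with '_' additionally allowed in initial position.
ID_BODY = /[\p{ID_Start}_]\p{ID_Continue}*/

# Optional sigil for global ($), instance (@), and class (@@) variables.
VARIABLE = /\A(?:\$|@@?)?#{ID_BODY}\z/

# Method names may additionally end in '!', '?', or '='.
METHOD_NAME = /\A#{ID_BODY}[!?=]?\z/

def plausible_variable?(name)
  VARIABLE.match?(name)
end

def plausible_method_name?(name)
  METHOD_NAME.match?(name)
end

puts plausible_method_name?("empty?")       # ASCII name with '?' suffix
puts plausible_method_name?("caf\u00E9")    # non-ASCII letter
puts plausible_method_name?("foo\u00A0bar") # contains NO-BREAK SPACE (U+00A0)
puts plausible_variable?("@@count")         # class-variable sigil
```

With a check of this kind, a name containing U+00A0 is rejected, because whitespace is in neither `ID_Start` nor `ID_Continue`, while ordinary ASCII and non-ASCII letter names still pass.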
----------------------------------------
Bug #15764: Whitespace and control characters should not be permitted in tokens
https://bugs.ruby-lang.org/issues/15764#change-77610

* Author: BatmanAoD (Kyle Strand)
* Status: Open
* Priority: Normal
* Assignee: matz (Yukihiro Matsumoto)
* Target version:
* ruby -v:
* Backport: 2.4: UNKNOWN, 2.5: UNKNOWN, 2.6: UNKNOWN
----------------------------------------
As of Ruby 2.5.1p57, it appears that all valid Unicode code-points above 128 are permitted in tokens. This includes whitespace and control characters.

This was demonstrated here: https://gist.github.com/qrohlf/7045823

I have attached the raw download from the above gist.

The issue has been discussed on StackOverflow: https://stackoverflow.com/q/34455427/1858225

I would say this is arguably a bug, but I am marking this ticket as a "feature" since the current behavior could be considered by-design.

---Files--------------------------------
helloworld.rb (543 Bytes)

--
https://bugs.ruby-lang.org/