[ruby-core:116290] [Ruby master Misc#20191] Deprecate magic encoding comment
From: duerst via ruby-core <ruby-core@...>
Date: 2024-01-18 03:42:04 UTC
List: ruby-core #116290
Issue #20191 has been updated by duerst (Martin Dürst).

For the record, I agree with Hiroshi, Kenta, and Yui. The changes from Python 2 to Python 3 didn't work in favor of Python (summarizing Yehuda Katz). The above change would be of a similar magnitude, with similar implications.

The proposed change might work if announced very long-term, e.g. for 2030 or so. Just doing it now and "hope for the best" is a bad idea.

----------------------------------------
Misc #20191: Deprecate magic encoding comment
https://bugs.ruby-lang.org/issues/20191#change-106312

* Author: kddnewton (Kevin Newton)
* Status: Rejected
* Priority: Normal
----------------------------------------
I would like to ask that we deprecate the magic encoding comment, and instead require all source files to be encoded in UTF-8.

There would be many benefits to the performance of both the parser and compiler. It would also help to simplify both. For example, right now a string literal in a file encoded in US-ASCII can result in 3 different encodings, depending on its internal bytes. If the file is encoded in UTF-8, it can only be a UTF-8 string.

The encoding comment itself is not very commonly used in gems. If you take the top 100 most downloaded gem versions from rubygems.org and look at the resolved encoding of all of the files, you get:

- UTF-8: 11554
- ASCII-8BIT: 35
- US-ASCII: 10

For all of the most recent versions of gems on rubygems.org, you get:

- UTF-8: 2967421
- US-ASCII: 20130
- ASCII-8BIT: 9237
- ISO-8859-1: 87
- Windows-1252: 45
- Shift_JIS: 32
- Windows-31J: 22
- Windows-1251: 15
- EUC-JP: 11
- GBK: 4
- KOI8-R: 3
- ISO-8859-15: 2
- UTF8-MAC: 1
- invalid: 33

Note that "invalid" here could have worked on some rubies < 3.2 if they used Encoding#replicate.

If we were to change this, the main breaking change concern would be the encoding of strings and symbols that would leave the context of the file by virtue of a constant read/method call. That's why I think it should first be deprecated in a minor release, then removed in the next major. At the moment this would mean for the top 100 gems we would be worried about 0.39% of files, and on rubygems.org as a whole we would be worried about 0.99% of files.

If deprecating the entire encoding comment is unacceptable from a compatibility point of view, I would suggest we try only allowing UTF-8, US-ASCII, and ASCII-8BIT. This would still have a lot of value/simplifications/performance opportunities, at the expense of still needing to be parsed and checked. On the top 100 gems this would mean no files would have to change, and on rubygems.org as a whole it would mean we would be worried about 0.009% of files. That being said, if we're going to deprecate this at all, we should probably just do it all the way to get the full benefit.

(In case you want to check the math, the script used to calculate these is attached.)

---Files--------------------------------
gems.rb (4.33 KB)

-- 
https://bugs.ruby-lang.org/
______________________________________________
ruby-core mailing list -- ruby-core@ml.ruby-lang.org
To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
ruby-core info -- https://ml.ruby-lang.org/mailman3/postorius/lists/ruby-core.ml.ruby-lang.org/
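
The percentages quoted in the proposal (0.39%, 0.99%, 0.009%) can be re-derived from the file counts listed in it. The following is a minimal sketch, not the attached gems.rb: the constant names `TOP_100` and `ALL_GEMS` and the helper `percent_outside` are hypothetical and introduced only for this illustration; the counts are copied from the message.

```ruby
# File counts per resolved encoding, copied from the proposal text.
TOP_100 = {
  "UTF-8" => 11_554,
  "ASCII-8BIT" => 35,
  "US-ASCII" => 10
}.freeze

ALL_GEMS = {
  "UTF-8" => 2_967_421,
  "US-ASCII" => 20_130,
  "ASCII-8BIT" => 9_237,
  "ISO-8859-1" => 87,
  "Windows-1252" => 45,
  "Shift_JIS" => 32,
  "Windows-31J" => 22,
  "Windows-1251" => 15,
  "EUC-JP" => 11,
  "GBK" => 4,
  "KOI8-R" => 3,
  "ISO-8859-15" => 2,
  "UTF8-MAC" => 1,
  "invalid" => 33
}.freeze

# Hypothetical helper: percentage of files whose resolved encoding is
# NOT in the allowed list, rounded to the given number of decimal places.
def percent_outside(counts, allowed, digits)
  total = counts.values.sum
  outside = counts.sum { |encoding, n| allowed.include?(encoding) ? 0 : n }
  (100.0 * outside / total).round(digits)
end

utf8_only = ["UTF-8"]
three_allowed = ["UTF-8", "US-ASCII", "ASCII-8BIT"]

puts percent_outside(TOP_100, utf8_only, 2)       # UTF-8 only, top 100 gems
puts percent_outside(ALL_GEMS, utf8_only, 2)      # UTF-8 only, all gems
puts percent_outside(TOP_100, three_allowed, 3)   # three encodings allowed, top 100 gems
puts percent_outside(ALL_GEMS, three_allowed, 3)  # three encodings allowed, all gems
```

Under the fallback option (only UTF-8, US-ASCII, and ASCII-8BIT allowed), the "invalid" bucket is counted as outside the allowed set, which is what reproduces the 0.009% figure.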