From: jean.boussier@... Date: 2020-05-26T09:55:54+00:00 Subject: [ruby-core:98524] [Ruby master Feature#16848] Allow callables in $LOAD_PATH Issue #16848 has been updated by byroot (Jean Boussier). Sorry I missed your answer from 5 days ago. I wish I'd answered before your second meeting (Redmine no longer send notifications...). > Could you propose another proposal to care compatibility? To be honest I kinda expected this concern, but hoped it wouldn't be one. ### What Python went through That exact same problem was posed to Python when they did this https://www.python.org/dev/peps/pep-0302/#rationale (the whole section is worth a read) It was proposed to make these Loaders quacks like strings to offer backward compatibility, but they decided to go with a fairly complex combination of `sys.path_hooks`, `sys.meta_path` and `sys.path_importer_cache` alongside `sys.path`, which I have to admit I don't really like. It is hard to understand the flow, requires to keep yet another cache of which loader can load what, and it is hard to tightly control the load order between custom loaders and regular paths. ### String duck-typing So my personal preference would be to make these Finders/Loader delegate to a string on method missing, with maybe a deprecation warning. That's very easy to implement and would keep backward compatibility in the vast majority of the cases. It also keep the main benefit which is easy and simple control over the load order. ### $LOAD_PATH as a "view" But if it's considered too dirty, then a more complex proposal could be to make `$LOAD_PATH` a simple "view" of `$REAL_LOAD_PATH` (name to be defined, ideally it would no longer be a global). Its value when accessed would be `$LOAD_PATH.select(String)`. It would also continue to support mutations, e.g. `$LOAD_PATH.unshift('/some/path')` can easily be forwarded to `$REAL_LOAD_PATH`. Looking at `$LOAD_PATH.public_methods` I see `#insert` that might be non trivial to delegate, as the position would need to be translated, but at this stage I don't personally foresee an impossibility. @ko1 at first glance, do any of these two solutions seem acceptable to you? If so I can make a more formal specification of the one of your choosing, and even try to implement it, or part of it as a proof of concept. > In discussion at devmeeting, there is possibility to introduce a feature to help $LOAD_PATH observing which bootsnap does. It would be nice as it would remove some fairly dirty and inefficient code from Bootsnap, but it kind of is a "solved" problem. That's no longer what is challenging in Bootsnap. > Also Matz said he prefers to introduce Python's loader/finder mechanism. That makes sense. If a change is to be made, might as well go all the way. > To introduce them, maybe deep discussion is needed. I suppose so. I obviously can't take API decisions, but know that I'm serious about this proposal and I'm willing to put as much work as needed into this. ---------------------------------------- Feature #16848: Allow callables in $LOAD_PATH https://bugs.ruby-lang.org/issues/16848#change-85801 * Author: byroot (Jean Boussier) * Status: Feedback * Priority: Normal ---------------------------------------- Make it easier to implement `$LOAD_PATH` caching, and speed up application boot time. I benchmarked it on Redmine's master using bootsnap with only the optimization enabled: ```ruby if ENV['CACHE_LOAD_PATH'] require 'bootsnap' Bootsnap.setup( cache_dir: 'tmp/cache', development_mode: false, load_path_cache: true, autoload_paths_cache: true, disable_trace: false, compile_cache_iseq: true, compile_cache_yaml: false, ) end ``` ``` $ RAILS_ENV=production time bin/rails runner 'p 1' 2.66 real 1.99 user 0.66 sys $ RAILS_ENV=production time bin/rails runner 'p 1' 2.71 real 1.97 user 0.66 sys $ CACHE_LOAD_PATH=1 RAILS_ENV=production time bin/rails runner 'p 1' 1.41 real 1.12 user 0.28 sys $ CACHE_LOAD_PATH=1 RAILS_ENV=production time bin/rails runner 'p 1' 1.41 real 1.12 user 0.28 sys ``` That's twice for a relatively small application. And the performance improvement is not linear; the larger the application, the larger the improvement. ### How it works `require` has `O($LOAD_PATH.size)` performance. The more gems you add to your `Gemfile`, the larger `$LOAD_PATH` becomes. `require "foo.rb"` will try to open the file in each of the `$LOAD_PATH` entries. And since more gems usually also means more `require` calls, loading Ruby code may take up to quadratic performance loss. To improve this, Bootsnap pre-computes a map of all the files in your `$LOAD_PATH`, and uses it to convert relative paths into absolute paths so that Ruby skips the `$LOAD_PATH` traversal. ```ruby $LOAD_PATH = $w(/gems/foo/lib /gems/bar/lib) BOOTSNAP_CACHE = { "bar.rb" => "/gems/bar/lib/bar.rb", } ``` This resolves file lookup by a single hash lookup, and reduces boot performance from roughly `O($LOAD_PATH.size * number_of_files_to_require)` to `O(number_of_files_to_require)`. This optimization is also used in [Gel](https://github.com/gel-rb/gel), a Rubygems/Bundler replacement. ### Trade offs Every time `$LOAD_PATH` is modified, the cache must become invalidated. While this is complex to do for Bootsnap, it would be fairly easy if it is implemented inside Ruby. More importantly, you have to invalidate the cache whenever you add or delete a file to/from one of the `$LOAD_PATH` members; otherwise, if you shadow or unshadow another file farther in the `$LOAD_PATH`, Bootsnap will load a wrong file. For instance, if `require "foo.rb"` initially resolves to `/some/gem/foo.rb`, and you create `lib/foo.rb`, you'll need to flush Bootsnap cache. That latter is trickier, and Bootsnap has decided that it is rare enough to cause actual problems, and so far that holds. But that is not a trade off Ruby can make. However that's probably a tradeoff Rubygems/Bundler can make. While it's common to edit your gems to debug something, it's really uncommon to add or remove files inside them. So in theory Rubygems/Bundler could compute a map of all files in a gem that can be required after it installs it. Then when you activate it, you merge it together with the other activated gems. ### Proposal This could be reasonably easy to implement if `$LOAD_PATH` accepted callables in addition to paths. Something like this: ```ruby $LOAD_PATH = [ 'my_app/lib', BundlerOrRubygems.method(:lookup), ] ``` The contract would be that `BundlerOrRubygems.lookup("some_relative/path.rb")` would return either an absolute path or `nil`. With such API, it would be easy to cache absolute paths only for gems and the stdlib, and preserve the current cache-less behavior for the application specific load paths, which are usually much less numerous. It would also allow frameworks such as Rails to implement the same caching for application paths when running in an environment where the source files are immutable (typically production). -- https://bugs.ruby-lang.org/ Unsubscribe: