From: jean.boussier@...
Date: 2021-01-21T20:52:57+00:00
Subject: [ruby-core:102190] [Ruby master Misc#17565] Prefer use of access(2) in rb_file_load_ok() to check for existence of require'd files

Issue #17565 has been updated by byroot (Jean Boussier).

> I wonder how to reconcile such a caching with the other problem of Docker (which is often that you don't get reliable filesystem events to clear a cache with)

Bootsnap doesn't rely on FS events.

> Rails for example can be a long-living Ruby process which may need to be able to require newly created files

In newer setups using Zeitwerk, it's better not to have the app's paths cached by Bootsnap: Zeitwerk only requires absolute paths anyway, so caching them is redundant. There's an option in Rails for that: `config.add_autoload_paths_to_load_path = false`. I suppose it will become the default soon, when the "classic" autoloader is fully removed.

> Bootsnap causes a lot of writes when booting

I assume you are referring to the compile cache? You should try enabling Bootsnap's load_path caching while disabling the other caches (Iseq & YAML). Honestly, the Iseq cache isn't that effective anyway. The real big saver is the load_path_cache, and that one is very small, so it doesn't write a lot. It does need to scan directories, though.

----------------------------------------
Misc #17565: Prefer use of access(2) in rb_file_load_ok() to check for existence of require'd files
https://bugs.ruby-lang.org/issues/17565#change-90036

* Author: leehambley (Lee Hambley)
* Status: Open
* Priority: Normal
----------------------------------------
When using Ruby in Docker (2.5 in our case, but the code is unchanged in 15 years across all versions) with a large $LOAD_PATH, some millions of calls are made to `open(2)` with a mean cost of 130µsec per call, whereas a call to `access(2)` costs around 5× less (something around 28µsec).
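For anyone who wants to reproduce this kind of comparison, here is a minimal sketch in C, not taken from the Ruby sources. The helper names `probe_with_access`, `probe_with_open` and `mean_usec` are hypothetical, and the numbers it produces depend heavily on the filesystem and container setup, so they should not be expected to match the figures quoted above.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime() under strict C modes */
#endif

#include <assert.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the path is readable according to
 * access(2), 0 otherwise. No file descriptor is created. */
int probe_with_access(const char *path)
{
    return access(path, R_OK) == 0;
}

/* Hypothetical helper: returns 1 if the path can be opened read-only,
 * 0 otherwise. The descriptor is closed again immediately. */
int probe_with_open(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1) return 0;
    close(fd);
    return 1;
}

/* Calls probe(path) `iterations` times and returns the mean wall-clock
 * cost per call in microseconds. */
double mean_usec(int (*probe)(const char *), const char *path, int iterations)
{
    struct timespec start, end;

    assert(iterations > 0);

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < iterations; i++) {
        (void)probe(path);
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    double elapsed_usec = (double)(end.tv_sec - start.tv_sec) * 1e6
                        + (double)(end.tv_nsec - start.tv_nsec) / 1e3;
    return elapsed_usec / iterations;
}
```

Calling `mean_usec(probe_with_open, path, n)` and `mean_usec(probe_with_access, path, n)` with a path that does not exist approximates the failing `ENOENT` lookups that dominate a long load-path search.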
With a Rails 5 app, without Zeitwerk, the load path is searched iteratively when looking for a file to define a constant. This causes something like 2,000,000 calls to `open(2)`, of which 97.5% fail with `ENOENT`.

I believe that the cost of two syscalls (`open(2)` only after a successful `access(2)`) would be worthwhile, in our case at least, because we would shave off something like 1,900,000×90µsec (2.85 minutes) from the three-minute boot time of our application.

I prepared a very naïve patch with a simple early return in `rb_file_load_ok`:

```
diff --git a/file.c b/file.c
index 3bf092c05c..c7a7635125 100644
--- a/file.c
+++ b/file.c
@@ -5986,6 +5986,16 @@ rb_file_load_ok(const char *path)
 		    O_NDELAY |
 #endif
 		    0);
+    if (access(path, R_OK) == -1) return 0;
     int fd = rb_cloexec_open(path, mode, 0);
     if (fd == -1) return 0;
     rb_update_max_fd(fd);
```

This hasn't been exhaustively tested, as I simply haven't had time yet, but at least it compiled and passed `make check`.

I spoke with Aaron Patterson on Twitter, who suggested that a wiser approach might be a heuristic one level higher (`rb_find_file`?) that switches strategy based on the length of the LOAD_PATH.

Alternatively, the patch could be guarded somehow and conditionally compiled only into the Rubies built for Docker, in a way that is portable to the common Ruby version managers.

I am opening this ticket to track my own work, as much as anything, with no expectation that someone implement this on my behalf. I am eager to contribute to Ruby for all the benefit I have seen from it in my career.

If anyone has hints as to why this may be an unsuccessful adventure, I will gratefully receive any and all feedback.

-- 
https://bugs.ruby-lang.org/

Unsubscribe: