From: jean.boussier@... Date: 2021-01-21T11:13:50+00:00 Subject: [ruby-core:102186] [Ruby master Misc#17565] Prefer use of access(2) in rb_file_load_ok() to check for existence of require'd files Issue #17565 has been updated by byroot (Jean Boussier). > open(2) with a mean cost of 130�sec per call, where a call to access(2) has a cost around 5� lower (something around 28�sec) Which system was this measured on? Is this only a Docker thing? Also was it 5x faster on existing files or on existing ones? Because intuitively you'd think for a missing file the same work would be performed (path lookup). Also note that recent OSX are infamous for their slow `open(2)` (and `stat()`) calls (because of some security feature), I'm not sure wether `access(2)` is impacted as well, but it could very well degrade performance on OSX. Also ultimately, this is a problem solved by Bootsnap by avoiding the `O($LOAD_PATH.size)` performance of `Kernel.require` by precompiling and caching all loadable paths. ---------------------------------------- Misc #17565: Prefer use of access(2) in rb_file_load_ok() to check for existence of require'd files https://bugs.ruby-lang.org/issues/17565#change-90033 * Author: leehambley (Lee Hambley) * Status: Open * Priority: Normal ---------------------------------------- When using Ruby in Docker (2.5 in our case, but the code is unchanged in 15 years across all versions) with a large $LOAD_PATH some millions of calls are made to `open(2)` with a mean cost of 130�sec per call, where a call to `access(2)` has a cost around 5� lower (something around 28�sec). With a Rails 5 app, without Zeitwerk, the load path is searched iteratively looking for a file to define a constant, this causes something like 2,000,000 calls to `open(2)` of which 97.5% are failing with `ENOENT`. I believe that the cost of two syscalls (`open(2)` only after successful `access(2)`) would, in our case, at least because we would shave-off something like 1,900,000�90�sec (2.85 minutes) from the three minute boot time for our application. I prepared a very na�ve patch with a simple early-return in `rb_file_load_ok`: ``` diff --git a/file.c b/file.c index 3bf092c05c..c7a7635125 100644 --- a/file.c +++ b/file.c @@ -5986,6 +5986,16 @@ rb_file_load_ok(const char *path) O_NDELAY | #endif 0); + if (access(path, R_OK) == -1) return 0; int fd = rb_cloexec_open(path, mode, 0); if (fd == -1) return 0; rb_update_max_fd(fd); ``` This hasn't been exhaustively tested as I simply haven't had time yet, but at least it compiled and passed `make check`. I spoke with Aaron Patterson on Twitter, who suggested maybe a wiser approach would be a heuristic approach one level higher (`rb_find_file`?) which switches the strategy based on the length of the LOAD_PATH. Alternatively, maybe the patch could be conditional, guarded somehow, and conditionally compiled only into the Rubies built for Docker, in a way that is portable to the common Ruby version managers. I am opening this ticket to track my own work, as much as anything, with no expectation that someone implement this on my behalf. I am eager to contribute to Ruby for all the benefit I have seen from it in my career. If someone knows hints why this may be an unsuccessful adventure, I gratefully receive any and all feedback. -- https://bugs.ruby-lang.org/ Unsubscribe: