[ruby-core:102186] [Ruby master Misc#17565] Prefer use of access(2) in rb_file_load_ok() to check for existence of require'd files
From: jean.boussier@...
Date: 2021-01-21 11:13:50 UTC
List: ruby-core #102186
Issue #17565 has been updated by byroot (Jean Boussier).
> open(2) with a mean cost of 130µsec per call, where a call to access(2) has a cost around 5× lower (something around 28µsec)
Which system was this measured on? Is this only a Docker thing?
Also, was it 5x faster on existing files or on missing ones? Intuitively you'd expect the same work (path lookup) to be performed for a missing file.
Also note that recent OSX versions are infamous for their slow `open(2)` (and `stat()`) calls (because of some security feature). I'm not sure whether `access(2)` is impacted as well, but it could very well degrade performance on OSX.
Ultimately, this is a problem that Bootsnap solves: it avoids the `O($LOAD_PATH.size)` performance of `Kernel.require` by precompiling and caching all loadable paths.
----------------------------------------
Misc #17565: Prefer use of access(2) in rb_file_load_ok() to check for existence of require'd files
https://bugs.ruby-lang.org/issues/17565#change-90033
* Author: leehambley (Lee Hambley)
* Status: Open
* Priority: Normal
----------------------------------------
When using Ruby in Docker (2.5 in our case, but the code is unchanged in 15 years across all versions) with a large $LOAD_PATH some millions of calls are made to `open(2)` with a mean cost of 130µsec per call, where a call to `access(2)` has a cost around 5× lower (something around 28µsec).
With a Rails 5 app, without Zeitwerk, the load path is searched iteratively for a file that defines a constant. This causes something like 2,000,000 calls to `open(2)`, of which 97.5% fail with `ENOENT`.
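As a rough consistency check on these figures, assume each lookup probes k candidate paths and ends in exactly one successful `open(2)`. That is a simplification for illustration, not something measured here. The failure fraction is then (k - 1)/k:

```latex
\frac{k-1}{k} = 0.975 \;\Rightarrow\; k = \frac{1}{1 - 0.975} = 40
\qquad
2{,}000{,}000 \times 0.975 = 1{,}950{,}000 \ \text{failing calls}
\qquad
2{,}000{,}000 \times 0.025 = 50{,}000 \ \text{successful calls}
```

Under that assumption, each file that is found costs around 40 probes on average.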
I believe that the cost of two syscalls (`open(2)` only after a successful `access(2)`) would be worthwhile, in our case at least, because we would shave off something like 1,900,000 × 90µsec (2.85 minutes) from the three-minute boot time of our application.
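Spelled out, the estimate uses the rounded figures quoted above:

```latex
1{,}900{,}000 \times 90\,\mu\text{s} = 171\,\text{s} = 2.85\,\text{min}
\qquad
130\,\mu\text{s} - 28\,\mu\text{s} = 102\,\mu\text{s}
\qquad
50{,}000 \times 28\,\mu\text{s} = 1.4\,\text{s}
```

The measured per-call difference is about 102µsec, so 90µsec is a slightly conservative figure. The last term is the extra `access(2)` cost paid on the roughly 2.5% of calls that succeed, which is small next to the saving if the quoted timings hold.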
I prepared a very naïve patch with a simple early return in `rb_file_load_ok`:
```
diff --git a/file.c b/file.c
index 3bf092c05c..c7a7635125 100644
--- a/file.c
+++ b/file.c
@@ -5986,6 +5986,16 @@ rb_file_load_ok(const char *path)
O_NDELAY |
#endif
0);
+ if (access(path, R_OK) == -1) return 0;
int fd = rb_cloexec_open(path, mode, 0);
if (fd == -1) return 0;
rb_update_max_fd(fd);
```
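For readers who want to try the idea outside the Ruby source tree, here is a minimal standalone sketch of the same probe-then-open pattern. The function name `load_ok_sketch` is hypothetical and this is not Ruby's actual implementation. It uses plain `open(2)` in place of `rb_cloexec_open`, and the directory check is an assumption about what a loadability test might want.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical stand-in for the patched check; not Ruby's actual code. */
int
load_ok_sketch(const char *path)
{
    struct stat st;
    int fd, ok;

    /* Cheap probe first: most candidates on a long load path do not exist. */
    if (access(path, R_OK) == -1) return 0;

    /* Still open the file: it may have vanished or changed since access(). */
    fd = open(path, O_RDONLY | O_NONBLOCK);
    if (fd == -1) return 0;

    /* Reject directories, which open() accepts for reading. */
    ok = (fstat(fd, &st) == 0 && !S_ISDIR(st.st_mode));
    close(fd);
    return ok;
}
```

Two caveats apply to any variant of this approach. `access(2)` checks permissions against the real user ID rather than the effective one, so its answer can differ from `open(2)` in setuid contexts. The file can also change between the two calls, which is why the result of `open(2)` remains authoritative.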
This hasn't been exhaustively tested as I simply haven't had time yet, but at least it compiled and passed `make check`.
I spoke with Aaron Patterson on Twitter, who suggested that a wiser approach might be a heuristic one level higher (`rb_find_file`?) that switches strategy based on the length of the `$LOAD_PATH`.
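A sketch of what such a heuristic could look like, written as a standalone function rather than against `rb_find_file` itself. Everything here is hypothetical: the name `find_in_dirs`, its signature, and the `probe_threshold` parameter are invented for illustration, and a sensible threshold would have to come from measurement.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Search `dirs` for `name`. Returns the index of the first directory in
 * which the file can be opened, with the full path written to `out`, or -1
 * if none matches. `out` is only meaningful when the result is not -1.
 * The access() probe is used only when ndirs exceeds `probe_threshold`.
 */
int
find_in_dirs(const char *const *dirs, size_t ndirs, const char *name,
             char *out, size_t outlen, size_t probe_threshold)
{
    int probe_first = ndirs > probe_threshold;
    size_t i;

    for (i = 0; i < ndirs; i++) {
        int fd;
        int len = snprintf(out, outlen, "%s/%s", dirs[i], name);
        if (len < 0 || (size_t)len >= outlen) continue;  /* path too long */

        /* Long load path: skip the costlier open() for missing files. */
        if (probe_first && access(out, R_OK) == -1) continue;

        fd = open(out, O_RDONLY);
        if (fd == -1) continue;
        close(fd);
        return (int)i;
    }
    return -1;
}
```

With a short load path the extra syscall on every hit is avoided. With a long one, most candidates are misses and the cheaper probe dominates.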
Alternatively, maybe the patch could be conditional, guarded somehow, and conditionally compiled only into the Rubies built for Docker, in a way that is portable to the common Ruby version managers.
I am opening this ticket to track my own work, as much as anything, with no expectation that someone implement this on my behalf. I am eager to contribute to Ruby for all the benefit I have seen from it in my career.
If anyone knows why this may be an unsuccessful adventure, I would gratefully receive any and all feedback.