[ruby-core:98999] [Ruby master Feature#17001] [Feature] Dir.scan to yield dirent for efficient and composable recursive directory scaning

From: jean.boussier@...
Date: 2020-06-30 19:57:25 UTC
List: ruby-core #98999
Issue #17001 has been reported by byroot (Jean Boussier).

----------------------------------------
Feature #17001: [Feature] Dir.scan to yield dirent for efficient and composable recursive directory scaning
https://bugs.ruby-lang.org/issues/17001

* Author: byroot (Jean Boussier)
* Status: Open
* Priority: Normal
----------------------------------------
### Use case

When you need to recursively scan a directory, you can use `Dir[]` / `Dir.glob`, which is fine for small directories or simple patterns,
but can easily take several seconds to complete for large repositories or complex patterns, and returns a very large array, which tends to thrash the GC.

Or you can use `Dir.each_child` / `Dir.foreach` recursively, but then you need to `stat` each entry to know whether it's a directory, or even a symlink if you want to follow them.
This means one syscall per directory to list it, plus one `stat` per entry (file or directory). This is particularly impactful on OSX, where `stat()` is several times slower than on Linux because of various sandboxing features.
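
To make the cost concrete, here is a minimal sketch of that recursive approach. The method name `each_file_recursive` is hypothetical and not taken from Bootsnap; it assumes symlinks should not be followed, hence `File.lstat`.

```ruby
# Yield the path of every regular file under +root+.
# Hypothetical helper for illustration only.
def each_file_recursive(root, &block)
  return enum_for(:each_file_recursive, root) unless block_given?

  Dir.each_child(root) do |name|
    path = File.join(root, name)
    # One extra syscall per entry, just to learn its type.
    stat = File.lstat(path)
    if stat.directory?
      each_file_recursive(path, &block)
    elsif stat.file?
      yield path
    end
  end
end
```

Calling `each_file_recursive(".").grep(/\.rb\z/)` would list Ruby files, at the price of one `lstat` call for every entry visited.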

There's a [typical example of this use case in Bootsnap](https://github.com/Shopify/bootsnap/blob/56c61373000573112ee027dae4be19aecd50e46e/lib/bootsnap/load_path_cache/path_scanner.rb).

### Proposal

[Python introduced `os.scandir` a few years ago](https://www.python.org/dev/peps/pep-0471/) for exactly this purpose. It is functionally similar to `Dir.foreach` / `Dir.each_child`, except it yields
`DirEntry` instances, which are a wrapper around the `libc` `dirent` struct.

I reduced the Bootsnap code into a [simplified benchmark](https://gist.github.com/casperisfine/2124f349c6564560df4399f2eadaa8f2), and using `os.scandir()` Python scans our main repo in a bit over `1s`, which is 3 to 4 times faster
than Ruby can with `Dir.foreach` (`3-4s`). For comparison's sake, `Dir['**/*.rb']` also completes in about `1s`.

So I believe that exposing a similar `Dir.scan` method, returning `Dir::Entry` instances, with methods inspired by `File::Stat` such as `directory?`, would allow for more performant file system scanning
when the query is not easily expressed with a glob pattern.
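
As a rough illustration of the API shape, here is a pure-Ruby stand-in. Everything in it is an assumption: the module `DirScanSketch`, its `Entry` struct and the `scan` / `each_file` signatures are hypothetical, and because it still calls `File.lstat` per entry it has none of the performance benefit a native implementation reading the type from `dirent` could offer.

```ruby
# Hypothetical emulation of the proposed interface; not an existing Ruby API.
module DirScanSketch
  Entry = Struct.new(:name, :path, :ftype) do
    def directory?; ftype == "directory"; end
    def file?;      ftype == "file";      end
    def symlink?;   ftype == "link";      end
  end

  # Yield one Entry per child of +dir+ (excluding "." and "..").
  def self.scan(dir)
    return enum_for(:scan, dir) unless block_given?

    Dir.each_child(dir) do |name|
      path = File.join(dir, name)
      # A native version would take the type from the directory entry
      # itself instead of issuing this lstat.
      yield Entry.new(name, path, File.lstat(path).ftype)
    end
  end

  # Recursive walk written against the Entry interface only.
  def self.each_file(root, &block)
    return enum_for(:each_file, root) unless block_given?

    scan(root) do |entry|
      if entry.directory?
        each_file(entry.path, &block)
      elsif entry.file?
        yield entry.path
      end
    end
  end
end
```

The point of the sketch is that caller code such as `each_file` only asks the entry for its type, so the filtering logic stays composable without the caller issuing its own `stat` calls.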



-- 
https://bugs.ruby-lang.org/

Unsubscribe: <mailto:ruby-core-request@ruby-lang.org?subject=unsubscribe>
<http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-core>
