[ruby-core:97314] [Ruby master Feature#16648] improve GC performance by 5% with builtin_prefetch
From: bobbypowers@...
Date: 2020-02-29 18:27:27 UTC
List: ruby-core #97314
Issue #16648 has been updated by bpowers (Bobby Powers).

alanwu (Alan Wu) wrote in #note-1:
> I ran the patch on some included GC benchmarks in the repo and it doesn't seem to be a pure win (built-ruby is the patched version):

Thanks! I hadn't seen these. I see roughly similar results locally on these benchmarks; I'll dig in to see if I can understand what's happening.

----------------------------------------
Feature #16648: improve GC performance by 5% with builtin_prefetch
https://bugs.ruby-lang.org/issues/16648#change-84439

* Author: bpowers (Bobby Powers)
* Status: Open
* Priority: Normal
----------------------------------------
The mark phase of non-incremental major GC is (I believe) dominated by pointer chasing. One way we can improve that is by prefetching cachelines from memory before they are accessed, to reduce stalls. I did some experimenting, and the following patch reduces the time spent on a full GC from ~950 milliseconds to ~900 milliseconds, a small but stable improvement.

I would love it if additional folks have other benchmarks (or could point me at them) to see if this holds up beyond the web service I tested, and whether something like this could be considered for inclusion.

I also attempted a more "principled" approach based on an optimization described in the GC handbook: putting a FIFO queue in front of the mark stack, and prefetching addresses as they enter the queue. However, I wasn't able to see any performance improvement there despite testing a number of queue sizes from 4 to 64. It's possible I implemented this wrong, or misjudged the access patterns (e.g. if the memory of a VALUE is accessed before push_mark_stack is called, it would invalidate this approach).
The code for that alternative is here: https://github.com/bpowers/ruby/commit/d790d0c15047c36c23850a112093fe0e32fd3262

---Files--------------------------------
0001-gc-prefech-objects-seems-to-improve-full-GC-performa.patch (2.29 KB)

-- 
https://bugs.ruby-lang.org/
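
For readers unfamiliar with the FIFO idea mentioned in the message, here is a minimal sketch of a prefetch queue in front of a mark stack. It is not the attached patch, not the linked commit, and not Ruby's gc.c: `obj_t`, `marker_t`, `push`, `pop`, `mark_from`, and the capacities are all hypothetical names chosen for illustration. It assumes GCC or Clang for `__builtin_prefetch`, and it keeps mark bits in a side table so that pushing a child does not touch the child's own memory before the prefetch is issued (the access-pattern concern raised in the message).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical object layout for illustration only; not Ruby's RVALUE. */
typedef struct obj {
    struct obj *children[2];
} obj_t;

/* Assumed limits: at most STACK_CAP objects; the message tried queue sizes 4 to 64. */
enum { STACK_CAP = 1024, FIFO_CAP = 8 };

typedef struct {
    obj_t *heap;                     /* base of the object array */
    unsigned char marks[STACK_CAP];  /* side mark table, one byte per object */
    obj_t *stack[STACK_CAP];         /* ordinary mark stack */
    size_t sp;
    obj_t *fifo[FIFO_CAP];           /* small ring buffer in front of the stack */
    size_t head, len;
} marker_t;

static void prefetch(const void *p) {
#if defined(__GNUC__) || defined(__clang__)
    __builtin_prefetch(p, 0, 3);     /* read access, keep in all cache levels */
#else
    (void)p;                         /* no-op on other compilers */
#endif
}

static void push(marker_t *m, obj_t *o) {
    assert(m->sp < STACK_CAP);
    m->stack[m->sp++] = o;
}

/* Returns the next object to scan, or NULL when stack and queue are empty. */
static obj_t *pop(marker_t *m) {
    /* Move entries from the stack into the queue, prefetching as they enter. */
    while (m->len < FIFO_CAP && m->sp > 0) {
        obj_t *o = m->stack[--m->sp];
        prefetch(o);
        m->fifo[(m->head + m->len) % FIFO_CAP] = o;
        m->len++;
    }
    if (m->len == 0) return NULL;
    /* Hand back the oldest entry, whose prefetch has had the most time to land. */
    obj_t *o = m->fifo[m->head];
    m->head = (m->head + 1) % FIFO_CAP;
    m->len--;
    return o;
}

/* Marks everything reachable from root; returns the number of objects scanned. */
static size_t mark_from(marker_t *m, obj_t *root) {
    size_t scanned = 0;
    if (root == NULL || m->marks[root - m->heap]) return 0;
    m->marks[root - m->heap] = 1;
    push(m, root);
    for (obj_t *o; (o = pop(m)) != NULL; ) {
        scanned++;
        for (size_t i = 0; i < 2; i++) {
            obj_t *c = o->children[i];
            size_t idx;
            if (c == NULL) continue;
            idx = (size_t)(c - m->heap);
            if (!m->marks[idx]) {    /* consults the side table, not *c */
                m->marks[idx] = 1;
                push(m, c);
            }
        }
    }
    return scanned;
}
```

To use the sketch, zero-initialize a `marker_t`, point `heap` at the object array, and call `mark_from` with a root. In steady state each object is scanned roughly `FIFO_CAP - 1` pops after its prefetch was issued; whether that distance hides any memory latency depends on the workload, which matches the inconclusive results reported above.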