From: ko1@...
Date: 2015-12-04T03:40:57+00:00
Subject: [ruby-core:71827] [Ruby trunk - Feature #11768] Add a polymorphic inline cache

Issue #11768 has been updated by Koichi Sasada.

Thank you for your patch.

Comments:

Implementation:

* Did you run a big application with your patch? It seems to have a bug: cc->aux also depends on cc->call, so you need to duplicate the aux field.
* Sliding is one good idea (FIFO strategy) for a few entries. Did you try a ring with a counter? LRU is another idea (but not easy).

Evaluation:

* Ideally, the results of vm2_method and vm2_poly_method should be the same. What is the difference?
* Please check the following counts.
  * The number of mono-calls and poly-calls in some applications, so that we can understand the ratio. If the ratio is high, then it will be valuable. If it is not so high, we need to consider a variable number of PIC entries (growing from 1 entry).
  * The ideal number of cache entries for each poly-call. List the method bodies for each call. With these counts, we can estimate a worthwhile number of cache entries.
* How does memory consumption grow?
  * In my measurement of a small Rails application with valgrind/massif, 1/3 of memory is used by iseq-related data and 1/3 of the iseq data is consumed by call caches. If iseq consumes 30MB, then 10MB is consumed by call caches. With this patch, 60MB can be consumed by PIC entries.

----------------------------------------
Feature #11768: Add a polymorphic inline cache
https://bugs.ruby-lang.org/issues/11768#change-55231

* Author: Aaron Patterson
* Status: Open
* Priority: Normal
* Assignee: Koichi Sasada
----------------------------------------
Hi,

I've attached a patch that adds a PIC to the existing Mono IC struct. I haven't run every benchmark that's checked in, but this patch speeds up the polymorphic call benchmark by about 20%.

Here is the benchmark *before* my patch:

~~~
[aaron@TC ruby (trunk)]$ time ./ruby benchmark/bm_vm2_poly_method.rb

real    0m3.244s
user    0m3.154s
sys     0m0.044s
[aaron@TC ruby (trunk)]$ time ./ruby benchmark/bm_vm2_poly_method.rb

real    0m3.158s
user    0m3.090s
sys     0m0.042s
[aaron@TC ruby (trunk)]$ time ./ruby benchmark/bm_vm2_poly_method.rb

real    0m3.162s
user    0m3.099s
sys     0m0.039s
~~~

Here it is with my patch applied:

~~~
[aaron@TC ruby (pic2)]$ time ./ruby benchmark/bm_vm2_poly_method.rb

real    0m2.522s
user    0m2.455s
sys     0m0.044s
[aaron@TC ruby (pic2)]$ time ./ruby benchmark/bm_vm2_poly_method.rb

real    0m2.515s
user    0m2.458s
sys     0m0.035s
[aaron@TC ruby (pic2)]$ time ./ruby benchmark/bm_vm2_poly_method.rb

real    0m2.637s
user    0m2.545s
sys     0m0.045s
~~~

Monomorphic call sites maintain the same performance:

Before:

~~~
[aaron@TC ruby (trunk)]$ time ./ruby benchmark/bm_vm2_method.rb

real    0m1.416s
user    0m1.371s
sys     0m0.032s
[aaron@TC ruby (trunk)]$ time ./ruby benchmark/bm_vm2_method.rb

real    0m1.456s
user    0m1.402s
sys     0m0.032s
[aaron@TC ruby (trunk)]$ time ./ruby benchmark/bm_vm2_method.rb

real    0m1.420s
user    0m1.372s
sys     0m0.032s
~~~

After:

~~~
[aaron@TC ruby (pic2)]$ time ./ruby benchmark/bm_vm2_method.rb

real    0m1.451s
user    0m1.399s
sys     0m0.033s
[aaron@TC ruby (pic2)]$ time ./ruby benchmark/bm_vm2_method.rb

real    0m1.494s
user    0m1.438s
sys     0m0.033s
[aaron@TC ruby (pic2)]$ time ./ruby benchmark/bm_vm2_method.rb

real    0m1.466s
user    0m1.416s
sys     0m0.032s
~~~

The downside of this patch is that it increases memory usage, because the call cache struct gets larger even if the call site is monomorphic. I think we could make the cache expand and contract, but I'm not sure if it's worthwhile. The other downside is that it will probably slow down calls if the global method state changes, but I don't think that is a situation we should optimize for.

I've actually attached 2 patches: one adds the PIC, the other adds a tracepoint so that I could log cache hit / miss rates.

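To make the hit / miss discussion concrete, here is a minimal Ruby-level model of a single call site's cache with a FIFO "sliding" replacement policy. It is only a sketch under stated assumptions: it is not the C code from the attached patches, the class name `CallSiteCacheModel` and its methods are hypothetical, and it ignores cache invalidation when the method state changes.

```ruby
# Hypothetical model of one call site's inline cache (illustration only,
# not the implementation in 0001-add-PIC.patch).
class CallSiteCacheModel
  attr_reader :hits, :misses

  def initialize(size)
    @size = size     # number of cache entries at this call site
    @entries = []    # [klass, method] pairs, newest first
    @hits = 0
    @misses = 0
  end

  # Resolve `name` for `receiver`, consulting the cache first.
  def lookup(receiver, name)
    klass = receiver.class
    entry = @entries.find { |cached_klass, _| cached_klass.equal?(klass) }
    if entry
      @hits += 1
      return entry[1]
    end

    @misses += 1
    meth = klass.instance_method(name)     # stands in for the slow method search
    @entries.unshift([klass, meth])        # slide the existing entries down
    @entries.pop if @entries.size > @size  # the oldest entry falls off (FIFO)
    meth
  end
end

class A; def foo; :a; end; end
class B; def foo; :b; end; end
class C; def foo; :c; end; end

receivers = [A.new, B.new, C.new]

mono = CallSiteCacheModel.new(1)  # models a monomorphic cache
poly = CallSiteCacheModel.new(3)  # models a 3-entry polymorphic cache

# One call site that sees three receiver classes in rotation.
300.times do |i|
  recv = receivers[i % receivers.size]
  mono.lookup(recv, :foo)
  poly.lookup(recv, :foo)
end

puts "1 entry:   hits=#{mono.hits} misses=#{mono.misses}"
puts "3 entries: hits=#{poly.hits} misses=#{poly.misses}"
```

With three receiver classes in strict rotation, the one-entry model misses on every call, while the three-entry model misses only on the first three calls. Real applications will have a different mix of receivers per call site, so the useful number of entries has to be measured rather than assumed.
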
---Files--------------------------------
0001-add-PIC.patch (2.9 KB)
0002-add-a-tracepoint-for-PIC-hit-miss.patch (4.33 KB)

--
https://bugs.ruby-lang.org/