From: Urabe Shyouhei Date: 2016-06-07T00:28:06+09:00 Subject: [ruby-core:75863] Re: [Ruby trunk Feature#12463] ruby lacks plus-plus On 06/06/2016 10:44 PM, Eric Wong wrote: >> Also for a nontrivial benchmark, optcarrot speeds up from 31.14 fps to >> 31.98 fps (102.6%). This is not that huge gain (disappointed), but it >> does speeds up more or less. > > Yes, that is small. What is optcarrot? I can't find it in trunk. Sorry about it. Optcarrot is a recently-developed pure-ruby NES simulator. Can be fond here: git@github.com:mame/optcarrot.git > Also, what is performance impact for other benchmarks? Attached. consult below for full benchmark output. I have not (and have to) taken a closer look at each results but at a glance no benchmark gained 200% speed up, nor 50% speed down. To put it better it has no significant drawbacks. > Do you notice a performance change (either micro or other > benchmarks) removing "inline"? No, not tested yet. I'll take a look. > Mainly, I am uncomfortable about making vm_exec loop bigger and > blowing away icache. I understand your concern. I have to point out on the other hand that this optimization shortens 4 VM instructions into 1, which must positively impact dcache locality a bit (instruction sequence is very frequently accessed). There is a tradeoff between them. ``` ----------------------------------------------------------- raw data: [["app_answer", [[0.052123], [0.045075]]], ["app_aobench", [[51.100811], [61.442784]]], ["app_erb", [[0], [0]]], ["app_factorial", [[1.017833], [0.981952]]], ["app_fib", [[0.483506], [0.493121]]], ["app_lc_fizzbuzz", [[86.574658], [83.14404]]], ["app_mandelbrot", [[1.407061], [1.315739]]], ["app_pentomino", [[16.77479], [16.531438]]], ["app_raise", [[0.269551], [0.278264]]], ["app_strconcat", [[0.961209], [0.924087]]], ["app_tak", [[0.607776], [0.603985]]], ["app_tarai", [[0.504795], [0.504447]]], ["app_uri", [[0], [0]]], ["array_shift", [[0], [0]]], ["hash_aref_dsym", [[0.365131], [0.355265]]], ["hash_aref_dsym_long", [[12.73269], [13.330662]]], ["hash_aref_fix", [[0.39919], [0.352738]]], ["hash_aref_flo", [[0.075359], [0.090797]]], ["hash_aref_miss", [[0.517686], [0.529451]]], ["hash_aref_str", [[0.466861], [0.421541]]], ["hash_aref_sym", [[0.359693], [0.339002]]], ["hash_aref_sym_long", [[0.521086], [0.537503]]], ["hash_flatten", [[0.350155], [0.356782]]], ["hash_ident_flo", [[0.047239], [0.058664]]], ["hash_ident_num", [[0.312423], [0.331413]]], ["hash_ident_obj", [[0.343969], [0.30839]]], ["hash_ident_str", [[0.3327], [0.337857]]], ["hash_ident_sym", [[0.366507], [0.345687]]], ["hash_keys", [[0.356622], [0.354485]]], ["hash_shift", [[0.028791], [0.044413]]], ["hash_shift_u16", [[0.147648], [0.137591]]], ["hash_shift_u24", [[0.135581], [0.140135]]], ["hash_shift_u32", [[0.133544], [0.149953]]], ["hash_to_proc", [[0.015097], [0.015503]]], ["hash_values", [[0.326313], [0.338462]]], ["io_file_create", [[2.631183], [2.29139]]], ["io_file_read", [[0], [0]]], ["io_file_write", [[0], [0]]], ["io_nonblock_noex", [[0], [0]]], ["io_nonblock_noex2", [[0], [0]]], ["io_select", [[2.871379], [2.845252]]], ["io_select2", [[3.019507], [2.971023]]], ["io_select3", [[0.018271], [0.018546]]], ["loop_for", [[1.360109], [1.433324]]], ["loop_generator", [[0.644188], [0.651241]]], ["loop_times", [[1.260078], [1.259863]]], ["loop_whileloop", [[0.635374], [0.355271]]], ["loop_whileloop2", [[0.141063], [0.076761]]], ["marshal_dump_flo", [[0.523541], [0.498045]]], ["marshal_dump_load_geniv", [[0.89369], [0.862783]]], ["marshal_dump_load_time", [[1.238441], [1.180119]]], ["require", [[16.165019], [2.615595]]], ["require_thread", [[0.560172], [0.548627]]], ["securerandom", [[0], [0]]], ["so_ackermann", [[0.634471], [0.66092]]], ["so_array", [[1.073883], [1.068649]]], ["so_binary_trees", [[7.228263], [7.137076]]], ["so_concatenate", [[5.055297], [4.822921]]], ["so_count_words", [[0.216728], [0.173223]]], ["so_exception", [[0.348299], [0.312874]]], ["so_fannkuch", [[1.802251], [1.80874]]], ["so_fasta", [[1.57365], [1.562748]]], ["so_k_nucleotide", [[1.186867], [1.174061]]], ["so_lists", [[0.522591], [0.493595]]], ["so_mandelbrot", [[2.446802], [2.483779]]], ["so_matrix", [[0.553252], [0.537407]]], ["so_meteor_contest", [[3.098925], [3.09681]]], ["so_nbody", [[1.351588], [1.303507]]], ["so_nested_loop", [[1.025351], [1.060477]]], ["so_nsieve", [[1.84633], [1.748345]]], ["so_nsieve_bits", [[2.130051], [2.132071]]], ["so_object", [[0.637309], [0.646502]]], ["so_partial_sums", [[1.856853], [1.748393]]], ["so_pidigits", [[1.23942], [1.301111]]], ["so_random", [[0.39408], [0.35076]]], ["so_reverse_complement", [[1.571702], [1.644563]]], ["so_sieve", [[0.547596], [0.496421]]], ["so_spectralnorm", [[1.870886], [1.850356]]], ["vm1_attr_ivar", [[1.218386], [1.012984]]], ["vm1_attr_ivar_set", [[1.337383], [1.103195]]], ["vm1_block", [[1.996893], [1.86192]]], ["vm1_const", [[0.871864], [0.738432]]], ["vm1_ensure", [[0.633504], [0.405527]]], ["vm1_float_simple", [[4.575614], [4.468849]]], ["vm1_gc_short_lived", [[5.659463], [5.823397]]], ["vm1_gc_short_with_complex_long", [[6.430288], [6.456603]]], ["vm1_gc_short_with_long", [[7.00017], [6.849872]]], ["vm1_gc_short_with_symbol", [[5.526827], [5.366768]]], ["vm1_gc_wb_ary", [[1.148627], [0.95045]]], ["vm1_gc_wb_ary_promoted", [[1.177966], [0.945101]]], ["vm1_gc_wb_obj", [[1.024596], [0.767215]]], ["vm1_gc_wb_obj_promoted", [[1.150204], [0.942514]]], ["vm1_ivar", [[0.891796], [0.615384]]], ["vm1_ivar_set", [[0.872773], [0.657297]]], ["vm1_length", [[1.14291], [0.865646]]], ["vm1_lvar_init", [[1.906298], [1.72483]]], ["vm1_lvar_set", [[2.984877], [2.637209]]], ["vm1_neq", [[1.163216], [0.960243]]], ["vm1_not", [[0.949804], [0.650484]]], ["vm1_rescue", [[0.778971], [0.464327]]], ["vm1_simplereturn", [[1.238101], [0.982779]]], ["vm1_swap", [[0.928376], [0.674591]]], ["vm1_yield", [[1.348768], [1.099841]]], ["vm2_array", [[1.398596], [1.350792]]], ["vm2_bigarray", [[9.668123], [9.524274]]], ["vm2_bighash", [[7.155718], [7.151162]]], ["vm2_case", [[0.223288], [0.209747]]], ["vm2_case_lit", [[0.820809], [0.797681]]], ["vm2_defined_method", [[2.721788], [2.654055]]], ["vm2_dstr", [[1.09328], [1.042475]]], ["vm2_eval", [[31.762698], [32.069769]]], ["vm2_method", [[1.154035], [1.077522]]], ["vm2_method_missing", [[2.697381], [2.566442]]], ["vm2_method_with_block", [[1.432468], [1.219831]]], ["vm2_mutex", [[0.797908], [0.761003]]], ["vm2_newlambda", [[1.559347], [1.46937]]], ["vm2_poly_method", [[2.61575], [2.589814]]], ["vm2_poly_method_ov", [[0.295096], [0.23947]]], ["vm2_proc", [[0.570004], [0.53438]]], ["vm2_raise1", [[5.852356], [5.784271]]], ["vm2_raise2", [[8.396749], [8.350987]]], ["vm2_regexp", [[1.203123], [1.170023]]], ["vm2_send", [[0.469301], [0.420171]]], ["vm2_string_literal", [[0.344021], [0.263268]]], ["vm2_struct_big_aref_hi", [[0.284613], [0.240975]]], ["vm2_struct_big_aref_lo", [[0.289707], [0.238292]]], ["vm2_struct_big_aset", [[0.321701], [0.280001]]], ["vm2_struct_big_href_hi", [[0.371876], [0.354926]]], ["vm2_struct_big_href_lo", [[0.410041], [0.359201]]], ["vm2_struct_big_hset", [[0.414453], [0.344909]]], ["vm2_struct_small_aref", [[0.228413], [0.185585]]], ["vm2_struct_small_aset", [[0.304598], [0.273653]]], ["vm2_struct_small_href", [[0.353266], [0.314363]]], ["vm2_struct_small_hset", [[0.348498], [0.328763]]], ["vm2_super", [[0.547899], [0.490276]]], ["vm2_unif1", [[0.256021], [0.222816]]], ["vm2_zsuper", [[0.540286], [0.523564]]], ["vm3_backtrace", [[0.211558], [0.211667]]], ["vm3_clearmethodcache", [[0.518466], [0.546614]]], ["vm3_gc", [[1.47503], [1.481742]]], ["vm3_gc_old_full", [[3.070784], [3.0723]]], ["vm3_gc_old_immediate", [[2.735763], [2.888246]]], ["vm3_gc_old_lazy", [[4.444772], [3.881177]]], ["vm_symbol_block_pass", [[0.954599], [0.967853]]], ["vm_thread_alive_check1", [[0.217786], [0.20758]]], ["vm_thread_close", [[3.568386], [3.668023]]], ["vm_thread_create_join", [[2.599036], [2.579275]]], ["vm_thread_mutex1", [[0.620894], [0.573859]]], ["vm_thread_mutex2", [[0.954779], [0.915577]]], ["vm_thread_mutex3", [[165.541376], [159.570138]]], ["vm_thread_pass", [[0.691981], [0.763412]]], ["vm_thread_pass_flood", [[0.08634], [0.087652]]], ["vm_thread_pipe", [[0.372563], [0.383598]]], ["vm_thread_queue", [[0.127318], [0.11117]]]] Elapsed time: 1162.336444 (sec) ----------------------------------------------------------- benchmark results: Execution time (sec) name trunk ours app_answer 0.052 0.045 app_aobench 51.101 61.443 app_erb 0.000 0.000 app_factorial 1.018 0.982 app_fib 0.484 0.493 app_lc_fizzbuzz 86.575 83.144 app_mandelbrot 1.407 1.316 app_pentomino 16.775 16.531 app_raise 0.270 0.278 app_strconcat 0.961 0.924 app_tak 0.608 0.604 app_tarai 0.505 0.504 app_uri 0.000 0.000 array_shift 0.000 0.000 hash_aref_dsym 0.365 0.355 hash_aref_dsym_long 12.733 13.331 hash_aref_fix 0.399 0.353 hash_aref_flo 0.075 0.091 hash_aref_miss 0.518 0.529 hash_aref_str 0.467 0.422 hash_aref_sym 0.360 0.339 hash_aref_sym_long 0.521 0.538 hash_flatten 0.350 0.357 hash_ident_flo 0.047 0.059 hash_ident_num 0.312 0.331 hash_ident_obj 0.344 0.308 hash_ident_str 0.333 0.338 hash_ident_sym 0.367 0.346 hash_keys 0.357 0.354 hash_shift 0.029 0.044 hash_shift_u16 0.148 0.138 hash_shift_u24 0.136 0.140 hash_shift_u32 0.134 0.150 hash_to_proc 0.015 0.016 hash_values 0.326 0.338 io_file_create 2.631 2.291 io_file_read 0.000 0.000 io_file_write 0.000 0.000 io_nonblock_noex 0.000 0.000 io_nonblock_noex2 0.000 0.000 io_select 2.871 2.845 io_select2 3.020 2.971 io_select3 0.018 0.019 loop_for 1.360 1.433 loop_generator 0.644 0.651 loop_times 1.260 1.260 loop_whileloop 0.635 0.355 loop_whileloop2 0.141 0.077 marshal_dump_flo 0.524 0.498 marshal_dump_load_geniv 0.894 0.863 marshal_dump_load_time 1.238 1.180 require 16.165 2.616 require_thread 0.560 0.549 securerandom 0.000 0.000 so_ackermann 0.634 0.661 so_array 1.074 1.069 so_binary_trees 7.228 7.137 so_concatenate 5.055 4.823 so_count_words 0.217 0.173 so_exception 0.348 0.313 so_fannkuch 1.802 1.809 so_fasta 1.574 1.563 so_k_nucleotide 1.187 1.174 so_lists 0.523 0.494 so_mandelbrot 2.447 2.484 so_matrix 0.553 0.537 so_meteor_contest 3.099 3.097 so_nbody 1.352 1.304 so_nested_loop 1.025 1.060 so_nsieve 1.846 1.748 so_nsieve_bits 2.130 2.132 so_object 0.637 0.647 so_partial_sums 1.857 1.748 so_pidigits 1.239 1.301 so_random 0.394 0.351 so_reverse_complement 1.572 1.645 so_sieve 0.548 0.496 so_spectralnorm 1.871 1.850 vm1_attr_ivar* 0.583 0.658 vm1_attr_ivar_set* 0.702 0.748 vm1_block* 1.362 1.507 vm1_const* 0.236 0.383 vm1_ensure* 0.000 0.050 vm1_float_simple* 3.940 4.114 vm1_gc_short_lived* 5.024 5.468 vm1_gc_short_with_complex_long* 5.795 6.101 vm1_gc_short_with_long* 6.365 6.495 vm1_gc_short_with_symbol* 4.891 5.011 vm1_gc_wb_ary* 0.513 0.595 vm1_gc_wb_ary_promoted* 0.543 0.590 vm1_gc_wb_obj* 0.389 0.412 vm1_gc_wb_obj_promoted* 0.515 0.587 vm1_ivar* 0.256 0.260 vm1_ivar_set* 0.237 0.302 vm1_length* 0.508 0.510 vm1_lvar_init* 1.271 1.370 vm1_lvar_set* 2.350 2.282 vm1_neq* 0.528 0.605 vm1_not* 0.314 0.295 vm1_rescue* 0.144 0.109 vm1_simplereturn* 0.603 0.628 vm1_swap* 0.293 0.319 vm1_yield* 0.713 0.745 vm2_array* 1.258 1.274 vm2_bigarray* 9.527 9.448 vm2_bighash* 7.015 7.074 vm2_case* 0.082 0.133 vm2_case_lit* 0.680 0.721 vm2_defined_method* 2.581 2.577 vm2_dstr* 0.952 0.966 vm2_eval* 31.622 31.993 vm2_method* 1.013 1.001 vm2_method_missing* 2.556 2.490 vm2_method_with_block* 1.291 1.143 vm2_mutex* 0.657 0.684 vm2_newlambda* 1.418 1.393 vm2_poly_method* 2.475 2.513 vm2_poly_method_ov* 0.154 0.163 vm2_proc* 0.429 0.458 vm2_raise1* 5.711 5.708 vm2_raise2* 8.256 8.274 vm2_regexp* 1.062 1.093 vm2_send* 0.328 0.343 vm2_string_literal* 0.203 0.187 vm2_struct_big_aref_hi* 0.144 0.164 vm2_struct_big_aref_lo* 0.149 0.162 vm2_struct_big_aset* 0.181 0.203 vm2_struct_big_href_hi* 0.231 0.278 vm2_struct_big_href_lo* 0.269 0.282 vm2_struct_big_hset* 0.273 0.268 vm2_struct_small_aref* 0.087 0.109 vm2_struct_small_aset* 0.164 0.197 vm2_struct_small_href* 0.212 0.238 vm2_struct_small_hset* 0.207 0.252 vm2_super* 0.407 0.414 vm2_unif1* 0.115 0.146 vm2_zsuper* 0.399 0.447 vm3_backtrace 0.212 0.212 vm3_clearmethodcache 0.518 0.547 vm3_gc 1.475 1.482 vm3_gc_old_full 3.071 3.072 vm3_gc_old_immediate 2.736 2.888 vm3_gc_old_lazy 4.445 3.881 vm_symbol_block_pass 0.955 0.968 vm_thread_alive_check1 0.218 0.208 vm_thread_close 3.568 3.668 vm_thread_create_join 2.599 2.579 vm_thread_mutex1 0.621 0.574 vm_thread_mutex2 0.955 0.916 vm_thread_mutex3 165.541 159.570 vm_thread_pass 0.692 0.763 vm_thread_pass_flood 0.086 0.088 vm_thread_pipe 0.373 0.384 vm_thread_queue 0.127 0.111 Speedup ratio: compare with the result of `trunk' (greater is better) name ours app_answer 1.156 app_aobench 0.832 app_erbError app_factorial 1.037 app_fib 0.981 app_lc_fizzbuzz 1.041 app_mandelbrot 1.069 app_pentomino 1.015 app_raise 0.969 app_strconcat 1.040 app_tak 1.006 app_tarai 1.001 app_uriError array_shiftError hash_aref_dsym 1.028 hash_aref_dsym_long 0.955 hash_aref_fix 1.132 hash_aref_flo 0.830 hash_aref_miss 0.978 hash_aref_str 1.108 hash_aref_sym 1.061 hash_aref_sym_long 0.969 hash_flatten 0.981 hash_ident_flo 0.805 hash_ident_num 0.943 hash_ident_obj 1.115 hash_ident_str 0.985 hash_ident_sym 1.060 hash_keys 1.006 hash_shift 0.648 hash_shift_u16 1.073 hash_shift_u24 0.968 hash_shift_u32 0.891 hash_to_proc 0.974 hash_values 0.964 io_file_create 1.148 io_file_readError io_file_writeError io_nonblock_noexError io_nonblock_noex2Error io_select 1.009 io_select2 1.016 io_select3 0.985 loop_for 0.949 loop_generator 0.989 loop_times 1.000 loop_whileloop 1.788 loop_whileloop2 1.838 marshal_dump_flo 1.051 marshal_dump_load_geniv 1.036 marshal_dump_load_time 1.049 require 6.180 require_thread 1.021 securerandomError so_ackermann 0.960 so_array 1.005 so_binary_trees 1.013 so_concatenate 1.048 so_count_words 1.251 so_exception 1.113 so_fannkuch 0.996 so_fasta 1.007 so_k_nucleotide 1.011 so_lists 1.059 so_mandelbrot 0.985 so_matrix 1.029 so_meteor_contest 1.001 so_nbody 1.037 so_nested_loop 0.967 so_nsieve 1.056 so_nsieve_bits 0.999 so_object 0.986 so_partial_sums 1.062 so_pidigits 0.953 so_random 1.124 so_reverse_complement 0.956 so_sieve 1.103 so_spectralnorm 1.011 vm1_attr_ivar* 0.886 vm1_attr_ivar_set* 0.939 vm1_block* 0.904 vm1_const* 0.617 vm1_ensure* 0.000 vm1_float_simple* 0.958 vm1_gc_short_lived* 0.919 vm1_gc_short_with_complex_long* 0.950 vm1_gc_short_with_long* 0.980 vm1_gc_short_with_symbol* 0.976 vm1_gc_wb_ary* 0.862 vm1_gc_wb_ary_promoted* 0.920 vm1_gc_wb_obj* 0.945 vm1_gc_wb_obj_promoted* 0.877 vm1_ivar* 0.986 vm1_ivar_set* 0.786 vm1_length* 0.994 vm1_lvar_init* 0.928 vm1_lvar_set* 1.030 vm1_neq* 0.873 vm1_not* 1.065 vm1_rescue* 1.317 vm1_simplereturn* 0.961 vm1_swap* 0.918 vm1_yield* 0.958 vm2_array* 0.987 vm2_bigarray* 1.008 vm2_bighash* 0.992 vm2_case* 0.618 vm2_case_lit* 0.943 vm2_defined_method* 1.001 vm2_dstr* 0.986 vm2_eval* 0.988 vm2_method* 1.012 vm2_method_missing* 1.027 vm2_method_with_block* 1.130 vm2_mutex* 0.960 vm2_newlambda* 1.018 vm2_poly_method* 0.985 vm2_poly_method_ov* 0.947 vm2_proc* 0.937 vm2_raise1* 1.001 vm2_raise2* 0.998 vm2_regexp* 0.971 vm2_send* 0.956 vm2_string_literal* 1.088 vm2_struct_big_aref_hi* 0.874 vm2_struct_big_aref_lo* 0.920 vm2_struct_big_aset* 0.889 vm2_struct_big_href_hi* 0.830 vm2_struct_big_href_lo* 0.952 vm2_struct_big_hset* 1.020 vm2_struct_small_aref* 0.803 vm2_struct_small_aset* 0.831 vm2_struct_small_href* 0.893 vm2_struct_small_hset* 0.823 vm2_super* 0.984 vm2_unif1* 0.787 vm2_zsuper* 0.894 vm3_backtrace 0.999 vm3_clearmethodcache 0.949 vm3_gc 0.995 vm3_gc_old_full 1.000 vm3_gc_old_immediate 0.947 vm3_gc_old_lazy 1.145 vm_symbol_block_pass 0.986 vm_thread_alive_check1 1.049 vm_thread_close 0.973 vm_thread_create_join 1.008 vm_thread_mutex1 1.082 vm_thread_mutex2 1.043 vm_thread_mutex3 1.037 vm_thread_pass 0.906 vm_thread_pass_flood 0.985 vm_thread_pipe 0.971 vm_thread_queue 1.145 ``` Unsubscribe: