From: tomoyapenguin@... Date: 2017-01-18T13:25:25+00:00 Subject: [ruby-dev:49956] [Ruby trunk Bug#13136] large_array.sample(11)が遅い Issue #13136 has been reported by tomoya ishida. ---------------------------------------- Bug #13136: large_array.sample(11)が遅い https://bugs.ruby-lang.org/issues/13136 * Author: tomoya ishida * Status: Open * Priority: Normal * Assignee: * Target version: * ruby -v: ruby 2.4.0p0 (2016-12-24 revision 57164) [x86_64-darwin16] * Backport: 2.2: UNKNOWN, 2.3: UNKNOWN, 2.4: UNKNOWN ---------------------------------------- Array#sampleのパフォーマンスを改善したい ```ruby require 'benchmark' arr = 100000.times.to_a; Benchmark.measure{100000.times{arr.sample 10}}.real #=> 0.07052100007422268 速い! Benchmark.measure{100000.times{arr.sample 11}}.real #=> 37.8880900000222 遅い!!! ``` 現状のArray#sample(n) ``` n<=3 → 個別場合分け 310 → dupしてshuffle O(size) 巨大配列から少数sampleする時に遅い ``` アルゴリズム改善案について ```ruby # n>10の時の現状のアルゴリズム O(size) def sample arr, n arr = arr.dup n.times do |i| j = rand(arr.size - i) + i arr[i], arr[j] = arr[j], arr[i] end arr.take n end # dupをやめ、配列への代入を減らす(ただし元配列を破壊してしまう) O(n) def sample arr, n n.times.map do |i| j = rand(arr.size - i) + i val = arr[j] arr[j] = arr[i] val end end # 配列を書き換える代わりに、どこを入れ替えたかhashに保存する(元配列を破壊しない) O(n) def sample arr, n replace_table = {} n.times.map do |i| j = rand(arr.size - i) + i j2 = replace_table[j] || j replace_table[j] = replace_table[i] || i arr[j2] end end ``` patch作りました ベンチマーク ``` arr = 100000.times.to_a; def measure;t=Time.now;yield;Time.now-t;end p measure{10000.times{arr.sample 10}} p measure{10000.times{arr.sample 11}} p measure{10000.times{arr.sample 100}} p measure{10000.times{arr.sample 1000}} ``` ruby 2.4.0p0 ``` 0.008185 3.938569 4.043285 4.142051 ``` this patch ``` 0.010408 0.016441 0.08861 0.684844 ``` ---Files-------------------------------- rb_ary_sample.patch (1.42 KB) -- https://bugs.ruby-lang.org/