[ruby-core:86107] [Ruby trunk Feature#4017] [PATCH] CSV parsing speedup
From: knowide@...
Date: 2018-03-14 03:57:00 UTC
List: ruby-core #86107
Issue #4017 has been updated by tomog105 (Tomohiro Ogoke).
I created a benchmark script for `CSV.parse` using the `benchmark-ips` gem.
## Script
```ruby
# benchmark script for CSV.parse
# Usage: `ruby $0 [rows count(default: 1000)]`
require 'csv'
require 'benchmark/ips'

Benchmark.ips do |x|
  rows = ARGV.fetch(0, "1000").to_i

  # 50 ASCII columns per row, in four variants
  alphas = ['AAAAA'] * 50
  # plain, unquoted fields
  unquoted = (alphas.join(',') + "\r\n") * rows
  # every field quoted
  quoted = (alphas.map { |s| %("#{s}") }.join(',') + "\r\n") * rows
  # quoted fields that contain the column separator
  inc_col_sep = (alphas.map { |s| %(",#{s}") }.join(',') + "\r\n") * rows
  # quoted fields that contain the row separator
  inc_row_sep = (alphas.map { |s| %("#{s}\r\n") }.join(',') + "\r\n") * rows

  # 50 multibyte (hiragana) columns per row, as UTF-8 and as Windows-31J
  hiraganas = ['あああああ'] * 50
  enc_utf8 = (hiraganas.join(',') + "\r\n") * rows
  enc_sjis = enc_utf8.encode('Windows-31J')

  x.report("unquoted") { CSV.parse(unquoted) }
  x.report("quoted") { CSV.parse(quoted) }
  x.report("include col_sep") { CSV.parse(inc_col_sep) }
  x.report("include row_sep") { CSV.parse(inc_row_sep) }
  x.report("encode utf-8") { CSV.parse(enc_utf8) }
  x.report("encode sjis") { CSV.parse(enc_sjis) }
end
```
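To make the ASCII benchmark inputs concrete, the sketch below builds a single row of each one exactly as the script does (the equivalent of `rows = 1`) and parses it with the standard `csv` library. Each input should parse to one row of 50 fields; only the field contents differ. The `inputs` and `parsed` names are illustrative only.

```ruby
require "csv"

# Build one row of each ASCII input the same way the benchmark script does.
alphas = ["AAAAA"] * 50
inputs = {
  "unquoted"        => alphas.join(",") + "\r\n",
  "quoted"          => alphas.map { |s| %("#{s}") }.join(",") + "\r\n",
  "include col_sep" => alphas.map { |s| %(",#{s}") }.join(",") + "\r\n",
  "include row_sep" => alphas.map { |s| %("#{s}\r\n") }.join(",") + "\r\n"
}

# Parse each input and report its shape and its first field.
parsed = inputs.transform_values { |csv| CSV.parse(csv) }
parsed.each do |name, rows|
  puts "#{name}: #{rows.size} row, #{rows.first.size} fields, " \
       "first field #{rows.first.first.inspect}"
end
```

The `include col_sep` and `include row_sep` rows yield fields such as `",AAAAA"` and `"AAAAA\r\n"`, so the parser must track quote state instead of simply splitting on separators.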
## Result
- Ruby version: ruby 2.5.0p0 (2017-12-25 revision 61468) [x86_64-darwin16]
- Processor: Intel Core i7 6700K @ 4 GHz
- Memory: 16 GB
- Target revision: https://github.com/ruby/csv/commit/583cee457773c0ac9e7234b885e1a82df35a47cb
### Rows: 1000
```
unquoted          41.142 (± 0.0%) i/s - 208.000 in 5.055839s
quoted            23.093 (± 0.0%) i/s - 116.000 in 5.024081s
include col_sep   14.826 (± 0.0%) i/s -  75.000 in 5.059138s
include row_sep    7.136 (± 0.0%) i/s -  36.000 in 5.045395s
encode utf-8      34.350 (± 0.0%) i/s - 174.000 in 5.066178s
encode sjis       34.230 (± 0.0%) i/s - 174.000 in 5.083444s
```
### Rows: 10000
```
unquoted          4.021 (± 0.0%) i/s - 21.000 in 5.230854s
quoted            2.266 (± 0.0%) i/s - 12.000 in 5.327204s
include col_sep   1.527 (± 0.0%) i/s -  8.000 in 5.242055s
include row_sep   0.692 (± 0.0%) i/s -  4.000 in 5.780656s
encode utf-8      3.215 (± 0.0%) i/s - 16.000 in 5.010681s
encode sjis       3.400 (± 0.0%) i/s - 17.000 in 5.012165s
```
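The two result tables can be compared with a little arithmetic. The sketch below copies the reported i/s figures and computes each case's throughput relative to `unquoted`, plus the ratio of 1,000-row to 10,000-row throughput for each case. `RESULTS`, `relative_to_unquoted` and `scaling_factors` are illustrative names only.

```ruby
# Iterations per second copied from the two result tables above.
RESULTS = {
  1_000 => {
    "unquoted" => 41.142, "quoted" => 23.093,
    "include col_sep" => 14.826, "include row_sep" => 7.136,
    "encode utf-8" => 34.350, "encode sjis" => 34.230
  },
  10_000 => {
    "unquoted" => 4.021, "quoted" => 2.266,
    "include col_sep" => 1.527, "include row_sep" => 0.692,
    "encode utf-8" => 3.215, "encode sjis" => 3.400
  }
}.freeze

# Throughput of each case as a fraction of the "unquoted" case at the same row count.
def relative_to_unquoted(ips_by_case)
  base = ips_by_case.fetch("unquoted")
  ips_by_case.transform_values { |ips| (ips / base).round(3) }
end

# Ratio of 1,000-row throughput to 10,000-row throughput for each case.
# A value near 10 means parse time grows roughly linearly with the row count.
def scaling_factors(small, large)
  small.map { |name, ips| [name, (ips / large.fetch(name)).round(2)] }.to_h
end

relative_to_unquoted(RESULTS[1_000]).each { |name, r| puts format("%-16s %.3f", name, r) }
scaling_factors(RESULTS[1_000], RESULTS[10_000]).each { |name, f| puts format("%-16s %.2f", name, f) }
```

At 1,000 rows, quoted fields run at about 56% of the unquoted throughput and fields containing the row separator at about 17%. Every case's throughput drops by a factor of roughly 9.7 to 10.7 between 1,000 and 10,000 rows, i.e. close to linear scaling.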
----------------------------------------
Feature #4017: [PATCH] CSV parsing speedup
https://bugs.ruby-lang.org/issues/4017#change-70976
* Author: ender672 (Timothy Elliott)
* Status: Feedback
* Priority: Normal
* Assignee: kou (Kouhei Sutou)
* Target version:
----------------------------------------
=begin
ruby_19_csv_parser_split_methods.patch
This patch splits the CSV parser into multiple methods that are easier to understand, which makes the performance optimizations in the second patch possible. It removes all regular expressions from the parser, resulting in a ~25% speed improvement on the CSV test suite. It also adds a new CSV parser option, :io_read_limit, which sets the maximum size of each IO read. This option defaults to 2048, which was the fastest value in my benchmarks.
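As an illustration only, here is a minimal sketch of a regex-free row reader that reads IO in chunks of at most `io_read_limit` bytes, assuming unquoted input. `each_unquoted_row` and `split_row` are hypothetical names, not code from the patch, and `io_read_limit` here is simply a keyword argument of the sketch.

```ruby
require "stringio"

# Hypothetical sketch (not the patch's code): yield the rows of +io+ without any
# regular expressions, reading at most +io_read_limit+ bytes per IO#read call.
# Assumes the data contains no quote characters, and that the bytes of +row_sep+
# cannot occur inside a multibyte character of +encoding+ (true for "\r\n" in
# UTF-8 and Windows-31J).
def each_unquoted_row(io, col_sep: ",", row_sep: "\r\n", io_read_limit: 2048,
                      encoding: io.external_encoding || Encoding.default_external)
  sep = row_sep.b                                  # search at the byte level
  buffer = String.new(encoding: Encoding::BINARY)
  while (chunk = io.read(io_read_limit))           # binary data; nil at EOF
    buffer << chunk
    pos = 0
    # Emit every complete row currently in the buffer.
    while (eol = buffer.index(sep, pos))
      yield split_row(buffer.byteslice(pos, eol - pos), col_sep, encoding)
      pos = eol + sep.bytesize
    end
    buffer = buffer.byteslice(pos..-1)             # keep the incomplete tail
  end
  # A final row without a trailing row_sep.
  yield split_row(buffer, col_sep, encoding) unless buffer.empty?
end

# Re-tag the raw bytes with the data's encoding, then split on the literal
# separator (String#split with a non-space String separator matches it literally).
def split_row(bytes, col_sep, encoding)
  bytes.force_encoding(encoding).split(col_sep, -1)
end
```

Reading a fixed number of bytes can cut a multibyte character in half, which is why the sketch buffers raw bytes and only attaches the data's encoding to complete rows.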
ruby_19_csv_parser_speedup.patch
This patch adds two shortcuts to the patch above that significantly improve parsing of CSV files that have many quoted columns. It has to be applied on top of the first patch.
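The two shortcuts themselves are not reproduced in this message. One plausible form of such a shortcut, shown here purely as an assumption, is to jump straight from one quote character to the next with `String#index` instead of examining every character. The hypothetical `parse_line` below applies that idea to a single record that has already been isolated.

```ruby
# Hypothetical sketch (not the patch's code): parse one complete CSV record.
# Quoted fields are consumed by jumping from quote to quote with String#index.
def parse_line(line, col_sep: ",", quote: '"')
  return [] if line.empty?

  fields = []
  pos = 0
  loop do
    if line[pos] == quote
      value = +""
      start = pos + 1
      loop do
        close = line.index(quote, start) or raise ArgumentError, "unclosed quoted field"
        value << line[start...close]
        if line[close + 1] == quote      # "" inside a quoted field is an escaped quote
          value << quote
          start = close + 2
        else
          pos = close + 1
          break
        end
      end
      fields << value
    else
      # Unquoted field: everything up to the next separator (or end of line).
      stop = line.index(col_sep, pos) || line.size
      fields << line[pos...stop]
      pos = stop
    end
    break if pos == line.size
    unless line[pos, col_sep.size] == col_sep
      raise ArgumentError, "expected #{col_sep.inspect} at #{pos}"
    end
    pos += col_sep.size
  end
  fields
end
```

A full parser also has to handle quoted fields that continue past a row separator when reading line by line (the `include row_sep` case in the benchmark above); this sketch assumes the whole record is already in `line`.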
On large CSV files, I observed that these patches reduced parse time by 20% to 60%. If this patchset looks good, I would like to experiment with further improvements that use io_read_limit to always read from IO in large chunks (right now the parser only does so for CSV files that contain no quote characters).
These patches maintain m17n support and multi-character separator support (and boy, it's tough to make those tests happy :)
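Keeping m17n support means every separator search has to happen in the data's own encoding. A minimal sketch, using a hypothetical `split_fields` helper for unquoted lines, of splitting on a separator that may be multi-character and non-ASCII:

```ruby
# Hypothetical helper (not the patch's code): split an unquoted line on a
# column separator that may be multi-character and/or non-ASCII.
def split_fields(line, col_sep)
  # Convert the separator into the line's encoding first; searching a
  # Windows-31J line for a non-ASCII UTF-8 separator raises
  # Encoding::CompatibilityError.
  sep = col_sep.encode(line.encoding)
  fields = []
  pos = 0
  # String#index counts characters of line.encoding and skips matches that
  # would begin inside a multibyte character.
  while (stop = line.index(sep, pos))
    fields << line[pos...stop]
    pos = stop + sep.size
  end
  fields << line[pos..-1]
end
```

For example, `split_fields("あ、い、う".encode("Windows-31J"), "、")` returns three Windows-31J strings, and `split_fields("a||b||", "||")` returns `["a", "b", ""]`.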
=end
---Files--------------------------------
ruby_19_csv_parser_split_methods.patch (11.9 KB)
ruby_19_csv_parser_speedup.patch (1.82 KB)
--
https://bugs.ruby-lang.org/