[ruby-core:105537] [Ruby master Bug#18238] CSV encoding issue with parsing from Zlib::GzipReader stream

From: "kou (Kouhei Sutou)" <noreply@...>
Date: 2021-10-04 08:43:09 UTC
List: ruby-core #105537
Issue #18238 has been updated by kou (Kouhei Sutou).

Status changed from Open to Third Party's Issue

Could you open this on https://github.com/ruby/csv ? ruby/csv is the upstream of csv.

----------------------------------------
Bug #18238: CSV encoding issue with parsing from Zlib::GzipReader stream
https://bugs.ruby-lang.org/issues/18238#change-93993

* Author: dim (Dimitrij Denissenko)
* Status: Third Party's Issue
* Priority: Normal
* ruby -v: ruby 3.0.1p64 (2021-04-05 revision 0fb782ee38) [x86_64-linux]
* Backport: 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN
----------------------------------------
Hi,

I found an issue with parsing CSVs directly from a `Zlib::GzipReader` IO which I am trying to debug. Unfortunately, I am not at liberty to share the (proprietary) CSV file and I couldn't recreate the issue with a simplified/obfuscated version, but maybe you can point me in the right direction. Here's what's happening:

```
require "csv"
require "zlib"

CSV::VERSION # => "3.1.9"
File.open("file.csv.gz", encoding: 'binary') do |io|
  Zlib::GzipReader.wrap(io) do |rio|
    CSV.new(rio).count
  end
end
```

Results in:

```
~/.rbenv/versions/3.0.1/lib/ruby/3.0.0/csv/parser.rb:346:in `rescue in parse': Invalid byte sequence in UTF-8 in line 38424. (CSV::MalformedCSVError)
	from ~/.rbenv/versions/3.0.1/lib/ruby/3.0.0/csv/parser.rb:329:in `parse'
	from ~/.rbenv/versions/3.0.1/lib/ruby/3.0.0/csv.rb:2345:in `each'
	from ~/.rbenv/versions/3.0.1/lib/ruby/3.0.0/csv.rb:2345:in `each'
  ...
~/.rbenv/versions/3.0.1/lib/ruby/3.0.0/csv/parser.rb:237:in `read_chunk': CSV::Parser::InvalidEncoding (CSV::Parser::InvalidEncoding)
	from ~/.rbenv/versions/3.0.1/lib/ruby/3.0.0/csv/parser.rb:157:in `scan_all'
	from ~/.rbenv/versions/3.0.1/lib/ruby/3.0.0/csv/parser.rb:1009:in `parse_quoted_column_value'
	from ~/.rbenv/versions/3.0.1/lib/ruby/3.0.0/csv/parser.rb:962:in `parse_column_value'
	from ~/.rbenv/versions/3.0.1/lib/ruby/3.0.0/csv/parser.rb:886:in `parse_quotable_robust'
	from ~/.rbenv/versions/3.0.1/lib/ruby/3.0.0/csv/parser.rb:864:in `block in parse_quotable_loose'
	from ~/.rbenv/versions/3.0.1/lib/ruby/3.0.0/csv/parser.rb:127:in `block in each_line'
	from ~/.rbenv/versions/3.0.1/lib/ruby/3.0.0/csv/parser.rb:103:in `each_line'
	from ~/.rbenv/versions/3.0.1/lib/ruby/3.0.0/csv/parser.rb:103:in `each_line'
	from ~/.rbenv/versions/3.0.1/lib/ruby/3.0.0/csv/parser.rb:825:in `parse_quotable_loose'
	from ~/.rbenv/versions/3.0.1/lib/ruby/3.0.0/csv/parser.rb:336:in `parse'
	from ~/.rbenv/versions/3.0.1/lib/ruby/3.0.0/csv.rb:2345:in `each'
	from ~/.rbenv/versions/3.0.1/lib/ruby/3.0.0/csv.rb:2345:in `each'
	from (irb):3:in `count'
```

While the following succeeds:

```
File.open("file.csv", 'w', encoding: 'binary') do |wio|
  File.open("file.csv.gz", encoding: 'binary') do |io|
    Zlib::GzipReader.wrap(io) do |rio|
      IO.copy_stream rio, wio
    end
  end
end

File.open("file.csv") do |rio|
  CSV.new(rio).count
end
```

I have narrowed it down to https://github.com/ruby/csv/blob/v3.1.9/lib/csv/parser.rb#L235-L237. It looks like reading the chunk truncates the string in the middle of a multibyte UTF-8 character, so `chunk.valid_encoding?` returns false.
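
The suspected failure mode can be shown in isolation. This is only a minimal sketch, not the csv parser's actual code: it assumes a reader that returns a fixed number of bytes, so a chunk boundary can land inside a multibyte character. The sample string and the 2-byte cut are made up for illustration.

```ruby
# Hypothetical illustration: a byte-oriented read that stops in the middle
# of a multibyte UTF-8 character yields a chunk with an invalid encoding.
text  = "a\u00E9"   # "aé"; "é" is two bytes in UTF-8 (0xC3 0xA9)
bytes = text.b      # binary (ASCII-8BIT) copy of the same 3 bytes

# Pretend the reader returned only the first 2 bytes, then the remaining 1.
chunk = bytes.byteslice(0, 2).force_encoding(Encoding::UTF_8)
rest  = bytes.byteslice(2, 1).force_encoding(Encoding::UTF_8)

# The first chunk ends after 0xC3, i.e. halfway through "é".
truncated_valid = chunk.valid_encoding?

# Joining the raw bytes again restores a valid UTF-8 string.
rejoined = (chunk.b + rest.b).force_encoding(Encoding::UTF_8)
rejoined_valid = rejoined.valid_encoding?

puts "truncated chunk valid? #{truncated_valid}"
puts "rejoined string valid? #{rejoined_valid}"
```

If this is what happens with the gzip stream, the data itself would be fine and only the chunk boundary would be at fault, which would match the observation that the fully decompressed file parses without error.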




-- 
https://bugs.ruby-lang.org/

Unsubscribe: <mailto:ruby-core-request@ruby-lang.org?subject=unsubscribe>
<http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-core>
