From: "zmoazeni (Zach Moazeni)"
Date: 2013-03-20T08:24:35+09:00
Subject: [ruby-core:53559] [ruby-trunk - Bug #8129][Open] String#index has drastically different performance when a single unicode character is included

Issue #8129 has been reported by zmoazeni (Zach Moazeni).

----------------------------------------
Bug #8129: String#index has drastically different performance when a single unicode character is included
https://bugs.ruby-lang.org/issues/8129

Author: zmoazeni (Zach Moazeni)
Status: Open
Priority: Normal
Assignee:
Category:
Target version:
ruby -v: 2.0.0-p0

I created a simple Ruby script:

```
#! /usr/bin/env ruby

raise "need a file name" unless ARGV[0]

# Read the whole file into one string.
contents = File.read(ARGV[0])

# Look up a single character at a rotating index, 326,000 times.
326_000.times do |i|
  contents[(i + 23) % contents.size]
end
```

I have also uploaded two files below. One contains only ASCII characters; the other has a single Unicode character (an "em dash") in the first line.

Indexing into the string with String#[] has dramatically different performance for the two files. Locally, I'm seeing ~1.5 seconds with all_ascii.css and ~30 seconds with one_unicode.css on 1.9.3-p385. It gets worse with Ruby 2.0: all_ascii.css still takes ~1 second, but one_unicode.css takes ~2.5 minutes!

Any idea why the performance is so dramatically different between the two?

--
http://bugs.ruby-lang.org/