From: naruse@...
Date: 2020-11-25T09:05:45+00:00
Subject: [ruby-core:101068] [Ruby master Bug#17220] Rails Active Job integration test fails with Ruby 3.0.0 since 2038cc6cab6ceeffef3ec3a765c70ae684f829ed

Issue #17220 has been updated by naruse (Yui NARUSE).

As glass_saga says,

* Before fork, all pthreads need to be stopped (as far as I understand, this is a practical restriction on Unix).
* getaddrinfo_a uses its own pthreads to provide its asynchronous feature. It has a queue and a pool of pthreads (up to 20 threads) to handle DNS requests. getaddrinfo_a with GAI_NOWAIT posts a task to the queue.
* With gai_cancel(3), a queued task can be removed. But it does not remove or stop a task that is already running.
* With gai_error(3), the status of a task can be obtained.
* To stop those pthreads, we need to stop posting new tasks, remove all queued tasks, and then stop or wait for the running tasks. After that, the worker threads finish after a 1 second sleep. To skip that sleep, we would need to call an internal glibc API, __gai_new_request_notification().
* To stop posting new tasks, it is enough to hold the GVL.
* To remove all queued tasks, we can use gai_cancel. Although the manpage mentions gai_cancel(NULL), in practice each request has to be passed to gai_cancel one by one.
* There is no way to stop running tasks with glibc's getaddrinfo_a. To stop them, we would need to re-implement getaddrinfo_a. With our own implementation, we could use pthread_cancel(3) to stop getaddrinfo(3) in the threads. Some issues have been reported with that approach, but it can actually be done. (ref. https://bugzilla.redhat.com/show_bug.cgi?id=1209433)
* In Ruby 3.0 we'll make sure to stop the pthreads before fork. In the future, we'll provide further enhancements.
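The first point above can be illustrated with a small experiment. This is only a sketch under stated assumptions: it uses an ordinary Ruby thread as a stand-in for a background worker, because the glibc resolver threads are not visible from Ruby, so it is an analogy and not a reproduction of this bug. The method name `thread_counts_across_fork` is made up for this example, and it assumes a platform where `fork` is available.

```ruby
# Hypothetical helper: counts Ruby threads in the parent and in a forked child.
# A plain Ruby thread stands in for a background worker thread here.
def thread_counts_across_fork
  worker = Thread.new { sleep }          # background worker that never finishes on its own
  reader, writer = IO.pipe

  pid = fork do
    # Child process: only the thread that called fork is still running here.
    reader.close
    writer.write(Thread.list.size.to_s)
    writer.close
    exit!(0)                             # skip at_exit handlers inherited from the parent
  end

  writer.close
  child_count = reader.read.to_i         # blocks until the child closes its end of the pipe
  reader.close
  Process.wait(pid)

  parent_count = Thread.list.size        # forking thread plus the worker (and any others)
  worker.kill
  worker.join
  [parent_count, child_count]
end

parent_count, child_count = thread_counts_across_fork
puts "threads in parent: #{parent_count}"
puts "threads in child:  #{child_count}"
```

The child reports a single thread even though the parent still has its worker. Memory, including any queue or lock state a worker pool was using, is copied into the child, but the workers themselves are not, which is one way to see why a pool such as the one behind getaddrinfo_a needs to be quiesced before fork.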
----------------------------------------
Bug #17220: Rails Active Job integration test fails with Ruby 3.0.0 since 2038cc6cab6ceeffef3ec3a765c70ae684f829ed
https://bugs.ruby-lang.org/issues/17220#change-88742

* Author: yahonda (Yasuo Honda)
* Status: Assigned
* Priority: Normal
* Assignee: Glass_saga (Masaki Matsushita)
* Target version: 3.0
* ruby -v: ruby 2.8.0dev (2020-08-27T07:39:13Z v3_0_0_preview1~397 2038cc6cab) [x86_64-linux]
* Backport: 2.5: DONTNEED, 2.6: DONTNEED, 2.7: DONTNEED
----------------------------------------
One of the Rails CI jobs, the Active Job integration test with sidekiq against Ruby 3.0.0, has been failing since August 30, 2020. According to `git bisect`, it is triggered by 2038cc6cab6ceeffef3ec3a765c70ae684f829ed.

Somehow this issue only reproduces with Ruby on Docker, such as `rubylang/ruby:master-nightly-bionic`. It does not reproduce if Ruby is installed locally using `rbenv install 3.0.0-dev` on Ubuntu 20.04 and macOS 11 beta.

### The first failed build job

https://buildkite.com/rails/rails/builds/71321#84b29655-b3df-4b5c-8b20-cbf15ecd9653

```ruby
Ruby 2.8.0p-1 (2020-08-29 revision d7492a0be885ea9f2b9f71e3e95582f9a859c439) [x86_64-linux]
```

### The last successful build job

https://buildkite.com/rails/rails/builds/71143#369217f7-95f6-4ab9-8ef5-7c6364bd803e

```ruby
Ruby 2.8.0p-1 (2020-08-20 revision a74df67244199d1fd1f7a20b49dd5a096d2a13a2) [x86_64-linux]
```

### `git bisect` result

I ran `git bisect` on the ruby/ruby repository, and it reports that `2038cc6cab6ceeffef3ec3a765c70ae684f829ed` triggers this build failure.

## Steps to reproduce

1. Install Docker
2. Install Ruby 2.7.1 (or whatever Ruby version to run rake)
3. Create Ruby docker image for 2038cc6cab6ceeffef3ec3a765c70ae684f829ed

```
git clone https://github.com/ruby/ruby-docker-images.git
cd ruby-docker-images
rake docker:build ruby_version=master:2038cc6cab6ceeffef3ec3a765c70ae684f829ed
```

4. Run Rails CI using the Docker image created in step 3

```
cd ~
git clone https://github.com/rails/rails.git
cd rails
git clone https://github.com/rails/buildkite-config .buildkite/
RUBY_IMAGE=rubylang/ruby:master-2038cc6cab6ceeffef3ec3a765c70ae684f829ed-bionic docker-compose -f .buildkite/docker-compose.yml build base && CI=1 docker-compose -f .buildkite/docker-compose.yml run default runner activejob 'AJ_ADAPTER=sidekiq AJ_INTEGRATION_TESTS=true bin/test test/integration/queuing_test.rb --seed 5170'
```

## Actual result

```
Using sidekiq
Run options: --seed 5170

# Running:

.SSSF

Failure:
QueuingTest#test_should_run_job_enqueued_in_the_future_at_the_specified_time [/rails/activejob/test/integration/queuing_test.rb:76]:
Expected false to be truthy.

bin/test test/integration/queuing_test.rb:71

.F

Failure:
QueuingTest#test_should_run_jobs_enqueued_on_a_listening_queue [/rails/activejob/test/integration/queuing_test.rb:14]:
Expected false to be truthy.

bin/test test/integration/queuing_test.rb:11

.SS..F

Failure:
QueuingTest#test_current_locale_is_kept_while_running_perform_later [/rails/activejob/test/integration/queuing_test.rb:102]:
Expected false to be truthy.

bin/test test/integration/queuing_test.rb:93

F

Failure:
QueuingTest#test_current_timezone_is_kept_while_running_perform_later [/rails/activejob/test/integration/queuing_test.rb:119]:
Expected false to be truthy.

bin/test test/integration/queuing_test.rb:110

..

Finished in 34.153644s, 0.4392 runs/s, 0.3514 assertions/s.
15 runs, 12 assertions, 4 failures, 0 errors, 5 skips

You have skipped tests. Run with --verbose for details.
```

## Expected result

It should succeed, as it does with Ruby built from the previous commit `1035a3b202ee86bf2b0a1d00eefcfff0d7ab9f6b`.

```
$ RUBY_IMAGE=rubylang/ruby:master-1035a3b202ee86bf2b0a1d00eefcfff0d7ab9f6b-bionic docker-compose -f .buildkite/docker-compose.yml build base && CI=1 docker-compose -f .buildkite/docker-compose.yml run default runner activejob 'AJ_ADAPTER=sidekiq AJ_INTEGRATION_TESTS=true bin/test test/integration/queuing_test.rb --seed 5170'
```

```
+++
+++ activejob: AJ_ADAPTER=sidekiq AJ_INTEGRATION_TESTS=true bin/test test/integration/queuing_test.rb --seed 5170
Using sidekiq
Run options: --seed 5170

# Running:

.SSS....SS.....

Finished in 13.647623s, 1.0991 runs/s, 1.0258 assertions/s.
15 runs, 14 assertions, 0 failures, 0 errors, 5 skips

You have skipped tests. Run with --verbose for details.
```

--
https://bugs.ruby-lang.org/