From: "brodock (Gabriel Mazetto) via ruby-core" Date: 2023-03-15T15:31:40+00:00 Subject: [ruby-core:112902] [Ruby master Bug#19412] Socket starts queueing and not responding after a certain amount of requests Issue #19412 has been updated by brodock (Gabriel Mazetto). I've went back to my Linux machine and tried to reproduce the problem. Using Ruby 3.0 (from the package manager), I got to make it crash only using some decent concurrency (100) and running it for longer time: ``` brodock@pop-os:~/Projetos/ractor$ neofetch ///////////// brodock@pop-os ///////////////////// -------------- ///////*767//////////////// OS: Pop!_OS 22.04 LTS x86_64 //////7676767676*////////////// Kernel: 6.2.0-76060200-generic /////76767//7676767////////////// Uptime: 13 mins /////767676///*76767/////////////// Packages: 2031 (dpkg), 25 (flatpak) ///////767676///76767.///7676*/////// Shell: bash 5.1.16 /////////767676//76767///767676//////// Resolution: 3840x2160 //////////76767676767////76767///////// DE: GNOME 42.5 ///////////76767676//////7676////////// WM: Mutter ////////////,7676,///////767/////////// WM Theme: Pop /////////////*7676///////76//////////// Theme: Pop-dark [GTK2/3] ///////////////7676//////////////////// Icons: Pop [GTK2/3] ///////////////7676///767//////////// Terminal: gnome-terminal //////////////////////'//////////// CPU: AMD Ryzen 9 5900X (24) @ 3.700GHz //////.7676767676767676767,////// GPU: NVIDIA GeForce GTX 1660 SUPER /////767676767676767676767///// Memory: 3709MiB / 32002MiB /////////////////////////// ///////////////////// ///////////// ``` ``` ab -n 90000 -c 200 http://localhost:8080/ This is ApacheBench, Version 2.3 <$Revision: 1879490 $> Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/ Licensed to The Apache Software Foundation, http://www.apache.org/ Benchmarking localhost (be patient) Completed 9000 requests Completed 18000 requests Completed 27000 requests Completed 36000 requests Completed 45000 requests Completed 54000 requests Completed 63000 requests Completed 72000 requests Completed 81000 requests apr_pollset_poll: The timeout specified has expired (70007) Total of 89888 requests completed ``` It seems Linux can handle the issue better than Mac OS, but whatever is breaking is still there. This is what I got running the server part: ``` GET / HTTP/1.0 GET / HTTP/1.0 GET / HTTP/1.0 GET / HTTP/1.0 GET / HTTP/1.0 GET / HTTP/1.0 bug-19412.rb:13:in `write': Broken pipe (Errno::EPIPE) from bug-19412.rb:13:in `handle_connection' from bug-19412.rb:28:in `block in
' from bug-19412.rb:27:in `loop' from bug-19412.rb:27:in `
' ``` When I just run the `client.rb` it doesn��t break as there is not enough concurrency, so it never reaches a point where it can exhaust whatever resource is getting exhausted. Weirdly enough I can��t reproduce the problem when using hey: https://github.com/rakyll/hey ``` hey -n 180000 -c 300 http://localhost:8080/ Summary: Total: 14.8814 secs Slowest: 14.5462 secs Fastest: 0.0002 secs Average: 0.0112 secs Requests/sec: 12095.6387 Response time histogram: 0.000 [1] | 1.455 [179742] |������������������������������������������������������������������������������������������������������������������������ 2.909 [36] | 4.364 [77] | 5.819 [16] | 7.273 [88] | 8.728 [23] | 10.182 [0] | 11.637 [0] | 13.092 [0] | 14.546 [17] | Latency distribution: 10% in 0.0006 secs 25% in 0.0015 secs 50% in 0.0019 secs 75% in 0.0020 secs 90% in 0.0022 secs 95% in 0.0023 secs 99% in 0.0025 secs Details (average, fastest, slowest): DNS+dialup: 0.0066 secs, 0.0002 secs, 14.5462 secs DNS-lookup: 0.0001 secs, 0.0000 secs, 0.0491 secs req write: 0.0000 secs, 0.0000 secs, 0.0025 secs resp wait: 0.0045 secs, 0.0000 secs, 13.5324 secs resp read: 0.0000 secs, 0.0000 secs, 0.0033 secs Status code distribution: [200] 180000 responses ``` ---------------------------------------- Bug #19412: Socket starts queueing and not responding after a certain amount of requests https://bugs.ruby-lang.org/issues/19412#change-102420 * Author: brodock (Gabriel Mazetto) * Status: Open * Priority: Normal * ruby -v: ruby 3.2.1 (2023-02-08 revision 31819e82c8) [arm64-darwin22] * Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN ---------------------------------------- Here is a an example code: ```ruby require 'socket' PORT = 8080 BACKLOG = 50 LISTEN = '127.0.0.1' def handle_connection(connection, _addrinfo) request = connection.gets puts request connection.write "HTTP/1.1 200\r\n" connection.write "Content-Type: text/html\r\n" connection.write "\r\n" connection.write "Hello world! Current time is #{Time.now}" ensure connection.close end begin socket = Socket.new(:INET, :STREAM) socket.bind(Addrinfo.tcp(LISTEN, PORT)) socket.listen(BACKLOG) puts "Requested queue size: #{BACKLOG} bigger then limit: #{Socket::SOMAXCONN}" if BACKLOG > Socket::SOMAXCONN loop do listening, = IO.select([socket]) io, = listening connection, addrinfo = io.accept handle_connection(connection, addrinfo) end ensure socket.close end ``` This tries to simulate a TCP server that responds as if it was an HTTP server. The amount of requests it can sustain seems to depend on the OS. On a Linux machine running ubuntu 20.04 I get something around 7.6K to 7.8K until it stops responding. ``` $ uname -a Linux ... 5.17.5-x86_64-linode154 #1 SMP PREEMPT Mon May 2 15:07:22 EDT 2022 x86_64 x86_64 x86_64 GNU/Linux $ ab -n 20000 -c 50 http://127.0.0.1:8080/ This is ApacheBench, Version 2.3 <$Revision: 1843412 $> Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/ Licensed to The Apache Software Foundation, http://www.apache.org/ Benchmarking 127.0.0.1 (be patient) Completed 2000 requests Completed 4000 requests Completed 6000 requests apr_pollset_poll: The timeout specified has expired (70007) Total of 7883 requests completed ``` on MacOS Ventura I get around 16K: ``` $ uname -a Darwin ... 22.2.0 Darwin Kernel Version 22.2.0: Fri Nov 11 02:04:44 PST 2022; root:xnu-8792.61.2~4/RELEASE_ARM64_T8103 arm64 $ ab -n 20000 -c 50 http://127.0.0.1:8080/ This is ApacheBench, Version 2.3 <$Revision: 1901567 $> Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/ Licensed to The Apache Software Foundation, http://www.apache.org/ Benchmarking 127.0.0.1 (be patient) Completed 2000 requests Completed 4000 requests Completed 6000 requests Completed 8000 requests Completed 10000 requests Completed 12000 requests Completed 14000 requests Completed 16000 requests apr_socket_recv: Operation timed out (60) Total of 16375 requests completed ``` in both cases when that limit reaches, if I abort the program (CTRL-C) and try to run it again it takes a while until the port is released: ``` ... GET / HTTP/1.0 GET / HTTP/1.0 GET / HTTP/1.0 ^Csocketserver.rb:29:in `select': Interrupt from socketserver.rb:29:in `block in
' from socketserver.rb:28:in `loop' from socketserver.rb:28:in `
' $ ruby socketserver.rb socketserver.rb:23:in `bind': Address already in use - bind(2) for 127.0.0.1:8080 (Errno::EADDRINUSE) from socketserver.rb:23:in `
' ``` After killing the process it seems no process is holding the port: ``` lsof -wni tcp:8080 ``` Running the command above does not return anything (it does when the program is still running). I think we may be failing to release something when interacting with the Socket on syscalls/kernel level and we endup filling up some queue/buffer that eventually gets freed. -- https://bugs.ruby-lang.org/ ______________________________________________ ruby-core mailing list -- ruby-core@ml.ruby-lang.org To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org ruby-core info -- https://ml.ruby-lang.org/mailman3/postorius/lists/ruby-core.ml.ruby-lang.org/