Skip to content

Add QUIC support#1008

Open
tenderlove wants to merge 20 commits intoruby:masterfrom
tenderlove:quic-api
Open

Add QUIC support#1008
tenderlove wants to merge 20 commits intoruby:masterfrom
tenderlove:quic-api

Conversation

@tenderlove
Copy link
Member

@tenderlove tenderlove commented Feb 23, 2026

This commit exposes various QUIC methods so that you can build QUIC clients and servers.

I'm really sorry about the very large patch. I can split it in to smaller patches if that helps to get this upstream. I think the patch is pretty straightforward though. It just adds different QUIC related methods that OpenSSL provides. I've been using this patch to develop a QUIC/HTTP3 client and server in Ruby.

My development branch is here. Here is an example script that uses these APIs.

@tenderlove
Copy link
Member Author

tenderlove commented Feb 25, 2026

I've been able to serve a Rails application with QUIC+HTTP/3 using this patch:

Screenshot 2026-02-25 at 8 55 55 AM

@rhenium
Copy link
Member

rhenium commented Feb 26, 2026

Thanks for working on this!

I wonder if QUIC listener SSL, QUIC connection SSL, and QUIC stream SSL should be represented as separate (sub)classes. Although they all have the same type SSL, they support different sets of methods, so that would make

I think the blocking methods such as #read and #wait_readable need to be updated to check SSL_poll() rather than just the readiness of the underlying socket. I imagine currently SSLSocket#read can enter a busy loop if the stream is blocked by another stream on the same connection, even when the underlying socket is readable.

Regarding the thread assisted mode (OSSL_QUIC_client_thread_method() and SSL_set_blocking_mode()), please note that we currently don't release the GVL when calling SSL_read(). It may take some work to do so, as it needs fixing callbacks (e.g., SSLContext#new_session_cb which can be called at any time in the connection lifecycle) to re-acquire it as necessary. I'm not so sure if supporting this mode is worthwhile at the moment, especially since the server side support is still missing in OpenSSL and we have to consider both. It appears you didn't use this mode in the kantan repo either.

@ioquatix, what do you think?


GetSSL(self, ssl);
return SSL_is_init_finished(ssl) ? Qtrue : Qfalse;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this related to QUIC support? Is this different from seeing whether #accept/#connect/#accept_connection successfully returned or not?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I'm using it on the server side to detect if the handshake has finished here. Without this, open_streams can raise an exception.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the pointer. I'm still trying to understand this, but I think this means SSL_accept_connection() returns once it receives the Initial packet and doesn't wait for the Handshake packet. The man page says:

https://docs.openssl.org/3.6/man3/SSL_new_listener/

The new SSL object returned from SSL_accept_connection() may or may not have completed its handshake at the point it is returned. Optionally, you may use the function SSL_is_init_finished(3) to determine this. You may call the functions SSL_accept(3), SSL_do_handshake(3) or SSL_handle_events(3) to progress the state of the SSL object towards handshake completion. Other "I/O" functions may also implicitly progress the state of the handshake such as SSL_poll(3), SSL_read(3) and SSL_write(3).

#accept_nonblock appears to work for this use-case, and I think this is clearer.

diff --git a/lib/kantan/h3/poll_session.rb b/lib/kantan/h3/poll_session.rb
index 609c8d1f46ad..c6fe30d1695e 100644
--- a/lib/kantan/h3/poll_session.rb
+++ b/lib/kantan/h3/poll_session.rb
@@ -13,13 +13,12 @@ module Kantan
         wfds = []

         # Wait for handshake
-        until @conn.init_finished?
+        until @conn.accept_nonblock(exception: false) == @conn
           rfds.clear
           wfds.clear
           rfds << @io if @conn.net_read_desired?
           wfds << @io if @conn.net_write_desired?
           IO.select(rfds, wfds, nil, @conn.event_timeout)
-          @conn.handle_events
         end

         open_streams

I wish I could simply use the blocking @conn.accept here, but currently it doesn't take SSL_get_event_timeout() into account when waiting for the underlying socket with rb_io_wait(). I think that can be a later improvement.

Copy link
Member Author

@tenderlove tenderlove Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah great. I was able to apply your patch and it works well. It seems like I still need init_finished? on the client side though. I'm not sure how to get rid of it.

This commit exposes various QUIC methods so that you can build QUIC
clients and servers
@tenderlove
Copy link
Member Author

I wonder if QUIC listener SSL, QUIC connection SSL, and QUIC stream SSL should be represented as separate (sub)classes. Although they all have the same type SSL, they support different sets of methods, so that would make

I'm not sure. We could do that, and I'm happy to do that work. It's up to you.

Regarding the thread assisted mode (OSSL_QUIC_client_thread_method() and SSL_set_blocking_mode()), please note that we currently don't release the GVL when calling SSL_read(). It may take some work to do so, as it needs fixing callbacks (e.g., SSLContext#new_session_cb which can be called at any time in the connection lifecycle) to re-acquire it as necessary. I'm not so sure if supporting this mode is worthwhile at the moment, especially since the server side support is still missing in OpenSSL and we have to consider both. It appears you didn't use this mode in the kantan repo either.

I think we shouldn't support it, and I'll remove it. I've updated the patch to make QUIC connections always non-blocking, and I think that's the way we should go. I've removed the accessor to SSL_set_blocking_mode, and I'll remove the OSSL_QUIC_client_thread_method. I tried using OSSL_QUIC_client_thread_method, but ran in to exactly the GVL problem, so changed to non-blocking mode in my code.

@tenderlove
Copy link
Member Author

I wonder if QUIC listener SSL, QUIC connection SSL, and QUIC stream SSL should be represented as separate (sub)classes. Although they all have the same type SSL, they support different sets of methods, so that would make

I'm not sure. We could do that, and I'm happy to do that work. It's up to you

Ah, I remember. I had considered adding a subclass as you say, but I've always run in to issues when trying to work with a subclass on T_DATA objects. I think we should not make it a subclass, but my only reason is from having a bad experience in the past.

@tenderlove
Copy link
Member Author

@rhenium do you mind taking another look?

I've:

  • Simplified extconf so that we only check openssl version
  • Removed client thread mode
  • Implemented accept_connection_nonblock and accept_stream_nonblock
  • Removed SSL_poll and related functions

I think the patch is much smaller now, so hopefully it is easier to review. I don't mind adding a subclass for QUIC if you want it (but as I said I've had troubles with them in the past).

I'm not sure if we want blocking methods like accept_connection and accept_stream. I'm not using them, and to be honest I've run in to lots of deadlocks when accidentally using blocking methods. Also I added CI to the Kantan repo so that I can prove this API works besides just on my machine 😅

We need this method to determine if a stream is closed or not.  Calling
read_nonblock on a closed stream will raise an exception (even with
`exception: false`) so we need this API to know not to call
`read_nonblock`
@rhenium
Copy link
Member

rhenium commented Mar 3, 2026

Ah, I remember. I had considered adding a subclass as you say, but I've always run in to issues when trying to work with a subclass on T_DATA objects. I think we should not make it a subclass, but my only reason is from having a bad experience in the past.

I'm interested to hear about the problems you ran into. In ruby/openssl, OpenSSL::PKey::PKey and its subclasses such as RSA share the same RTypedData type, but I don't recall this has caused problems in the past several years.

I think separating them might be clearer for documentation purpose, e.g., methods like #listen or #handle_events have generic names, but they are only valid for some QUIC SSLSocket objects.

That said, I'm fine with having a single SSLSocket class too. After all, it's what the OpenSSL C API exposes.

I'm not sure if we want blocking methods like accept_connection and accept_stream. I'm not using them, and to be honest I've run in to lots of deadlocks when accidentally using blocking methods.

I think they would be handy, but not essential. We can add them later.

Are the deadlock possibly because the GVL was not released during the blocking calls? We should be able to implement them safely using non-blocking OpenSSL calls and Ruby's IO.select or rb_io_wait(), which is how the existing blocking methods are implemented.

Comment on lines +2884 to +2888
if (!stream_ssl) {
if (no_exception_p(opts))
return sym_wait_readable;
ossl_raise(eSSLErrorWaitReadable, "accept_stream would block");
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should query the SSL_get_error() value since the connection may have failed or been closed and it's not retryable. In that case OpenSSL::SSL::SSLError should be raised instead.

SSL_accept_connection() and SSL_new_stream() also need it according to the man page.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm checking these. I'm not sure about SSL_accept_connection (though I saw [this section in the man pages)(https://github.com/openssl/openssl/blob/b7a42c6a3c5fe58a613e6d34ee5c23a5c072c5e2/doc/man3/SSL_new_listener.pod?plain=1#L175-L176).

If you read the server demo code, the non-blocking server never checks SSL_get_error. The blocking server doesn't either. I tried calling it and got a "wrong type" error from this line.

I added error checking for SSL_new_stream and SSL_accept_stream.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to partially revert the error checking. If I add error checking to SSL_accept_stream, it causes my test server to start stalling.

In the Kantan repo you can do this to start the server:

$ ruby -I lib::~/git/openssl/lib benchmarks/h3_server.rb

Then this to start the client:

$ ruby -I lib:~/git/openssl/lib benchmarks/h3_client.rb

Adding the error checking will cause it to stall.

GetSSLCTX(v_ctx, ctx);
ossl_sslctx_setup(v_ctx);

listener = SSL_new_listener(ctx, 0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

listener is leaked in the error cases, for example when v_io is invalid or the wrapper object allocation fails.

#accept_stream and #new_stream can also leak the SSL if the wrapper object allocation fails.

Comment on lines +2830 to +2835
stream_ssl = SSL_new_stream(ssl, flags);
if (!stream_ssl) {
if (flags & SSL_STREAM_FLAG_NO_BLOCK)
return Qnil;
ossl_raise(eSSLError, "SSL_new_stream");
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After re-reading the man page, I started to think SSL_new_stream() also should have a non-blocking variant. I thought SSL_new_stream() was a local-only operation, but apparently it needs to write to the underlying socket unless SSL_STREAM_FLAG_ADVANCE is specified.

Comment on lines +2840 to +2860
/*
* call-seq:
* ssl.accept_stream => SSLSocket
*
* Accepts an incoming QUIC stream from the peer. Blocks until a stream is
* available. Returns a new SSLSocket representing the stream.
*
* Raises OpenSSL::SSL::SSLError on failure.
*/
static VALUE
ossl_ssl_accept_stream(VALUE self)
{
SSL *ssl, *stream_ssl;

GetSSL(self, ssl);
stream_ssl = SSL_accept_stream(ssl, 0);
if (!stream_ssl)
ossl_raise(eSSLError, "SSL_accept_stream");

return ossl_ssl_wrap_stream(self, stream_ssl);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This currently doesn't actually block, presumably because we disable SSL_set_blocking_mode() and the underlying socket is also set to O_NONBLOCK. I think we'll need to do something similar to ossl_start_ssl() here.

@rhenium
Copy link
Member

rhenium commented Mar 3, 2026

I've playing with this, and I found SSL_write() can fail with the SSL_get_error() error code SSL_ERROR_WANT_WRITE when it actually needs to read an ACK from the peer to progress.

The blocking SSLSocket#syswrite seems to need to check for SSL_net_{read,write}_desired(). I suspect #sysread might need it, too, but I haven't been able to actually trigger.

I think both methods should prevent rb_io_wait() from sleeping more than conn.event_timeout seconds, in case no other thread is calling conn.handle_events either explicitly or implicitly.

When running this code, one CPU starts pegging at 100% in 513th stream.write:

require "openssl"
require "socket"

pkey = OpenSSL::PKey::EC.generate("prime256v1")
cert = OpenSSL::X509::Certificate.new.tap do |cert|
  cert.version = 2
  cert.serial = 1
  cert.not_before = Time.now - 60
  cert.not_after = Time.now + 86400
  cert.public_key = pkey
  cert.subject = cert.issuer = OpenSSL::X509::Name.new([["CN", "localhost"]])
  cert.sign(pkey, "sha256")
end

server_sock = UDPSocket.new.tap { it.bind("localhost", 5433) }
client_sock = UDPSocket.new.tap { it.connect("localhost", 5433) }

server_thread = Thread.start do
  ctx = OpenSSL::SSL::SSLContext.quic(:server)
  ctx.add_certificate(cert, pkey)
  ctx.alpn_select_cb = -> ary { ary.last }
  listener = OpenSSL::SSL::SSLSocket.new_listener(server_sock, context: ctx)

  begin
    connection = listener.accept_connection_nonblock
  rescue IO::WaitReadable
    p [:s, :accept_connection_wait_readable]
    server_sock.wait_readable
    retry
  end
  p [:s, :accept_connection]

  begin
    connection.accept_nonblock
  rescue IO::WaitReadable
    p [:s, :accept_wait_readable]
    server_sock.wait_readable(connection.event_timeout)
    retry
  end
  p [:s, :accept]

  stream = begin
    connection.accept_stream_nonblock
  rescue IO::WaitReadable
    p [:s, :accept_stream_wait_readable]
    sleep 0.1
    server_sock.wait_readable(connection.event_timeout)
    retry
  end
  p [:s, :accept_stream, stream.stream_id]

  loop do
    connection.handle_events
    sleep 0.1
  end
end

client_thread = Thread.start do
  ctx = OpenSSL::SSL::SSLContext.quic(:client)
  ctx.alpn_protocols = ["a"]
  ctx.groups = "X25519" # XXX
  connection = OpenSSL::SSL::SSLSocket.new(client_sock, ctx)

  begin
    connection.connect_nonblock
  rescue IO::WaitReadable
    p [:c, :connect_wait_readable]
    client_sock.wait_readable
    retry
  end

  stream = connection.new_stream

  # OpenSSL's initial_max_stream_data_bidi_remote appears to be 512KB
  # write will fail after 512 times
  1_000.times do |i|
    stream.write("x"*1024)
    p [:c, :write, stream.stream_id, i]
  end
end

[server_thread, client_thread].each(&:join)

@tenderlove
Copy link
Member Author

@rhenium I will try your example and see if I can fix it.

The blocking SSLSocket#syswrite seems to need to check for SSL_net_{read,write}_desired(). I suspect #sysread might need it, too, but I haven't been able to actually trigger.

I think both methods should prevent rb_io_wait() from sleeping more than conn.event_timeout seconds, in case no other thread is calling conn.handle_events either explicitly or implicitly.

That makes sense. I'll try to implement it. This really makes me think we should have a specific class for QUIC connections.

I'm interested to hear about the problems you ran into. In ruby/openssl, OpenSSL::PKey::PKey and its subclasses such as RSA share the same RTypedData type, but I don't recall this has caused problems in the past several years.

Honestly this was a long time ago and I don't remember the details. I think it was something to do with figuring out logic inside the GC free function but not knowing the type. I really have no strong feeling against adding a special QUIC class (in fact I'm starting to think we really should add one).

Are the deadlock possibly because the GVL was not released during the blocking calls? We should be able to implement them safely using non-blocking OpenSSL calls and Ruby's IO.select or rb_io_wait(), which is how the existing blocking methods are implemented.

Yes. IO was holding the GVL in one thread, and GC tried to enter in another (actually I was testing with Ractors, but I think the same would happen with Thread).

We want to have blocking operations, but we can't hold the GVL, so we'll
emulate blocking operations by calling the non-blocking API, then call
io_waitreadable
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants