verify that postgres is up-and-running on startup#29
verify that postgres is up-and-running on startup#29njaard wants to merge 1 commit intoboustrophedon:masterfrom
Conversation
Pull Request Test Coverage Report for Build 22128353563Details
💛 - Coveralls |
src/run_db.rs
Outdated
| use std::os::unix::net::UnixStream; | ||
|
|
||
| let Ok(mut stream) = | ||
| UnixStream::connect(format!("{unix_socket_directory}/.s.PGSQL.{tcp_port}")) |
There was a problem hiding this comment.
Do you know if this is publicly documented i.e. won't change based on the pg version?
There was a problem hiding this comment.
It's documented right for unix_socket_directories. I expect it to be stable because you can connect to the server with different versions of psql
src/lib.rs
Outdated
| /// and then set it as the current port. | ||
| /// | ||
| /// If you don't set the port in advance, then a port will be automatically selected | ||
| /// atomically, and can prevent a race condition, therefor you should avoid calling this function. |
There was a problem hiding this comment.
Can you clarify this comment? Why should you avoid calling it?
There was a problem hiding this comment.
This function gets a random port "right now" and puts it into the Builder. I don't want to change the API for the builder, so I can't have it hold the TcpListener. Therefor, use of this function causes pgtemp to retain the previous behavior.
Can you proposed improved wording for that comment?
src/run_db.rs
Outdated
| let _ = stream.set_write_timeout(Some(Duration::from_millis(500))); | ||
|
|
||
| // simply send the SSL handshake | ||
| let ssl_request = [0u8, 0, 0, 8, 4, 210, 22, 47]; |
There was a problem hiding this comment.
Can you add some documentation to this? eg. when I search for "ssl handshake bytes", because I do not have that data structure memorized, I see that it seems like the first byte of the client hello should be 0x16.
|
This is a neat solution, thanks for digging into this. I left a few comments and there also appears to be a lint issue. Thanks again! |
The lint issue pertains to a piece of python code apparently unrelated to my change |
|
Sorry for the delay, had some personal issues as well as work. Can you rebase on master? It should fix any lint issues. |
I did so, but now the CI tests are failing in some other way, which may be a spurious or github change. The tests pass on my machine. |
|
Yeah, this seems flaky still. I've rerun it a few times and master doesn't error but random tests with this change are: |
|
sorry, accidentally closed somehow |
f0526d1 to
ae7860e
Compare
|
Well, the reason that my solution seemed to work is that it would bind to the ipv4 port, and postgres would bind to the ipv6 port. Then when you connect to "localhost", you'd get the ipv6 port. So, back to the drawing board for random port selection, I guess. Maybe an alternate approach is to keep on trying to start postgres on a randomly selected port until it succeeds. |
|
It might be useful to simply email the postgres mailing list or perhaps see how their own CI is set up. I might take a look today or tomorrow. |
|
I had a crazy scheme of just proxying the Unix domain socket |
|
Ah, bind to either a random tcp addr/port or the user provided addr/port and then have pgtemp just proxy the traffic to the pg server's local unix socket? That would probably work (and we're already proxying in the daemon anyway, so it's not that crazy). I wonder if we could simplify/unify the "base" pgtemp code with the daemon code somehow by doing this. |
|
It looks like there's actually a small standalone binary called "pg_isready" that basically implements this functionality. We could probably just shell out to that. |
|
Actually, sorry, I'm not sure what the underlying issue is currently. Is it that we're connecting too quickly before the pg server is ready, or is it that when running a large suite of tests sometimes we end up trying to start a server on a port that already has one running? |
|
Take a look again! I think I got it this time. |
Atomically acquire the port by building the TcpListener before postgres even starts. Tell postgres to not even bind to a TCP port; it only creates a Unix Domain Socket. It still names that filename with "5432", but it doesn't bind to a port. Then, proxy the Unix Domain Socket over the TcpListener we previously acquired. We can now be sure which postgres we are talking to and on what port. Finally, as our last trick, we can verify postgres is up and running by running the `pg_isready` CLI tool. We tell pg_isready to check the unix domain socket path (though we could just as well have it go over our proxied TCP listener)
|
@boustrophedon bump :) |
Atomically acquire the port by retaining the TcpListener until after postgres starts up. We verify postgres is running by connecting to it via its unix domain socket, and then send the raw minimum bytes to verify.
I have confirmed this works because I can run extensive automated tests on my own software with
cargo t -j<very large number>and it no longer fails sporadically by either postgres saying that it's not yet ready, or the weirder message where it gets the wrong port number by chance.