Skip to content

verify that postgres is up-and-running on startup#29

Open
njaard wants to merge 1 commit intoboustrophedon:masterfrom
njaard:master
Open

verify that postgres is up-and-running on startup#29
njaard wants to merge 1 commit intoboustrophedon:masterfrom
njaard:master

Conversation

@njaard
Copy link

@njaard njaard commented Jan 22, 2026

Atomically acquire the port by retaining the TcpListener until after postgres starts up. We verify postgres is running by connecting to it via its unix domain socket, and then send the raw minimum bytes to verify.

I have confirmed this works because I can run extensive automated tests on my own software with cargo t -j<very large number> and it no longer fails sporadically by either postgres saying that it's not yet ready, or the weirder message where it gets the wrong port number by chance.

@coveralls
Copy link

coveralls commented Jan 22, 2026

Pull Request Test Coverage Report for Build 22128353563

Details

  • 70 of 73 (95.89%) changed or added relevant lines in 2 files are covered.
  • 3 unchanged lines in 1 file lost coverage.
  • Overall coverage increased (+0.08%) to 89.104%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/run_db.rs 54 57 94.74%
Files with Coverage Reduction New Missed Lines %
src/run_db.rs 3 85.33%
Totals Coverage Status
Change from base Build 21737054921: 0.08%
Covered Lines: 507
Relevant Lines: 569

💛 - Coveralls

src/run_db.rs Outdated
use std::os::unix::net::UnixStream;

let Ok(mut stream) =
UnixStream::connect(format!("{unix_socket_directory}/.s.PGSQL.{tcp_port}"))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know if this is publicly documented i.e. won't change based on the pg version?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's documented right for unix_socket_directories. I expect it to be stable because you can connect to the server with different versions of psql

src/lib.rs Outdated
/// and then set it as the current port.
///
/// If you don't set the port in advance, then a port will be automatically selected
/// atomically, and can prevent a race condition, therefor you should avoid calling this function.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you clarify this comment? Why should you avoid calling it?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function gets a random port "right now" and puts it into the Builder. I don't want to change the API for the builder, so I can't have it hold the TcpListener. Therefor, use of this function causes pgtemp to retain the previous behavior.

Can you proposed improved wording for that comment?

src/run_db.rs Outdated
let _ = stream.set_write_timeout(Some(Duration::from_millis(500)));

// simply send the SSL handshake
let ssl_request = [0u8, 0, 0, 8, 4, 210, 22, 47];
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add some documentation to this? eg. when I search for "ssl handshake bytes", because I do not have that data structure memorized, I see that it seems like the first byte of the client hello should be 0x16.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

@boustrophedon
Copy link
Owner

This is a neat solution, thanks for digging into this. I left a few comments and there also appears to be a lint issue. Thanks again!

@njaard
Copy link
Author

njaard commented Jan 22, 2026

This is a neat solution, thanks for digging into this. I left a few comments and there also appears to be a lint issue. Thanks again!

The lint issue pertains to a piece of python code apparently unrelated to my change

@boustrophedon
Copy link
Owner

Sorry for the delay, had some personal issues as well as work. Can you rebase on master? It should fix any lint issues.

@njaard
Copy link
Author

njaard commented Feb 6, 2026

Sorry for the delay, had some personal issues as well as work. Can you rebase on master? It should fix any lint issues.

I did so, but now the CI tests are failing in some other way, which may be a spurious or github change. The tests pass on my machine.

@boustrophedon
Copy link
Owner

boustrophedon commented Feb 7, 2026

Yeah, this seems flaky still. I've rerun it a few times and master doesn't error but random tests with this change are:


test create_table_and_insert ... ok
test check_database_name ... FAILED
test builder_setters ... ok

failures:

---- check_database_name stdout ----

thread 'check_database_name' (6514) panicked at tests/basic_operations.rs:16:10:
failed to connect to db: Io(Os { code: 104, kind: ConnectionReset, message: "Connection reset by peer" })

@boustrophedon
Copy link
Owner

sorry, accidentally closed somehow

@boustrophedon boustrophedon reopened this Feb 7, 2026
@njaard njaard force-pushed the master branch 3 times, most recently from f0526d1 to ae7860e Compare February 7, 2026 22:05
@njaard
Copy link
Author

njaard commented Feb 7, 2026

Well, the reason that my solution seemed to work is that it would bind to the ipv4 port, and postgres would bind to the ipv6 port. Then when you connect to "localhost", you'd get the ipv6 port. SO_REUSEPORT was never actually used, and postgres would always fail to bind to the ipv4 port.

So, back to the drawing board for random port selection, I guess.

Maybe an alternate approach is to keep on trying to start postgres on a randomly selected port until it succeeds.

@njaard njaard marked this pull request as draft February 7, 2026 22:41
@boustrophedon
Copy link
Owner

It might be useful to simply email the postgres mailing list or perhaps see how their own CI is set up. I might take a look today or tomorrow.

@njaard
Copy link
Author

njaard commented Feb 8, 2026

I had a crazy scheme of just proxying the Unix domain socket

@boustrophedon
Copy link
Owner

boustrophedon commented Feb 8, 2026

Ah, bind to either a random tcp addr/port or the user provided addr/port and then have pgtemp just proxy the traffic to the pg server's local unix socket? That would probably work (and we're already proxying in the daemon anyway, so it's not that crazy).

I wonder if we could simplify/unify the "base" pgtemp code with the daemon code somehow by doing this.

@boustrophedon
Copy link
Owner

It looks like there's actually a small standalone binary called "pg_isready" that basically implements this functionality. We could probably just shell out to that.

@boustrophedon
Copy link
Owner

Actually, sorry, I'm not sure what the underlying issue is currently. Is it that we're connecting too quickly before the pg server is ready, or is it that when running a large suite of tests sometimes we end up trying to start a server on a port that already has one running?

@njaard njaard marked this pull request as ready for review February 18, 2026 05:50
@njaard
Copy link
Author

njaard commented Feb 18, 2026

Take a look again! I think I got it this time.

Atomically acquire the port by building the TcpListener before postgres
even starts. Tell postgres to not even bind to a TCP port; it only
creates a Unix Domain Socket. It still names that filename with "5432",
but it doesn't bind to a port.

Then, proxy the Unix Domain Socket over the TcpListener we previously
acquired. We can now be sure which postgres we are talking to and on
what port.

Finally, as our last trick, we can verify postgres is up and running
by running the `pg_isready` CLI tool. We tell pg_isready to check
the unix domain socket path (though we could just as well have it
go over our proxied TCP listener)
@njaard
Copy link
Author

njaard commented Mar 1, 2026

@boustrophedon bump :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants