Skip to content

CASSANALYTICS-145: Add exception stack walking to port mapping retry …#191

Open
jmckenzie-dev wants to merge 1 commit intoapache:trunkfrom
jmckenzie-dev:fix_retries_more
Open

CASSANALYTICS-145: Add exception stack walking to port mapping retry …#191
jmckenzie-dev wants to merge 1 commit intoapache:trunkfrom
jmckenzie-dev:fix_retries_more

Conversation

@jmckenzie-dev
Copy link
Copy Markdown
Contributor

…logic on analytics integration tests

Patch by Josh McKenzie; reviewed by TBD for CASSANALYTICS-145

…logic on analytics integration tests

Patch by Josh McKenzie; reviewed by TBD for CASSANALYTICS-145
@jmckenzie-dev
Copy link
Copy Markdown
Contributor Author

Going to keep an eye on https://app.circleci.com/pipelines/gh/jmckenzie-dev/cassandra-analytics/21/details?useNewPipelines=true and see how it looks. Might kick it a couple times for extra runs just to make sure we don't see the transient port mapping issues.

Comment on lines +226 to +233
String message = cause.getMessage();
if (message != null && (message.contains("Address already in use") ||
message.contains("is in use by another") ||
message.contains("Failed to bind port")))
{
isBindFailure = true;
break;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor:
If "Address already in use" and "is in use by another" originate from BindException (OS/JVM), a cause instanceof BindException check inside the loop would be more robust than string matching, which can vary across OS, JVM vendor, or locale. If they come from application-level messages, the string matching is fine as-is.

Suggested change
String message = cause.getMessage();
if (message != null && (message.contains("Address already in use") ||
message.contains("is in use by another") ||
message.contains("Failed to bind port")))
{
isBindFailure = true;
break;
}
String message = cause.getMessage();
if (cause instanceof java.net.BindException || (cause.getMessage() != null && cause.getMessage().contains("Failed to bind port")))
{
isBindFailure = true;
break;
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comes from our friend Server.java#start in Cassandra:

        if (!bindFuture.awaitUninterruptibly().isSuccess())
            throw new IllegalStateException(String.format("Failed to bind port %d on %s.", socket.getPort(), socket.getAddress().getHostAddress()),
                                            bindFuture.cause());

Certainly if that changes we could have a bad time. But the reality is that we're a) throwing an IllegalStateException in C*, and b) it's going through layers of indirection (C* inside in-jvm dtest API inside classloaders) that make it pretty hard to durably determine what exception type is going to show up in the test level since various tiers can catch and rethrow as different types.

Hence the throwing my hands in the air and just strcmp'ing it for now. On the plus side, that string has been stable since at least 2018 so seems stable.

Famous last words.

Copy link
Copy Markdown
Contributor

@arjunashok arjunashok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor suggestion. Otherwise, looks good, +1.

You might want to re-run the sole failing integ test, which is unrelated - likely flaky.

JoiningMultiDCFailureTest > testJoiningNodeInMultiDCTest(TestConsistencyLevel) > 4 => readCL=QUORUM, writeCL=QUORUM FAILED
    java.lang.AssertionError: 

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants