Conversation
|
dwr-zroy
left a comment
There was a problem hiding this comment.
Big picture:
- I need more info on the Tolerant mode before I can approve. It seems to be out of scope of the observability topic of this PR. Let's chat about this one offline.
- There are changes to the
CbcSolverclass, which also seems out of scope for this PR as well. I might be missing something here. - Misc changes requested about the use of
MDCand the use ofINFOlevel. - Thoughts on where the
RunContextclass should be initialized. - White-space changes are causing many issues with the sonar code quality tool. We will need to make sure we don't change lines of code (even with whitespace or formatting only) unless we are willing to address those sonar issues. This might mean some annoying line-by-line reverts for this PR.
- Related to the
MDCand white-space notes above: I am not sureMDCis valuable enough to justify the cleanup needed. We can talk more about this comment too.
| logger.atInfo() | ||
| .setMessage("Performance Statistics Summary:") | ||
| .log(); | ||
| logger.atInfo() | ||
| .setMessage(" Total problems solved: {}") | ||
| .addArgument(totalProblemsSolved) | ||
| .log(); | ||
| logger.atInfo() | ||
| .setMessage(" Total solve time: {} ms") | ||
| .addArgument(totalSolverTime) | ||
| .log(); | ||
| logger.atInfo() | ||
| .setMessage(" Total model creation time: {} ms") | ||
| .addArgument(totalModelCreationTime) | ||
| .log(); | ||
| logger.atInfo() | ||
| .setMessage(" Total constraint setup time: {} ms") | ||
| .addArgument(totalConstraintSetupTime) | ||
| .log(); | ||
| logger.atInfo() | ||
| .setMessage(" Total variable setup time: {} ms") | ||
| .addArgument(totalVariableSetupTime) | ||
| .log(); | ||
| logger.atInfo() | ||
| .setMessage(" Average solve time: {} ms") | ||
| .addArgument(totalSolverTime / totalProblemsSolved) | ||
| .log(); |
There was a problem hiding this comment.
I prefer these as a single logging call, easier to read, fewer lines. Also this message does not currently appear in the WRIMS 2 console during a run, so it should not be at the info level.
The comment about the use of INFO applies to many places below, not all of them are called out. Review the current WRIMS 2 console messages and try to limit additional logs at that INFO level.
| logger.atInfo() | |
| .setMessage("Performance Statistics Summary:") | |
| .log(); | |
| logger.atInfo() | |
| .setMessage(" Total problems solved: {}") | |
| .addArgument(totalProblemsSolved) | |
| .log(); | |
| logger.atInfo() | |
| .setMessage(" Total solve time: {} ms") | |
| .addArgument(totalSolverTime) | |
| .log(); | |
| logger.atInfo() | |
| .setMessage(" Total model creation time: {} ms") | |
| .addArgument(totalModelCreationTime) | |
| .log(); | |
| logger.atInfo() | |
| .setMessage(" Total constraint setup time: {} ms") | |
| .addArgument(totalConstraintSetupTime) | |
| .log(); | |
| logger.atInfo() | |
| .setMessage(" Total variable setup time: {} ms") | |
| .addArgument(totalVariableSetupTime) | |
| .log(); | |
| logger.atInfo() | |
| .setMessage(" Average solve time: {} ms") | |
| .addArgument(totalSolverTime / totalProblemsSolved) | |
| .log(); | |
| logger.atDebug() | |
| .setMessage(""" | |
| Performance Statistics Summary: | |
| Total problems solved: {} | |
| Total solve time: {} ms | |
| Total model creation time: {} ms | |
| Total constraint setup time: {} ms | |
| Total variable setup time: {} ms | |
| Average solve time: {} ms""") | |
| .addArgument(totalProblemsSolved) | |
| .addArgument(totalSolverTime) | |
| .addArgument(totalModelCreationTime) | |
| .addArgument(totalConstraintSetupTime) | |
| .addArgument(totalVariableSetupTime) | |
| .addArgument(totalSolverTime / totalProblemsSolved) | |
| .log(); |
|
|
||
| private static void putSolverMdc() { | ||
| try { | ||
| MDC.put("solver", "cbc"); | ||
| } catch (Exception ignore) { | ||
| } | ||
| } | ||
|
|
||
| private static void clearSolverMdc() { | ||
| try { | ||
| MDC.remove("solver"); | ||
| } catch (Exception ignore) { | ||
| } | ||
| } |
There was a problem hiding this comment.
Not sure why we are using MDC when the contexts that are added are entirely local to this class?
|
|
||
| // record lp | ||
| public static double record_if_obj_diff = 10000.0; | ||
| public static double log_if_obj_diff = 500.0; |
There was a problem hiding this comment.
Lines 140 - 259 detected as changes in git, because of white-space changes. If code formatter was applied to the whole file that could cause changes in white-space characters. Make sure to configure your code formatter to only apply to VCS changes.
Since changes were made, we fail the sonar check because of unused variables here. You could either:
- revert these changes,
- or do the cleanup.
| logger.atInfo() | ||
| .setMessage("CBC Solver Initialization Configuration:") | ||
| .log(); | ||
| logger.atInfo() | ||
| .setMessage(" cbcViolationCheck = {}") | ||
| .addArgument(cbcViolationCheck) | ||
| .log(); | ||
| logger.atInfo() | ||
| .setMessage(" lowerBoundZero_check = {}") | ||
| .addArgument(lowerBoundZero_check) | ||
| .log(); | ||
| logger.atInfo() | ||
| .setMessage(" cbcSolutionRounding = {}") | ||
| .addArgument(cbcSolutionRounding) | ||
| .log(); | ||
| logger.atInfo() | ||
| .setMessage(" usejCbc2021 = {}") | ||
| .addArgument(usejCbc2021) | ||
| .log(); | ||
| logger.atInfo() | ||
| .setMessage(" usejCbc2021a = {}") | ||
| .addArgument(usejCbc2021a) | ||
| .log(); | ||
| logger.atInfo() | ||
| .setMessage(" jCbc Version: {}") | ||
| .addArgument(cbcVersion) | ||
| .log(); | ||
| logger.atInfo() | ||
| .setMessage(" whsScaling = {}") | ||
| .addArgument(whsScaling) | ||
| .log(); | ||
| logger.atInfo() | ||
| .setMessage(" whsSafe = {}") | ||
| .addArgument(whsSafe) | ||
| .log(); |
There was a problem hiding this comment.
Similar comment here about combining the logging calls, and the use of the INFO level
|
|
||
| public static void newProblem(){ | ||
| putSolverMdc(); | ||
| try { |
There was a problem hiding this comment.
Adding this try block then marks the entire function as "new code", and we would need to do massive refactoring to not fail the sonar check. If we do not use MDC, we do not need the try-catch block.
| private static void putMdc(RunContext rc) { | ||
| try { | ||
| MDC.put("run_id", rc.runId); | ||
| MDC.put("cycleName", rc.cycleName); | ||
| MDC.put("timeStep", rc.timeStep); | ||
| MDC.put("solver", "XA"); | ||
| MDC.put("mode", rc.modeName()); | ||
| } catch (Exception ignore) { | ||
| } | ||
| } |
There was a problem hiding this comment.
Since this is only used in one location, I prefer to just move it into the init
| private static final Logger LOG_MAIN = | ||
| LoggerFactory.getLogger("gov.ca.water.wrims.engine.core.solver.xa.main"); |
There was a problem hiding this comment.
I do not prefer this. These logs will still be caught by the root logger. So I do not see an advantage here of not using
| private static final Logger LOG_MAIN = | |
| LoggerFactory.getLogger("gov.ca.water.wrims.engine.core.solver.xa.main"); | |
| private static final Logger LOG_MAIN = | |
| LoggerFactory.getLogger(XASolver.class); |
| private enum Mode { | ||
| LEGACY, | ||
| TOLERANT | ||
| } |
There was a problem hiding this comment.
I need more of an explanation of the tolerant mode. I do not fully understand how it relates to the logging scope of this PR, it seems like it changes behavior more than just in observability.
| private static final class RunContext { | ||
| final Mode mode; | ||
| final int cycleIndex1Based; | ||
| final String dateStr; | ||
| final String cycleName; | ||
| final String timeStep; | ||
| final String runId; | ||
|
|
||
| RunContext(Mode mode) { | ||
| this.mode = mode; | ||
| this.cycleIndex1Based = ControlData.currCycleIndex + 1; | ||
| this.dateStr = ControlData.currMonth + "/" + ControlData.currDay + "/" + ControlData.currYear; | ||
| this.cycleName = ControlData.currCycleName; | ||
| this.timeStep = String.valueOf(ControlData.timeStep); | ||
| this.runId = UUID.randomUUID().toString(); | ||
| } | ||
|
|
||
| String modeName() { | ||
| return mode == null ? "legacy" : mode.name().toLowerCase(); | ||
| } | ||
|
|
||
| String ctx() { | ||
| return "date=" + dateStr | ||
| + " cycleIndex=" + cycleIndex1Based | ||
| + " cycleName=" + cycleName | ||
| + " timeStep=" + timeStep | ||
| + " run_id=" + runId; | ||
| } | ||
| } |
There was a problem hiding this comment.
I would prefer this being it's own package class, and adding a factory method onto the ControlData class to create instances of this class. Right now, this implementation depends on the global state of ControlData which I would like to try and eliminate.
Let me know if this does not make sense.
| * It aggregates missing/skipped items and keeps only a few samples, | ||
| * avoiding heavy per-item logging in high-frequency loops. | ||
| */ | ||
| private static final class BuildStats { |
There was a problem hiding this comment.
Prefer a more descriptive name here. Maybe
| private static final class BuildStats { | |
| private static final class TrackedWreslCycleCharateristics { |




Summary
This PR refactors
XASolverto improve observability and runtime diagnostics, and introduces an optional tolerant build mode while preserving legacy behavior by default.Main changes
run_id,cycleName,timeStep,solver, andmodePerformanceTimertiming for major XA stages:LEGACYandTOLERANTexecution modesBuildStatsBehavior notes
LEGACYSuggested runtime flags
-Dwrims.xa.mode=tolerantor
-Dwrims.xa.tolerant=true