Only load depsfile if not dirty [Fix #2666]#2680
Only load depsfile if not dirty [Fix #2666]#2680moritzx22 wants to merge 4 commits intoninja-build:masterfrom
Conversation
bb1200c to
f6320d7
Compare
mathstuf
left a comment
There was a problem hiding this comment.
I think the test suite leak fix can be its own PR. Can a test case for the compelling scenario be added?
new PR created: #2684. I will keep this PR unchanged, until PR2684 is merged and a rebase can be done. |
One cycle test for the depfile has been added. A similar test for dyndep is pending. |
e42469e to
e06a6b6
Compare
|
The dyndep issue is none. The dyndep file is already only loaded if it is not dirty. The respective commit has been removed. |
Changes in the recent push
|
ninja -t missingdepsconsole output This essentially means that issues can occur if a depsfile is not loaded because it has not yet been generated. With this merge request, the condition is extended: if the corresponding target is considered dirty due to the manifest, the depsfile will also not be loaded. As a result, this PR may increase the likelihood of build flakiness when missingdeps are present. At the time of writing, I do not see any additional negative impact introduced by the conceptual change in this PR. |
mathstuf
left a comment
There was a problem hiding this comment.
Looks OK to me, but someone else should also review (I did it mainly from the test cases).
|
rebased to master |
e1e20a8 to
d234784
Compare
140ec6c to
8af8118
Compare
|
rebased to master |
|
Rebased to master in previous push. |
digit-google
left a comment
There was a problem hiding this comment.
This PR is impossible to review properly as a stack of 8 commits with what looks like random changes with unclear commit messages, please squash / rebase this into something that is simpler to review. For example:
-
one commit to add the constness changes + the cp-deps test rule implementation.
-
second commit to change the LoadDepXXX() signatures with proper documentation of all new parameters, preferably without changing the implementation yet.
-
a third commit that changes the implementation to change the behavior / fix the bug and modify the tests accordingly.
Each commit should have a clear commit message explaining its purpose and why things are changed in a certain way. I'll add some inline comments too.
src/graph.cc
Outdated
| namespace { | ||
|
|
||
| /// execute hash only once in lifetime of object and only on request | ||
| struct hashCommand { |
There was a problem hiding this comment.
Please follow existing coding conventions, i.e. struct/class names should use PascalCase (hashCommand -> HashCommand), and member variables should use trailing underscore (valid -> valid_). Moreover, call this LazyEdgeCommandHash for clarity.
src/graph.cc
Outdated
|
|
||
| /// class is similar to a pointer of BuildLog::LogEntry | ||
| /// additionally the LookupByOutput is cached for performance reasons | ||
| class LogEntryCache { |
There was a problem hiding this comment.
nit: Name this CachedLogEntry for clarity since this is not a cache.
src/graph.cc
Outdated
| public: | ||
| LogEntryCache(){}; | ||
|
|
||
| operator bool() const { return entry_; } |
There was a problem hiding this comment.
Explain what this corresponds to and when it is safe to call, since it never looks at evaluated_, the meaning of the result value is ambiguous. Consider replacing this with is_valid() for clarity, bool operators can lead to surprising bugs.
There was a problem hiding this comment.
renamed to is_valid()
There was a problem hiding this comment.
example usage for clarity
CachedLogEntry cached;
if(cached.is_valid()) cached->foo(); // nullptr
cached.LookupByOutput(build_log, output); // assign value
if(cached.is_valid()) cached->foo(); // call foo()
cached.LookupByOutput(build_log, output); // already cached
if(cached.is_valid()) cached->foo(); // call foo()
// a raw pointer instead
BuildLog::LogEntry* entry = nullptr;
if(entry) entry->foo(); // nullptr
entry = build_log.LookupByOutput(output);
if(entry.is_valid()) entry->foo(); // call foo()
entry = build_log.LookupByOutput(output); // second call
if(entry.is_valid()) entry->foo(); // call foo()
src/graph.cc
Outdated
| BuildLog::LogEntry* entry_ = nullptr; | ||
| }; | ||
|
|
||
| bool LogEntryCache::LookupByOutput(const BuildLog* buildLog, const Node* output) { |
There was a problem hiding this comment.
This interface is ambiguous, because the function could in theory be called with different |output| values and will only return a result corresponding to the first call. You could implement something similar without a dedicated LogEntryCache class with a simple std::map<const Node*, const BuildLog::LogEntry*> instead inside RecomputeOutputsDirty_, which would be simpler / clearer.
There was a problem hiding this comment.
Here are the reasons behind introducing this new class:
Requirements
- Cache the result of LookupByOutput.
- Store only a single pointer and a single boolean per cache entry.
- Keep the cache in contiguous memory (std::vector) for compiler‑friendly access patterns.
- Allocate the memory only once (vector sized in the constructor).
- Allow lookup of the cached value for a given output at effectively zero cost.
RecomputeOutputsDirty is performance‑critical.
Its worst‑case scenario is a clean build, where no early exits occur and every output must be visited. RecomputeOutputsDirty will be invoked twice in this situation because the depfile is loaded here. This is exactly where the cache provides the most benefit.
To achieve this performance, the class relies on strict usage assumptions:
LookupByOutput must always be called with the same parameters for a given instance. Debug assertions enforce this, and the assumptions are documented in the code. These constraints allow the implementation to remain efficient.
RecomputeOutputsDirtyCache ensures these assumptions hold. It selects the correct output and its associated cache entry for processing. The CachedLogEntry type is defined in the private section to prevent accidental misuse outside the intended context.
A std::map could also be used to implement the cache, and it would likely be simpler to write, but I expect its performance to be worse, especially for clean builds.
Please advise.
src/graph.cc
Outdated
| } | ||
|
|
||
| /// performance optimized to recompute the outputs | ||
| class RecomputeOutputsDirty_ { |
There was a problem hiding this comment.
Please do not use trailing underscores in class names (or even inside them). Call this RecomputeOutputsDirtyCache instead, or something similar. Also consider moving changes related to performance optimizations to their own commit so they can be reviewed more easily.
src/graph.cc
Outdated
| /// performance optimized to recompute the outputs | ||
| class RecomputeOutputsDirty_ { | ||
| public: | ||
| RecomputeOutputsDirty_(BuildLog* buildLog, OptionalExplanations& explanations, |
There was a problem hiding this comment.
coding style: please use snake_case for variable / member identifiers (buildLog -> build_log)
| Edge* edge) | ||
| : buildLog_(buildLog), explanations_(explanations), edge_(edge), | ||
| LogEntry_(edge->outputs_.size()) {} | ||
| bool all(const Node* most_recent_input); |
There was a problem hiding this comment.
nit: document what these methods do.
src/graph.cc
Outdated
| return false; | ||
| } | ||
|
|
||
| // disable warning for windows |
There was a problem hiding this comment.
Explain why these are needed exactly.
There was a problem hiding this comment.
MSVC warns (and errors) on constructs like:
if (false) {
// do some stuff
}
This is suppressed. Anyhow this is obsolete with the change to c++17 and the use of if constexpr
There was a problem hiding this comment.
This should go as a comment inside the source code, so that future maintainers now how / when to keep this.
There was a problem hiding this comment.
sorry for not being clear. In the latest push it looks like
if constexpr (false) {
// do some stuff
}and no warning or error is reported by MSVC anymore. The disable warning stuff has been removed.
| assert(FIRSTRUN || !(cond)); /* NOLINT */ \ | ||
| if (FIRSTRUN && (cond)) /* NOLINT */ | ||
|
|
||
| template <bool FIRSTRUN> |
There was a problem hiding this comment.
I strongly recommend to getting rid of the template parameter, and adding a simple first_run function parameter instead.
There was a problem hiding this comment.
The template parameter is constexpr, which gives the compiler the best opportunity to optimize the code. Conceptually, the template function represents two distinct functions and helps avoid code duplication. The runtime if is replaced with if constexpr in the next push, so the unused branch is removed entirely at compile time.
There are essentially three design options:
- A template function
- Uses a
constexpr template parameterto generate two optimized code paths without duplication. - Cleanly separates the regular function parameters from the compile‑time selection parameter.
- Uses a
- A regular function with a runtime parameter
- Simpler interface
- Two separate functions
- Maximum clarity, but duplicates code.
Please restate your preference.
If you still recommend avoiding the template parameter, I’d appreciate some more detail on why the template approach is undesirable in this context.
Note: The if is changed to if constexpr in the macro.
src/graph.h
Outdated
| // or out of date). | ||
| bool LoadDeps(Edge* edge, std::string* err); | ||
| bool LoadDeps(Edge* edge, std::string* err, | ||
| std::array<std::size_t, 2>* input_range = nullptr); |
There was a problem hiding this comment.
Please update the documentation to explain the purpose of this new input_range parameter. Consider using a simple struct InputRange { size_t start; size_t end; } definition to make this easier to read and understand. Clarify that this is an optional output parameter.
There was a problem hiding this comment.
replaced std::array with:
struct InputView {
std::size_t offset_begin = 0;
std::size_t offset_end = 0;
};
There was a problem hiding this comment.
The original conditional was removed to improve performance.
The Call that previously used that conditional now uses a dummy object.
This path(missingdeps) isn't performance‑critical, so using a dummy is acceptable.
src/graph.cc
Outdated
| return false; | ||
| } | ||
|
|
||
| const auto input_end = edge->inputs_.cend() - input_range[1]; |
There was a problem hiding this comment.
Is this computation correct here? Can you clarify the meaning of input_range[1]? From the name "range" it can be assumed that this would be the position of the first item after the range, but in this case, you would use input_end = edge->inputs_.cbegin() + input_range[1] instead.
If the value is a count instead, "input_span" might be a better name, but the computation would be input_end = edge->inputs_.cbegin() + input_range[0] + input_range[1] so I am puzzled as to what this code does.
src/graph.cc
Outdated
| // Load output mtimes so we can compare them to the most recent input below. | ||
| for (Node* o : edge->outputs_) { | ||
| for (vector<Node*>::iterator o = edge->outputs_.begin(); | ||
| o != edge->outputs_.end(); ++o) { |
There was a problem hiding this comment.
nit: Why regress here when the original code was perfectly fine?
src/graph.cc
Outdated
| // if an rebuild is necessary the deps log is outdated for this target | ||
| if (!edge->deps_loaded_ && !dirty) { | ||
| // This is our first encounter with this edge. Load discovered deps. | ||
| std::array<std::size_t, 2> newLinks{ 0, 0 }; |
There was a problem hiding this comment.
nit: newLinks doesn't mean anything here. Use something more specific here. new_deps maybe?
There was a problem hiding this comment.
Or more precisely new_deps_range
There was a problem hiding this comment.
renamed to new_deps
src/graph.cc
Outdated
| } | ||
|
|
||
| if (input_range) | ||
| (*input_range)[1] = std::distance(implicit_dep, edge->inputs_.end()); |
There was a problem hiding this comment.
oh, so you are storing the distance from the last implicit input to the end of the array. This data structure definitely is not a range. Consider storing the distance from the start for clarity.
There was a problem hiding this comment.
This data structure represents a subset of a container.
The usual approach starting from index 0 has a drawback:
the default‑constructed view should represent the entire container, which requires knowing its size.
In this context it would look like:
if (!RecomputeEdgesInputsDirty(node, InputView(), most_recent_input, dirty,
stack, validation_nodes, err))
// would need to become
if (!RecomputeEdgesInputsDirty(node, InputView{0, node->in_edge()->inputs_.size()}, most_recent_input, dirty,
stack, validation_nodes, err))I can implement the change, and it can certainly be expressed more cleanly than in the example above. However, in this particular context the change makes the code a bit more complicated. Before proceeding, I’d appreciate your advice.
There was a problem hiding this comment.
Second, there is
This data structure represents a subset of a container. The usual approach starting from index 0 has a drawback: the default‑constructed view should represent the entire container, which requires knowing its size.
Technically, this is neither a "view" nor a "subset" as these terms usually refer to objects that can be used directly to access individual items. This is not the case here: you just have a pair of numbers, whose interpretation requires additional information (in this case the exact and unmodified inputs_ array they refer to). Hence using something like "range" in the name makes more sense. Another option is to store the Edge pointer in the data structure (or at least a pointer to its edge->inputs_ array).
If you prefer a non-conventional layout / interpretation for the values, I strongly recommend making a custom class with human-friendly accessors to properly document its purpose and simplify its usage. E.g.
struct EdgeInputsRange {
/// Create new instance covering all |edge| inputs.
EdgeInputsRange(const Edge* edge);
/// Create instance covering the [start_pos..end_pos) interval of |edge| inputs.
EdgeInputsRange(const Edge* edge, size_t start_pos, size_t end_pos);
size_t start_pos() const;
size_t end_pos() const;
private:
...
};
In this context it would look like:
if (!RecomputeEdgesInputsDirty(node, InputView(), most_recent_input, dirty, stack, validation_nodes, err)) // would need to become if (!RecomputeEdgesInputsDirty(node, InputView{0, node->in_edge()->inputs_.size()}, most_recent_input, dirty, stack, validation_nodes, err))I can implement the change, and it can certainly be expressed more cleanly than in the example above. However, in this particular context the change makes the code a bit more complicated. Before proceeding, I’d appreciate your advice.
This change adjusts the internal order of the load output mtimes step and the step that recomputes the dirty state of inputs. The modification does not alter any functional behavior. The reordering improves internal consistency and prepares the code for upcoming changes.
True. I will reorder and clean up the commits so that each one has a clear purpose and is easier to understand. Only the final commit introduces a functional change. All earlier commits are refactoring or cleanup and keep current master behavior. After restructuring the history, this separation will be clearer and the review much simpler. Most of the other comments will be incorporated as well. |
This change refactors internal parts of the code without altering functional behavior. It prepares the implementation for a future update in which dependencies will be loaded only when inputs are not marked dirty. The sequence in which only a subset of inputs can be specified to be processed will matter for upcoming changes to depfile loading. A new helper function 'RecomputeEdgesInputsDirty' has been introduced for clarity. The function can be specified to visit only a subset of an edge’s inputs as well.
Replace DependencyScan::RecomputeOutputDirty with the new RecomputeOutputsDirtyCache helper class to centralize the logic for determining whether edge outputs are dirty. The new cache‑aware implementation avoids redundant work, improves readability, and prepares the codebase for upcoming changes to load depfiles only when nodes are not dirty. Build‑log lookups are now cached, and command hash computation is performed lazily to improve performance. This commit introduces no functional changes.
This change updates the logic so that the depfile is loaded only when no output node is not dirty based on manifest and dyndep inputs (excluding the depfile itself). If the outputs are already scheduled to be regenerated, loading the depfile is unnecessary, since its only purpose is to trigger regeneration of the outputs — which is already guaranteed. Avoiding the use of a potentially outdated depfile prevents incorrect cycle detection and ensures that stale depfiles are no longer added to the graph.
This Merge request is related to #2666.
This pull request proposes to load the depsfile only if it is not dirty.
For a more detailed description, see comment #2666