Only load depsfile if not dirty [Fix #2666] #2680

Open
moritzx22 wants to merge 4 commits into ninja-build:master from moritzx22:fix2666

Conversation

@moritzx22
Contributor

@moritzx22 moritzx22 commented Oct 12, 2025

This pull request is related to #2666.

It proposes loading the depsfile only when the corresponding target is not already dirty.
For a more detailed description, see the comment in #2666.

@moritzx22
Contributor Author

moritzx22 commented Oct 12, 2025

Running the example from #2666 with the proposed solution reports no cycle and builds as expected:

...
// change the cpp files
$ ninja
[6/6] Linking CXX static library libhasmodules.a
$ ninja
ninja: no work to do.

The solution works for the build from #2666, but it is still a draft.

@moritzx22 moritzx22 marked this pull request as draft October 12, 2025 17:24
@moritzx22 moritzx22 force-pushed the fix2666 branch 2 times, most recently from bb1200c to f6320d7 on October 19, 2025 12:21
Contributor

@mathstuf mathstuf left a comment

I think the test suite leak fix can be its own PR. Can a test case for the compelling scenario be added?

@moritzx22
Contributor Author

moritzx22 commented Oct 21, 2025

> think the test suite leak fix can be its own PR

New PR created: #2684. I will keep this PR unchanged until #2684 is merged and a rebase can be done.

@moritzx22
Contributor Author

> Can a test case for the compelling scenario be added?

One cycle test for the depfile has been added. A similar test for dyndep is pending.
Numerous tests still fail because this PR changes some basic rules that Ninja is designed around, and those rules are reflected in the test suite.

@moritzx22 moritzx22 force-pushed the fix2666 branch 2 times, most recently from e42469e to e06a6b6 on October 25, 2025 16:49
@moritzx22
Contributor Author

There is no dyndep issue: the dyndep file is already loaded only if it is not dirty. The respective commit has been removed.

@moritzx22 moritzx22 changed the title from "Only load deps and dyndeps if not dirty [Fix #2666]" to "Only load depsfile if not dirty [Fix #2666]" on Oct 25, 2025
@moritzx22
Contributor Author

Changes in the recent push

  • more mature implementation
    • restat functionality is corrected
    • dependencies are only checked once
    • runtime performance optimization
  • unit tests have been changed to comply with the new depsfile loading
    • unit tests run without failing on Linux
    • one Windows-only test still fails
  • test builds of projects such as LLVM showed the expected behavior

@moritzx22
Contributor Author

moritzx22 commented Nov 22, 2025

ninja -t missingdeps

console output

$ ninja -t missingdeps
... There might be build flakiness if any of the targets listed above
are built alone, or not late enough, in a clean output directory.

This essentially means that issues can occur if a depsfile is not loaded because it has not yet been generated. With this merge request, the condition is extended: if the corresponding target is considered dirty due to the manifest, the depsfile will also not be loaded.

As a result, this PR may increase the likelihood of build flakiness when missingdeps are present.
Note: well‑designed Ninja builds should not produce missing dependencies.

At the time of writing, I do not see any additional negative impact introduced by the conceptual change in this PR.

@moritzx22 moritzx22 marked this pull request as ready for review November 22, 2025 18:21
@moritzx22 moritzx22 requested a review from mathstuf November 22, 2025 18:22
Contributor

@mathstuf mathstuf left a comment

Looks OK to me, but someone else should also review (I did it mainly from the test cases).

@moritzx22
Contributor Author

rebased to master

@moritzx22 moritzx22 force-pushed the fix2666 branch 5 times, most recently from e1e20a8 to d234784 on December 29, 2025 18:51
@moritzx22 moritzx22 force-pushed the fix2666 branch 4 times, most recently from 140ec6c to 8af8118 on January 1, 2026 14:51
@moritzx22
Contributor Author

rebased to master

@moritzx22
Contributor Author

Rebased to master in previous push.

Contributor

@digit-google digit-google left a comment

This PR is impossible to review properly as a stack of 8 commits with what looks like random changes and unclear commit messages. Please squash / rebase this into something that is simpler to review. For example:

  • one commit to add the constness changes + the cp-deps test rule implementation.

  • second commit to change the LoadDepXXX() signatures with proper documentation of all new parameters, preferably without changing the implementation yet.

  • a third commit that changes the implementation to change the behavior / fix the bug and modify the tests accordingly.

Each commit should have a clear commit message explaining its purpose and why things are changed in a certain way. I'll add some inline comments too.

src/graph.cc Outdated
namespace {

/// execute hash only once in lifetime of object and only on request
struct hashCommand {
Contributor

Please follow existing coding conventions, i.e. struct/class names should use PascalCase (hashCommand -> HashCommand), and member variables should use trailing underscore (valid -> valid_). Moreover, call this LazyEdgeCommandHash for clarity.

Contributor Author

done
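
For readers following this thread, a lazily computed and cached hash in the style requested above might look like the sketch below. Only the class name LazyEdgeCommandHash comes from the review comment. The members, the use of std::hash in place of Ninja's real command hash, and the compute counter are assumptions made for illustration and are not the PR's code.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>

/// Computes the hash of a command string at most once, and only on request.
/// Hypothetical sketch: std::hash stands in for the real command hash.
class LazyEdgeCommandHash {
 public:
  explicit LazyEdgeCommandHash(std::string command)
      : command_(std::move(command)) {}

  /// Returns the cached hash, computing it on the first call only.
  uint64_t Get() {
    if (!valid_) {
      hash_ = std::hash<std::string>{}(command_);
      valid_ = true;
      ++compute_count_;
    }
    assert(compute_count_ <= 1);  // the hash is never recomputed
    return hash_;
  }

  /// Number of times the hash was actually computed (0 or 1).
  int compute_count() const { return compute_count_; }

 private:
  std::string command_;
  uint64_t hash_ = 0;
  bool valid_ = false;  // trailing underscore, per the coding conventions
  int compute_count_ = 0;
};
```

Constructing the object costs nothing beyond storing the command; the hash work happens only if Get() is reached.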

src/graph.cc Outdated

/// class is similar to a pointer of BuildLog::LogEntry
/// additionally the LookupByOutput is cached for performance reasons
class LogEntryCache {
Contributor

nit: Name this CachedLogEntry for clarity since this is not a cache.

Contributor Author

done

src/graph.cc Outdated
public:
LogEntryCache(){};

operator bool() const { return entry_; }
Contributor

Explain what this corresponds to and when it is safe to call. Since it never looks at evaluated_, the meaning of the result value is ambiguous. Consider replacing this with is_valid() for clarity; bool operators can lead to surprising bugs.

Contributor Author

renamed to is_valid()

Contributor Author

Example usage for clarity:

CachedLogEntry cached;
if (cached.is_valid()) cached->foo();      // not executed: nothing looked up yet
cached.LookupByOutput(build_log, output);  // performs the lookup, stores the result
if (cached.is_valid()) cached->foo();      // calls foo() if an entry was found
cached.LookupByOutput(build_log, output);  // already cached, no second lookup
if (cached.is_valid()) cached->foo();      // calls foo() if an entry was found

// the same sequence with a raw pointer instead
BuildLog::LogEntry* entry = nullptr;
if (entry) entry->foo();                   // not executed: nullptr
entry = build_log.LookupByOutput(output);
if (entry) entry->foo();                   // calls foo() if an entry was found
entry = build_log.LookupByOutput(output);  // second call repeats the lookup
if (entry) entry->foo();                   // calls foo() if an entry was found

src/graph.cc Outdated
BuildLog::LogEntry* entry_ = nullptr;
};

bool LogEntryCache::LookupByOutput(const BuildLog* buildLog, const Node* output) {
Contributor

This interface is ambiguous, because the function could in theory be called with different |output| values and will only return a result corresponding to the first call. You could implement something similar without a dedicated LogEntryCache class with a simple std::map<const Node*, const BuildLog::LogEntry*> instead inside RecomputeOutputsDirty_, which would be simpler / clearer.

Contributor Author

Here are the reasons behind introducing this new class:

Requirements

  • Cache the result of LookupByOutput.
  • Store only a single pointer and a single boolean per cache entry.
  • Keep the cache in contiguous memory (std::vector) for compiler‑friendly access patterns.
  • Allocate the memory only once (vector sized in the constructor).
  • Allow lookup of the cached value for a given output at effectively zero cost.

RecomputeOutputsDirty is performance‑critical.
Its worst‑case scenario is a clean build, where no early exits occur and every output must be visited. RecomputeOutputsDirty will be invoked twice in this situation because the depfile is loaded here. This is exactly where the cache provides the most benefit.

To achieve this performance, the class relies on strict usage assumptions:
LookupByOutput must always be called with the same parameters for a given instance. Debug assertions enforce this, and the assumptions are documented in the code. These constraints allow the implementation to remain efficient.

RecomputeOutputsDirtyCache ensures these assumptions hold. It selects the correct output and its associated cache entry for processing. The CachedLogEntry type is defined in the private section to prevent accidental misuse outside the intended context.

A std::map could also be used to implement the cache, and it would likely be simpler to write, but I expect its performance to be worse, especially for clean builds.

Please advise.
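
The requirements listed above can be pictured with a small, self-contained sketch. Everything in it (FakeBuildLog, FakeLogEntry, OutputLogCache, and string keys instead of Node pointers) is a hypothetical stand-in rather than Ninja's API. It only illustrates one pointer plus one flag per output, stored in a vector that is sized once in the constructor, so that a second pass over the outputs does not repeat the lookup.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

struct FakeLogEntry {
  int mtime = 0;
};

// Stand-in for a build log; counts lookups so that caching is observable.
struct FakeBuildLog {
  std::unordered_map<std::string, FakeLogEntry> entries;
  mutable int lookups = 0;

  const FakeLogEntry* LookupByOutput(const std::string& path) const {
    ++lookups;
    auto it = entries.find(path);
    return it == entries.end() ? nullptr : &it->second;
  }
};

// One slot per output: a pointer and an "already looked up" flag.
class OutputLogCache {
 public:
  OutputLogCache(const FakeBuildLog* log,
                 const std::vector<std::string>* outputs)
      : log_(log), outputs_(outputs), slots_(outputs->size()) {}

  // Returns the log entry for output |index|, or nullptr if there is none.
  // The lookup runs only the first time a given index is requested.
  const FakeLogEntry* Get(std::size_t index) {
    assert(index < slots_.size());
    Slot& slot = slots_[index];
    if (!slot.evaluated) {
      slot.entry = log_->LookupByOutput((*outputs_)[index]);
      slot.evaluated = true;
    }
    return slot.entry;
  }

 private:
  struct Slot {
    const FakeLogEntry* entry = nullptr;
    bool evaluated = false;
  };

  const FakeBuildLog* log_;
  const std::vector<std::string>* outputs_;
  std::vector<Slot> slots_;  // allocated once, contiguous
};
```

Indexing by output position sidesteps the ambiguity raised in review (one cache object being asked about a different output), at the price of tying the cache to a single edge's output list. Whether this is measurably faster than a std::map would need profiling.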

src/graph.cc Outdated
}

/// performance optimized to recompute the outputs
class RecomputeOutputsDirty_ {
Contributor

Please do not use trailing underscores in class names (or even inside them). Call this RecomputeOutputsDirtyCache instead, or something similar. Also consider moving changes related to performance optimizations to their own commit so they can be reviewed more easily.

Contributor Author

done

src/graph.cc Outdated
/// performance optimized to recompute the outputs
class RecomputeOutputsDirty_ {
public:
RecomputeOutputsDirty_(BuildLog* buildLog, OptionalExplanations& explanations,
Contributor

coding style: please use snake_case for variable / member identifiers (buildLog -> build_log)

Contributor Author

done

Edge* edge)
: buildLog_(buildLog), explanations_(explanations), edge_(edge),
LogEntry_(edge->outputs_.size()) {}
bool all(const Node* most_recent_input);
Contributor

nit: document what these methods do.

Contributor Author

done

src/graph.cc Outdated
return false;
}

// disable warning for windows
Contributor

Explain why these are needed exactly.

Contributor Author

MSVC warns (and errors) on constructs like:

if (false) {
// do some stuff
}

The warning is suppressed here. This becomes obsolete with the move to C++17 and the use of if constexpr.

Contributor

This should go as a comment inside the source code, so that future maintainers know how / when to keep this.

Contributor Author

Sorry for not being clear. In the latest push it looks like:

if constexpr (false) {
// do some stuff
}

and MSVC no longer reports a warning or error. The warning-suppression code has been removed.

assert(FIRSTRUN || !(cond)); /* NOLINT */ \
if (FIRSTRUN && (cond)) /* NOLINT */

template <bool FIRSTRUN>
Contributor

I strongly recommend getting rid of the template parameter, and adding a simple first_run function parameter instead.

Contributor Author

The template parameter is constexpr, which gives the compiler the best opportunity to optimize the code. Conceptually, the template function represents two distinct functions and helps avoid code duplication. The runtime if is replaced with if constexpr in the next push, so the unused branch is removed entirely at compile time.

There are essentially three design options:

  • A template function
    • Uses a constexpr template parameter to generate two optimized code paths without duplication.
    • Cleanly separates the regular function parameters from the compile‑time selection parameter.
  • A regular function with a runtime parameter
    • Simpler interface
  • Two separate functions
    • Maximum clarity, but duplicates code.

Please restate your preference.
If you still recommend avoiding the template parameter, I’d appreciate some more detail on why the template approach is undesirable in this context.

Note: The if is changed to if constexpr in the macro.
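
To make the trade-off concrete, here is a minimal C++17 sketch of the first two options. The function names and the toy arithmetic are invented for illustration and are unrelated to the actual code in graph.cc.

```cpp
#include <cassert>

// Option 1: compile-time selection. Each instantiation contains only the
// branch it needs; the other branch is discarded by `if constexpr`.
template <bool kFirstRun>
int ScanTemplate(int inputs) {
  if constexpr (kFirstRun) {
    return inputs * 2;  // stands in for work done only on the first pass
  } else {
    return inputs + 1;  // stands in for work done only on later passes
  }
}

// Option 2: run-time selection with an ordinary parameter. The interface is
// simpler, and the branch is evaluated when the function runs.
int ScanRuntime(int inputs, bool first_run) {
  if (first_run)
    return inputs * 2;
  return inputs + 1;
}
```

Both variants return the same results; the difference is whether the choice is part of the function's type (ScanTemplate<true> versus ScanTemplate<false>) or part of its argument list. Any performance gap between the two would have to be measured, since compilers can often fold a constant argument after inlining.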

src/graph.h Outdated
// or out of date).
- bool LoadDeps(Edge* edge, std::string* err);
+ bool LoadDeps(Edge* edge, std::string* err,
+               std::array<std::size_t, 2>* input_range = nullptr);
Contributor

Please update the documentation to explain the purpose of this new input_range parameter. Consider using a simple struct InputRange { size_t start; size_t end; } definition to make this easier to read and understand. Clarify that this is an optional output parameter.

Contributor Author

replaced std::array with:

struct InputView {
  std::size_t offset_begin = 0;
  std::size_t offset_end = 0;
};

Contributor Author

@moritzx22 moritzx22 Apr 7, 2026

The original conditional was removed to improve performance.
The call that previously relied on that conditional now passes a dummy object.
This path (missingdeps) isn't performance‑critical, so using a dummy is acceptable.

src/graph.cc Outdated
return false;
}

const auto input_end = edge->inputs_.cend() - input_range[1];
Contributor

Is this computation correct here? Can you clarify the meaning of input_range[1]? From the name "range" it can be assumed that this would be the position of the first item after the range, but in this case, you would use input_end = edge->inputs_.cbegin() + input_range[1] instead.

If the value is a count instead, "input_span" might be a better name, but the computation would be input_end = edge->inputs_.cbegin() + input_range[0] + input_range[1] so I am puzzled as to what this code does.
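
The readings discussed in this comment lead to different iterator arithmetic. The sketch below uses a plain std::vector<int> in place of edge->inputs_ and hypothetical helper names; it only shows how the end iterator would be computed under each convention.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Inputs = std::vector<int>;

// Convention A: |end_pos| is the index one past the last item (half-open range).
Inputs::const_iterator EndFromPosition(const Inputs& v, std::size_t end_pos) {
  assert(end_pos <= v.size());
  return v.cbegin() + static_cast<std::ptrdiff_t>(end_pos);
}

// Convention B: |count| items starting at index |start|.
Inputs::const_iterator EndFromCount(const Inputs& v, std::size_t start,
                                    std::size_t count) {
  assert(start + count <= v.size());
  return v.cbegin() + static_cast<std::ptrdiff_t>(start + count);
}

// Convention C: |from_end| is a distance measured back from the end of the
// container, which is the shape of the expression under review.
Inputs::const_iterator EndFromBack(const Inputs& v, std::size_t from_end) {
  assert(from_end <= v.size());
  return v.cend() - static_cast<std::ptrdiff_t>(from_end);
}
```

With five inputs, EndFromPosition(v, 3), EndFromCount(v, 1, 2) and EndFromBack(v, 2) all name the same iterator, which is why the stored number cannot be interpreted without knowing the convention.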

src/graph.cc Outdated
// Load output mtimes so we can compare them to the most recent input below.
- for (Node* o : edge->outputs_) {
+ for (vector<Node*>::iterator o = edge->outputs_.begin();
+      o != edge->outputs_.end(); ++o) {
Contributor

nit: Why regress here when the original code was perfectly fine?

Contributor Author

corrected

src/graph.cc Outdated
// if an rebuild is necessary the deps log is outdated for this target
if (!edge->deps_loaded_ && !dirty) {
// This is our first encounter with this edge. Load discovered deps.
std::array<std::size_t, 2> newLinks{ 0, 0 };
Contributor

nit: newLinks doesn't mean anything here. Use something more specific here. new_deps maybe?

Contributor

Or more precisely new_deps_range

Contributor Author

renamed to new_deps

src/graph.cc Outdated
}

if (input_range)
(*input_range)[1] = std::distance(implicit_dep, edge->inputs_.end());
Contributor

oh, so you are storing the distance from the last implicit input to the end of the array. This data structure definitely is not a range. Consider storing the distance from the start for clarity.

Contributor Author

This data structure represents a subset of a container.
The usual approach starting from index 0 has a drawback:
the default‑constructed view should represent the entire container, which requires knowing its size.

In this context it would look like:

if (!RecomputeEdgesInputsDirty(node, InputView(), most_recent_input, dirty,
                                 stack, validation_nodes, err))
// would need to become
if (!RecomputeEdgesInputsDirty(node, InputView{0, node->in_edge()->inputs_.size()}, most_recent_input, dirty,
                                 stack, validation_nodes, err))

graph.cc#L482

I can implement the change, and it can certainly be expressed more cleanly than in the example above. However, in this particular context the change makes the code a bit more complicated. Before proceeding, I’d appreciate your advice.

Contributor

Second, there is

> This data structure represents a subset of a container. The usual approach starting from index 0 has a drawback: the default‑constructed view should represent the entire container, which requires knowing its size.

Technically, this is neither a "view" nor a "subset" as these terms usually refer to objects that can be used directly to access individual items. This is not the case here: you just have a pair of numbers, whose interpretation requires additional information (in this case the exact and unmodified inputs_ array they refer to). Hence using something like "range" in the name makes more sense. Another option is to store the Edge pointer in the data structure (or at least a pointer to its edge->inputs_ array).

If you prefer a non-conventional layout / interpretation for the values, I strongly recommend making a custom class with human-friendly accessors to properly document its purpose and simplify its usage. E.g.

struct EdgeInputsRange {
  /// Create new instance covering all |edge| inputs.
  EdgeInputsRange(const Edge* edge);
  
  /// Create instance covering the [start_pos..end_pos) interval of |edge| inputs.
  EdgeInputsRange(const Edge* edge, size_t start_pos, size_t end_pos);

  size_t start_pos() const;
  size_t end_pos() const;

private:
  ...
};
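
One way the suggested interface could be filled in is sketched below. FakeEdge is a hypothetical stand-in for Ninja's Edge (only an inputs_ vector of ints), and the private members, size(), begin()/end() and SumInputs are assumptions, since the suggestion leaves them open.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for Ninja's Edge: only the inputs matter here.
struct FakeEdge {
  std::vector<int> inputs_;
};

class EdgeInputsRange {
 public:
  /// Create new instance covering all |edge| inputs.
  explicit EdgeInputsRange(const FakeEdge* edge)
      : EdgeInputsRange(edge, 0, edge->inputs_.size()) {}

  /// Create instance covering the [start_pos..end_pos) interval of |edge| inputs.
  EdgeInputsRange(const FakeEdge* edge, std::size_t start_pos,
                  std::size_t end_pos)
      : edge_(edge), start_pos_(start_pos), end_pos_(end_pos) {
    assert(start_pos <= end_pos && end_pos <= edge->inputs_.size());
  }

  std::size_t start_pos() const { return start_pos_; }
  std::size_t end_pos() const { return end_pos_; }
  std::size_t size() const { return end_pos_ - start_pos_; }

  // Iterators are valid only while edge->inputs_ is not modified.
  std::vector<int>::const_iterator begin() const {
    return edge_->inputs_.cbegin() + static_cast<std::ptrdiff_t>(start_pos_);
  }
  std::vector<int>::const_iterator end() const {
    return edge_->inputs_.cbegin() + static_cast<std::ptrdiff_t>(end_pos_);
  }

 private:
  const FakeEdge* edge_;
  std::size_t start_pos_;
  std::size_t end_pos_;
};

// Example consumer: sums the inputs covered by |range|.
int SumInputs(const EdgeInputsRange& range) {
  int sum = 0;
  for (int value : range) sum += value;
  return sum;
}
```

With a type along these lines, the "whole container" case that motivated the end-relative layout is spelled EdgeInputsRange(edge), so the call site does not need to mention the container size.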

This change adjusts the internal order of the load output mtimes step and the
step that recomputes the dirty state of inputs. The modification does not alter
any functional behavior. The reordering improves internal consistency and
prepares the code for upcoming changes.
@moritzx22
Contributor Author

> This PR is impossible to review properly as a stack of 8 commits with what looks like random changes with unclear commit messages, please squash / rebase this into something that is simpler to review. ...

True. I will reorder and clean up the commits so that each one has a clear purpose and is easier to understand. Only the final commit introduces a functional change. All earlier commits are refactoring or cleanup and keep current master behavior. After restructuring the history, this separation will be clearer and the review much simpler. Most of the other comments will be incorporated as well.

This change refactors internal parts of the code without altering functional
behavior. It prepares the implementation for a future update in which
dependencies will be loaded only when inputs are not marked dirty.

The order in which inputs are processed will matter for upcoming changes to
depfile loading. A new helper function 'RecomputeEdgesInputsDirty' has been
introduced for clarity. It can also be told to visit only a subset of an
edge’s inputs.

Replace DependencyScan::RecomputeOutputDirty with the new
RecomputeOutputsDirtyCache helper class to centralize the
logic for determining whether edge outputs are dirty.
The new cache‑aware implementation avoids redundant work,
improves readability, and prepares the codebase for upcoming
changes to load depfiles only when nodes are not dirty.

Build‑log lookups are now cached, and command hash computation
is performed lazily to improve performance.

This commit introduces no functional changes.

This change updates the logic so that the depfile is loaded only when no
output node is dirty based on manifest and dyndep inputs (excluding the
depfile itself). If the outputs are already scheduled to be regenerated, loading
the depfile is unnecessary, since its only purpose is to trigger regeneration
of the outputs, which is already guaranteed.

Avoiding the use of a potentially outdated depfile prevents incorrect cycle
detection and ensures that stale depfiles are no longer added to the graph.
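
The rule described in this commit message can be illustrated with a toy model. The types and function names below (ToyEdge, ShouldLoadDepfile, EffectiveInputs) are hypothetical and far simpler than Ninja's DependencyScan. The sketch only shows that deps recorded by an earlier build are ignored once the manifest inputs already mark the outputs dirty.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct ToyEdge {
  std::vector<std::string> manifest_inputs;  // inputs listed in the manifest
  std::vector<std::string> recorded_deps;    // deps from a previous build
  bool dirty_from_manifest = false;          // result of scanning manifest inputs
};

// Load recorded deps only when nothing else already forces a rebuild.
bool ShouldLoadDepfile(const ToyEdge& edge) {
  return !edge.dirty_from_manifest;
}

// Inputs the scanner would consider for this edge.
std::vector<std::string> EffectiveInputs(const ToyEdge& edge) {
  std::vector<std::string> inputs = edge.manifest_inputs;
  if (ShouldLoadDepfile(edge)) {
    inputs.insert(inputs.end(), edge.recorded_deps.begin(),
                  edge.recorded_deps.end());
  }
  return inputs;
}
```

In this model a stale recorded dep (for example a header that no longer exists or that would now form a cycle) never enters the graph for an edge that is going to be rebuilt anyway; the rebuild then produces fresh deps.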