Skip to content

Detect & rebuild externally modified output files#2752

Open
philwo wants to merge 3 commits intoninja-build:masterfrom
philwo:detect-modified-outputs
Open

Detect & rebuild externally modified output files#2752
philwo wants to merge 3 commits intoninja-build:masterfrom
philwo:detect-modified-outputs

Conversation

@philwo
Copy link
Copy Markdown

@philwo philwo commented Mar 24, 2026

Add an output_mtime field to the build log that records the actual mtime of each output file after a successful build. On subsequent builds, if an output's current mtime differs from the recorded value, the output is marked dirty and rebuilt.

Previously, Ninja only stored command_start_time_ in the log's mtime field (to detect inputs modified during a build) and had no record of the output's actual mtime. This meant externally modified outputs (e.g. echo "corrupted" > out.txt) went undetected.

The new field is appended as a 6th tab-separated column in .ninja_log. Old Ninja versions parse it harmlessly (strtoull stops at the tab), so no log version bump is needed. Old log entries default to output_mtime=0, which skips the check. Generator rule outputs are also excluded since they are expected to be user-edited.

philwo added 2 commits March 24, 2026 09:34
Add an `output_mtime` field to the build log that records the actual
mtime of each output file after a successful build. On subsequent
builds, if an output's current mtime differs from the recorded value,
the output is marked dirty and rebuilt.

Previously, Ninja only stored `command_start_time_` in the log's mtime
field (to detect inputs modified during a build) and had no record of
the output's actual mtime. This meant externally modified outputs
(e.g. `echo "corrupted" > out.txt`) went undetected.

The new field is appended as a 6th tab-separated column in .ninja_log.
Old Ninja versions parse it harmlessly (strtoull stops at the tab), so
no log version bump is needed. Old log entries default to
output_mtime=0, which skips the check. Generator rule outputs are also
excluded since they are expected to be user-edited.
Test that outputs modified outside of Ninja are detected and rebuilt,
and that generator outputs are excluded from this check.
Copilot AI review requested due to automatic review settings March 24, 2026 00:50
@philwo
Copy link
Copy Markdown
Author

philwo commented Mar 24, 2026

Hey, for context, I noticed this while analyzing how closely our Siso build tool in Chromium sticks to the behavior of Ninja. During these tests, I noticed that modified outputs aren't detected and rebuilt by Ninja. I'm not sure if you want to change the behavior of Ninja in this regard, but if you also consider this a bug, then this should fix it.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds per-output mtime tracking to Ninja’s build log so that outputs modified externally between builds are detected and rebuilt, while preserving backward compatibility with existing .ninja_log entries.

Changes:

  • Extend .ninja_log format by appending an optional 6th column (output_mtime) and plumb it through BuildLog::{Load,WriteEntry,RecordCommand}.
  • Record actual output mtimes after successful commands and use them during dirty-checking to detect externally modified outputs (excluding generator outputs).
  • Add regression tests covering external output modification detection and generator-output exclusion.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/graph.cc Marks outputs dirty when on-disk mtime differs from the last recorded output_mtime (excluding generator outputs).
src/build.cc Always stats outputs after successful commands to capture per-output mtimes and records them into the build log.
src/build_test.cc Adjusts existing restat test setup and adds tests for externally modified output detection and generator exclusion.
src/build_log.h Extends BuildLog::RecordCommand API and LogEntry to include output_mtime.
src/build_log.cc Writes/loads the optional 6th field and stores per-output output_mtime in log entries; updates restat to maintain it.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +910 to 941
// Restat the edge outputs. Always stat output files so that we can
// record their actual mtime in the build log for detecting externally
// modified outputs on subsequent builds.
TimeStamp record_mtime = 0;
vector<TimeStamp> output_mtimes;
if (!config_.dry_run) {
const bool restat = edge->GetBindingBool("restat");
const bool generator = edge->GetBindingBool("generator");
bool node_cleaned = false;
record_mtime = edge->command_start_time_;

// restat and generator rules must restat the outputs after the build
// has finished. if record_mtime == 0, then there was an error while
// attempting to touch/stat the temp file when the edge started and
// we should fall back to recording the outputs' current mtime in the
// log.
if (record_mtime == 0 || restat || generator) {
for (vector<Node*>::iterator o = edge->outputs_.begin();
o != edge->outputs_.end(); ++o) {
TimeStamp new_mtime = disk_interface_->Stat((*o)->path(), err);
if (new_mtime == -1)
return false;
for (vector<Node*>::iterator o = edge->outputs_.begin();
o != edge->outputs_.end(); ++o) {
TimeStamp new_mtime = disk_interface_->Stat((*o)->path(), err);
if (new_mtime == -1)
return false;
output_mtimes.push_back(new_mtime);
// For restat/generator rules or when command_start_time_ couldn't be
// obtained, use the actual output mtime for record_mtime.
if (record_mtime == 0 || restat || generator) {
if (new_mtime > record_mtime)
record_mtime = new_mtime;
if ((*o)->mtime() == new_mtime && restat) {
// The rule command did not change the output. Propagate the clean
// state through the build graph.
// Note that this also applies to nonexistent outputs (mtime == 0).
if (!plan_.CleanNode(&scan_, *o, err))
return false;
node_cleaned = true;
}
}
if ((*o)->mtime() == new_mtime && restat) {
// The rule command did not change the output. Propagate the clean
// state through the build graph.
// Note that this also applies to nonexistent outputs (mtime == 0).
if (!plan_.CleanNode(&scan_, *o, err))
return false;
node_cleaned = true;
}
}
Copy link

Copilot AI Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FinishCommand now stats every output to populate output_mtimes, but for edges that also write to the deps log (deps_type non-empty) we stat the same outputs again a few lines later when recording deps mtimes. Consider reusing the already-collected output_mtimes values for the deps log record to avoid duplicate filesystem stats (especially noticeable for edges with many outputs).

Copilot uses AI. Check for mistakes.
entry->command_hash = (uint64_t)strtoull(start, NULL, 16);
*end = c;

// Parse command_hash (5th field). Use strtoul with an end pointer so
Copy link

Copilot AI Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment says "Use strtoul" when parsing command_hash, but the implementation uses strtoull. Update the comment to match the actual function so future readers don’t get misled when modifying this parsing logic.

Suggested change
// Parse command_hash (5th field). Use strtoul with an end pointer so
// Parse command_hash (5th field). Use strtoull with an end pointer so

Copilot uses AI. Check for mistakes.
@jhasse
Copy link
Copy Markdown
Collaborator

jhasse commented Mar 24, 2026

It's actually intentional AFAIK. I'm not sure if we aren't opening Pandora's box with this change, @evmar might know.

One use-case for me is to quickly change an output (i.e. an asset file) as a quick and dirty hack to test something - while still being able to recompile other binaries. I wouldn't want ninja to override my changes to the asset output in that case.

@philwo
Copy link
Copy Markdown
Author

philwo commented Mar 24, 2026

Ah, I totally understand how having this possibility can be nice. :) We actually got a feature request from a Chromium developer for Siso for exactly that use case, so we implemented a flag that causes it to temporarily not regenerate modified outputs: http://crrev.com/c/6191399/

Making this an explicit flag ensures that after the quick hack is no longer needed, the build automatically converges to the "correct" output state again once you stop passing the flag. That way there's no risk of forgetting that an output was manually modified, then the next day wondering why the binary doesn't do what the source says it should do. 😁

If that approach were something that you think would work for Ninja, maybe a -d tainted debug mode would be the most "ninja"-way to implement it? (I'm not attached to the "tainted" name, maybe there's a better one.. suggestions welcome.)

Of course, if you or Evan think that the current behavior is working as intended for Ninja, that's also totally fine.

@jhasse
Copy link
Copy Markdown
Collaborator

jhasse commented Mar 25, 2026

There are other cases where Ninja skips some checks (e.g. don't stat all files in some situations, requiring ninja -t restat). This means that a "corrupted" state can become unnoticed - even after this PR. So I'm not sure if we should even pretend? We could think about a mode where Ninja really checks everything it can and the default stays the way it is now?

@evmar
Copy link
Copy Markdown
Collaborator

evmar commented Mar 28, 2026

I can't really say what is "intended" behavior, I don't think I ever really thought this through. I think thinking through the use cases and evaluating against them is probably the best you can do.

I believe the reason for the existing command_start_time_ is for more than detecting modified inputs, it's also for the whole 'restat' behavior. Suppose you have B as an output of build command A. When A is out of date, you run the command, but if the build command doesn't write B, then the next build thinks A is still out of date.

In fact, it is "more correct" (for some meaning of correct that doesn't mean necessarily good or desirable, but rather I guess just more precise) to not record when the build command ran, but rather the exact mtime of B after the build ran, whether B ends up with a time in the past or up to date. Then you can just use that to detect whenever B changes. More recently I did a revisit of Ninja where I made every build depend on the exact mtime of all inputs and outputs. In practice this ends up being pretty sensitive to different projects' expectations of Ninja. I recall one thing it breaks is that meson has a build step where the step overwrites one of its inputs when it runs, for example.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants