Detect & rebuild externally modified output files#2752
Detect & rebuild externally modified output files#2752philwo wants to merge 3 commits intoninja-build:masterfrom
Conversation
Add an `output_mtime` field to the build log that records the actual mtime of each output file after a successful build. On subsequent builds, if an output's current mtime differs from the recorded value, the output is marked dirty and rebuilt. Previously, Ninja only stored `command_start_time_` in the log's mtime field (to detect inputs modified during a build) and had no record of the output's actual mtime. This meant externally modified outputs (e.g. `echo "corrupted" > out.txt`) went undetected. The new field is appended as a 6th tab-separated column in .ninja_log. Old Ninja versions parse it harmlessly (strtoull stops at the tab), so no log version bump is needed. Old log entries default to output_mtime=0, which skips the check. Generator rule outputs are also excluded since they are expected to be user-edited.
Test that outputs modified outside of Ninja are detected and rebuilt, and that generator outputs are excluded from this check.
|
Hey, for context, I noticed this while analyzing how closely our Siso build tool in Chromium sticks to the behavior of Ninja. During these tests, I noticed that modified outputs aren't detected and rebuilt by Ninja. I'm not sure if you want to change the behavior of Ninja in this regard, but if you also consider this a bug, then this should fix it. |
There was a problem hiding this comment.
Pull request overview
Adds per-output mtime tracking to Ninja’s build log so that outputs modified externally between builds are detected and rebuilt, while preserving backward compatibility with existing .ninja_log entries.
Changes:
- Extend
.ninja_logformat by appending an optional 6th column (output_mtime) and plumb it throughBuildLog::{Load,WriteEntry,RecordCommand}. - Record actual output mtimes after successful commands and use them during dirty-checking to detect externally modified outputs (excluding generator outputs).
- Add regression tests covering external output modification detection and generator-output exclusion.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| src/graph.cc | Marks outputs dirty when on-disk mtime differs from the last recorded output_mtime (excluding generator outputs). |
| src/build.cc | Always stats outputs after successful commands to capture per-output mtimes and records them into the build log. |
| src/build_test.cc | Adjusts existing restat test setup and adds tests for externally modified output detection and generator exclusion. |
| src/build_log.h | Extends BuildLog::RecordCommand API and LogEntry to include output_mtime. |
| src/build_log.cc | Writes/loads the optional 6th field and stores per-output output_mtime in log entries; updates restat to maintain it. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Restat the edge outputs. Always stat output files so that we can | ||
| // record their actual mtime in the build log for detecting externally | ||
| // modified outputs on subsequent builds. | ||
| TimeStamp record_mtime = 0; | ||
| vector<TimeStamp> output_mtimes; | ||
| if (!config_.dry_run) { | ||
| const bool restat = edge->GetBindingBool("restat"); | ||
| const bool generator = edge->GetBindingBool("generator"); | ||
| bool node_cleaned = false; | ||
| record_mtime = edge->command_start_time_; | ||
|
|
||
| // restat and generator rules must restat the outputs after the build | ||
| // has finished. if record_mtime == 0, then there was an error while | ||
| // attempting to touch/stat the temp file when the edge started and | ||
| // we should fall back to recording the outputs' current mtime in the | ||
| // log. | ||
| if (record_mtime == 0 || restat || generator) { | ||
| for (vector<Node*>::iterator o = edge->outputs_.begin(); | ||
| o != edge->outputs_.end(); ++o) { | ||
| TimeStamp new_mtime = disk_interface_->Stat((*o)->path(), err); | ||
| if (new_mtime == -1) | ||
| return false; | ||
| for (vector<Node*>::iterator o = edge->outputs_.begin(); | ||
| o != edge->outputs_.end(); ++o) { | ||
| TimeStamp new_mtime = disk_interface_->Stat((*o)->path(), err); | ||
| if (new_mtime == -1) | ||
| return false; | ||
| output_mtimes.push_back(new_mtime); | ||
| // For restat/generator rules or when command_start_time_ couldn't be | ||
| // obtained, use the actual output mtime for record_mtime. | ||
| if (record_mtime == 0 || restat || generator) { | ||
| if (new_mtime > record_mtime) | ||
| record_mtime = new_mtime; | ||
| if ((*o)->mtime() == new_mtime && restat) { | ||
| // The rule command did not change the output. Propagate the clean | ||
| // state through the build graph. | ||
| // Note that this also applies to nonexistent outputs (mtime == 0). | ||
| if (!plan_.CleanNode(&scan_, *o, err)) | ||
| return false; | ||
| node_cleaned = true; | ||
| } | ||
| } | ||
| if ((*o)->mtime() == new_mtime && restat) { | ||
| // The rule command did not change the output. Propagate the clean | ||
| // state through the build graph. | ||
| // Note that this also applies to nonexistent outputs (mtime == 0). | ||
| if (!plan_.CleanNode(&scan_, *o, err)) | ||
| return false; | ||
| node_cleaned = true; | ||
| } | ||
| } |
There was a problem hiding this comment.
FinishCommand now stats every output to populate output_mtimes, but for edges that also write to the deps log (deps_type non-empty) we stat the same outputs again a few lines later when recording deps mtimes. Consider reusing the already-collected output_mtimes values for the deps log record to avoid duplicate filesystem stats (especially noticeable for edges with many outputs).
| entry->command_hash = (uint64_t)strtoull(start, NULL, 16); | ||
| *end = c; | ||
|
|
||
| // Parse command_hash (5th field). Use strtoul with an end pointer so |
There was a problem hiding this comment.
The comment says "Use strtoul" when parsing command_hash, but the implementation uses strtoull. Update the comment to match the actual function so future readers don’t get misled when modifying this parsing logic.
| // Parse command_hash (5th field). Use strtoul with an end pointer so | |
| // Parse command_hash (5th field). Use strtoull with an end pointer so |
|
It's actually intentional AFAIK. I'm not sure if we aren't opening Pandora's box with this change, @evmar might know. One use-case for me is to quickly change an output (i.e. an asset file) as a quick and dirty hack to test something - while still being able to recompile other binaries. I wouldn't want ninja to override my changes to the asset output in that case. |
|
Ah, I totally understand how having this possibility can be nice. :) We actually got a feature request from a Chromium developer for Siso for exactly that use case, so we implemented a flag that causes it to temporarily not regenerate modified outputs: http://crrev.com/c/6191399/ Making this an explicit flag ensures that after the quick hack is no longer needed, the build automatically converges to the "correct" output state again once you stop passing the flag. That way there's no risk of forgetting that an output was manually modified, then the next day wondering why the binary doesn't do what the source says it should do. 😁 If that approach were something that you think would work for Ninja, maybe a Of course, if you or Evan think that the current behavior is working as intended for Ninja, that's also totally fine. |
|
There are other cases where Ninja skips some checks (e.g. don't stat all files in some situations, requiring |
|
I can't really say what is "intended" behavior, I don't think I ever really thought this through. I think thinking through the use cases and evaluating against them is probably the best you can do. I believe the reason for the existing In fact, it is "more correct" (for some meaning of correct that doesn't mean necessarily good or desirable, but rather I guess just more precise) to not record when the build command ran, but rather the exact mtime of B after the build ran, whether B ends up with a time in the past or up to date. Then you can just use that to detect whenever B changes. More recently I did a revisit of Ninja where I made every build depend on the exact mtime of all inputs and outputs. In practice this ends up being pretty sensitive to different projects' expectations of Ninja. I recall one thing it breaks is that meson has a build step where the step overwrites one of its inputs when it runs, for example. |
Add an
output_mtimefield to the build log that records the actual mtime of each output file after a successful build. On subsequent builds, if an output's current mtime differs from the recorded value, the output is marked dirty and rebuilt.Previously, Ninja only stored
command_start_time_in the log's mtime field (to detect inputs modified during a build) and had no record of the output's actual mtime. This meant externally modified outputs (e.g.echo "corrupted" > out.txt) went undetected.The new field is appended as a 6th tab-separated column in .ninja_log. Old Ninja versions parse it harmlessly (strtoull stops at the tab), so no log version bump is needed. Old log entries default to output_mtime=0, which skips the check. Generator rule outputs are also excluded since they are expected to be user-edited.