Skip to content

Fix corruption when unstaging changes made by git commands#1856

Open
tyrielv wants to merge 1 commit intomicrosoft:masterfrom
tyrielv:restore-repro-test
Open

Fix corruption when unstaging changes made by git commands#1856
tyrielv wants to merge 1 commit intomicrosoft:masterfrom
tyrielv:restore-repro-test

Conversation

@tyrielv
Copy link
Contributor

@tyrielv tyrielv commented Aug 14, 2025

Adds a test to reproduce corruption of cherry-pick followed by restore. This should remove all staged changes in the index but leave the working directory unchanged.
It still leaves the working directory unchanged, but the index is in a corrupted state and git status no longer accurately reflects the working directory.

See #1855. It hasn't been determined yet whether microsoft/git or VFSForGit is the cause of the error - it could be both.

@tyrielv tyrielv force-pushed the restore-repro-test branch from e122e0d to ef5f575 Compare March 20, 2026 22:20
@tyrielv tyrielv changed the title Functional test for cherry-pick -> restore corruption Fix corruption when unstaging changes made by git commands Mar 20, 2026
When git commands like cherry-pick -n or merge stage changes directly
(as opposed to user edits followed by git add), those staged files have
skip-worktree set and are not in the GVFS ModifiedPaths database. A
subsequent 'restore --staged' then fails to properly unstage them:
- Modified/deleted files become invisible to git status
- Added files (ProjFS placeholders) vanish on projection changes

Fix by sending a PreUnstage pipe message from the pre-command hook
before 'restore --staged' or 'checkout HEAD --' runs:
1. Query staged files via 'git diff --cached --name-status -z' matching
   the command's pathspec, and add them to ModifiedPaths so git clears
   skip-worktree and detects their working tree state
2. For staged-added files, write their content to disk via batched
   'git checkout-index --force' (with hooks bypassed to avoid deadlock)
   so they persist as full files across projection changes

Pathspecs are forwarded from the hook args to scope the operation,
avoiding unnecessary ModifiedPaths additions during large merge
conflict resolutions. On failure, the hook blocks with an actionable
error message rather than allowing silent corruption.

Components:
- GVFS.Hooks/Program.Unstage.cs: detect unstage operations, extract
  pathspecs, send PreUnstage message, block on failure
- GVFS.Mount/InProcessMount.cs: handle PreUnstage message
- GVFS.Virtualization/FileSystemCallbacks.cs: AddStagedFilesToModifiedPaths
  queries staged files, adds to ModifiedPaths, hydrates added files
- GVFS.Common/Git/GitProcess.cs: DiffCachedNameStatus, batched
  CheckoutIndexForFiles, QuoteGitPath for safe path escaping
- GVFS.Common/NamedPipes/UnstageNamedPipeMessages.cs: message type
- CorruptionReproTests.cs: functional test
- GitProcessTests.cs: QuoteGitPath unit + integration tests

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@tyrielv tyrielv force-pushed the restore-repro-test branch from ef5f575 to 9aa9eda Compare March 20, 2026 22:28
@tyrielv
Copy link
Contributor Author

tyrielv commented Mar 20, 2026

Fix corruption when unstaging changes made by git commands

Problem

When git commands like cherry-pick -n or reset --soft stage changes directly (as opposed to user edits followed by git add), those staged files have skip-worktree set and are not in the GVFS ModifiedPaths database. A subsequent git restore --staged fails to properly unstage them:

  • Modified/deleted files become invisible to git status — the index is reset to match HEAD but git doesn't check the working tree because skip-worktree is set
  • Added files (ProjFS placeholders) vanish when the projection reverts to HEAD since they're not real files on disk

See #1855 for the original issue report.

Solution

Add a PreUnstage named pipe message sent from the pre-command hook before restore --staged or checkout HEAD -- runs:

  1. Query staged files via git diff --cached --name-status -z matching the command's pathspec
  2. Add them to ModifiedPaths so git clears skip-worktree and detects their working tree state
  3. Hydrate added files by writing their content from the git object store to disk via batched git checkout-index --force (with hooks bypassed), converting them from ProjFS placeholders to full files that survive projection changes

Pathspecs from the user's command are forwarded to scope the operation — this avoids adding all staged files to ModifiedPaths when only unstaging specific files (important for performance during large merge conflict resolutions).

Key Design Decisions

  • Why fix in VFSForGit, not in git? We initially explored fixes in microsoft/git: correcting post-index-change hook args, bypassing skip-worktree in restore --staged (builtin/checkout.c), and detecting skip-worktree changes in apply_virtualfilesystem(). These approaches either didn't fully fix the problem (modified files only, not added), introduced the chicken-and-egg problem (files need to be in ModifiedPaths for git to clear skip-worktree, but need skip-worktree cleared to be added to ModifiedPaths), or had performance concerns (scanning 2.8M index entries). The VFS projection mismatch is fundamentally a GVFS concern — GVFS owns the projection and ModifiedPaths, so it's best positioned to prepare them before git operates.

  • Why pre-command hook, not post-command? The fix must run before restore --staged reads the index, because apply_virtualfilesystem() sets skip-worktree bits during index read based on ModifiedPaths content.

  • Why hydrate added files? ProjFS placeholders are projections from the index. When restore --staged removes an added file from the index, the projection no longer includes it, so the placeholder vanishes. Writing the file from the object store makes it a real file that persists.

  • Why git checkout-index instead of git show :path? checkout-index handles .gitattributes, line endings, and file modes correctly. git show outputs raw blob content.

  • Why bypass core.hookspath for checkout-index? The pre-command hook holds the GVFSLock. If checkout-index triggered the pre-command hook, it would try to re-acquire the lock → deadlock.

  • Why block on failure? If the PreUnstage preparation fails, allowing restore --staged to proceed would silently corrupt the index. Better to fail loudly with instructions to remount.

Files Changed

File Purpose
GVFS.Hooks/Program.Unstage.cs New partial class — detect unstage operations, extract pathspecs, send PreUnstage message
GVFS.Hooks/Program.cs Add partial keyword, add restore/checkout cases to RunPreCommands
GVFS.Hooks/GVFS.Hooks.csproj Link UnstageNamedPipeMessages.cs
GVFS.Mount/InProcessMount.cs Handle PreUnstage pipe message
GVFS.Virtualization/FileSystemCallbacks.cs AddStagedFilesToModifiedPaths — query staged files, update ModifiedPaths, hydrate added files
GVFS.Common/Git/GitProcess.cs DiffCachedNameStatus, batched CheckoutIndexForFiles, QuoteGitPath
GVFS.Common/NamedPipes/UnstageNamedPipeMessages.cs New partial class for PreUnstage message type
GVFS.FunctionalTests/.../CorruptionReproTests.cs Functional test: cherry-pick -n → restore --staged (single file + all) → restore working tree
GVFS.UnitTests/Git/GitProcessTests.cs QuoteGitPath unit tests (data-driven) + integration test (round-trip through git process)

Testing

Manual verification

Tested on a GVFS enlistment (ForTests repo) and compared output against a regular git clone at each step. After the fix, VFS and regular git produce identical git status output through the full cherry-pick → restore --staged → restore workflow.

Unit tests

  • QuoteGitPath: 8 data-driven test cases covering simple paths, spaces, embedded quotes, backslashes before quotes, trailing backslashes, empty paths
  • QuoteGitPath_RoundTripThroughProcess: Integration test launching git rev-parse --sq-quote to verify argument quoting survives the ProcessStartInfo → Windows CRT → git round-trip

Functional test

CorruptionReproTests.ReproCherryPickRestoreCorruption:

  1. Cherry-pick staged changes (adds, deletes, modifications)
  2. restore --staged for a single file (verifies pathspec scoping)
  3. restore --staged . for everything remaining
  4. restore -- . to clean working directory
  5. FilesShouldMatchCheckoutOfSourceBranch validates VFS matches control repo

Review Guidance

Areas to pay particular attention to:

  • Program.Unstage.cs arg parsing — Does IsUnstageOperation correctly detect all relevant command patterns? Does GetRestorePathspec correctly extract pathspecs and skip flags?

  • QuoteGitPath in GitProcess.cs — The Windows CRT argument parsing rules for backslash+quote interactions are subtle. The integration test validates the common cases, but review the edge case handling.

  • Error handling in SendPrepareForUnstageMessage — On pipe connection failure, we block the git command. Is this the right tradeoff? (Discussed: if mount isn't running there's no projection to corrupt, but connection failure could also mean the mount crashed.)

  • AddStagedFilesToModifiedPaths parsing of --name-status -z output — Status and path alternate as null-separated fields. --no-renames is used to ensure renames appear as delete+add pairs rather than R\0old\0new\0 which would break the alternating parser. Rename detection is also expensive on large repositories.

  • Performance — With many staged files, we add them all to ModifiedPaths (scoped by pathspec). With many added files, CheckoutIndexForFiles batches them respecting the 32K command line limit. Is this sufficient?

@tyrielv tyrielv marked this pull request as ready for review March 20, 2026 22:48
@tyrielv tyrielv requested a review from KeithIsSleeping March 20, 2026 22:48
Copy link

@KeithIsSleeping KeithIsSleeping left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multi-Model Code Review

Reviewed with Claude Opus 4.6, GPT-5.1-Codex-Max, GPT-5.3-Codex, and GPT-5.4. Findings below are consensus items where multiple models independently identified the same issue.

// Query all staged files in one call using --name-status -z.
// Output format: "A\0path1\0M\0path2\0D\0path3\0"
GitProcess.Result result = gitProcess.DiffCachedNameStatus(pathspecs);
if (result.ExitCodeIsSuccess && !string.IsNullOrEmpty(result.Output))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HIGH: Fail-open — AddStagedFilesToModifiedPaths returns success when git diff --cached fails (3/3 models agree)

success is initialized to true (line 382) and is only set to false inside the ExitCodeIsSuccess block. If DiffCachedNameStatus fails (non-zero exit — index locked, repo not ready, disk error), the method falls through and returns true with addedCount = 0.

The caller in HandlePrepareForUnstageRequest converts this to a success response, the hook allows the unstage to proceed, and no paths were added to ModifiedPaths. This re-opens the exact corruption this PR is meant to fix.

Recommendation: Treat !result.ExitCodeIsSuccess as hard failure:

else if (!result.ExitCodeIsSuccess)
{
    success = false;
    // Log result.Errors for diagnostics
}

string arg = args[i];

if (arg.StartsWith("--git-pid="))
continue;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MEDIUM: Tree-ish arguments incorrectly forwarded as pathspecs (2/4 models agree)

The parser adds every non-option token before -- to the path list. For git checkout HEAD -- foo.txt, the extracted list becomes ["HEAD", "foo.txt"]. If the repo has a file or directory matching the tree-ish name, the mount side will add/hydrate unrelated staged paths.

At OS repo scale (2.5M files), accidental scope widening is dangerous.

Recommendation: Parse checkout and restore separately — strip the checkout tree-ish (HEAD) and any --source value before collecting pathspecs.

/// Returns null-separated pathspecs, or empty string for all staged files.
/// </summary>
private static string GetRestorePathspec(string command, string[] args)
{

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MEDIUM: --pathspec-from-file ignored, falls back to all-staged scan (2/4 models agree)

GetRestorePathspec() skips all --prefixed args without interpreting them. git restore --staged --pathspec-from-file=list.txt is treated as no pathspec, which downstream means "all staged files" (DiffCachedNameStatus(null)). On the OS repo, a narrowly-scoped unstage becomes a full-index walk.

Recommendation: Explicitly support --pathspec-from-file/--pathspec-file-nul, or fail closed with an error when those options are detected.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants