Trunk is a functional implementation of the Git core protocol, written in Go. It interacts directly with the standard .git directory structure, allowing it to inspect, create, and modify repositories that are fully compatible with the official Git client.
This project demonstrates the internal architecture of version control systems, specifically focusing on Content-Addressable Storage (CAS), Merkle Trees, and Directed Acyclic Graph (DAG) history management.
Trunk operates on the fundamental principle that a version control system is a key-value database, not a diff engine. The system is composed of three primary layers: the object database, the Index (staging area), and the Tree structure that represents directories.
At its core, Trunk is a database where every object is identified by the SHA-1 hash of its contents (prefixed with a short type-and-size header). This allows for automatic deduplication: if two files in different directories contain exactly the same bytes, Trunk stores only one "Blob" object. The filename is irrelevant at this storage layer.
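The deduplication described above can be sketched with an in-memory map standing in for the object database. The names `objectStore`, `blobID`, and `put` are hypothetical and chosen for illustration; they are not taken from Trunk's source.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
)

// objectStore is a hypothetical in-memory stand-in for .git/objects.
type objectStore map[string][]byte

// blobID returns the hex SHA-1 of "blob <size>\x00<content>".
func blobID(content []byte) string {
	header := fmt.Sprintf("blob %d\x00", len(content))
	sum := sha1.Sum(append([]byte(header), content...))
	return hex.EncodeToString(sum[:])
}

// put stores content under its hash and returns that hash.
// Storing identical bytes twice overwrites the same key, so only one copy exists.
func (s objectStore) put(content []byte) string {
	id := blobID(content)
	s[id] = content
	return id
}

func main() {
	store := objectStore{}
	a := store.put([]byte("hello world\n")) // imagine docs/readme.txt
	b := store.put([]byte("hello world\n")) // imagine src/notes.txt
	fmt.Println(a == b, len(store)) // true 1
	fmt.Println(a)                  // 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
}
```

The key depends only on the bytes, so the two imagined paths never enter the calculation.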
The Index (.git/index) acts as the staging area between the working directory (local files) and the object database. It is a binary file containing a sorted list of file paths, metadata (permissions, timestamps), and the SHA-1 hash of each file's content. It represents the "proposed" state of the next snapshot.
To represent directories, Trunk uses "Tree" objects. A Tree is a directory listing that maps filenames to hash IDs.
- Merkle Tree Property: A Tree object contains the hashes of the files inside it. If a file changes, its hash changes. This causes the Tree's hash to change. This bubbles up to the Root Tree. Therefore, the Root Tree hash uniquely identifies the entire state of the project down to the last byte.
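A small sketch of how a change bubbles up to the Root Tree. It assumes a two-level layout (a README plus src/main.go) invented for the example; `hashObject`, `treeHash`, and `rootFor` are illustrative names, not Trunk functions.

```go
package main

import (
	"crypto/sha1"
	"fmt"
)

// entry is one row of a Tree: mode, name, and the raw 20-byte hash of the child.
type entry struct {
	mode string
	name string
	hash [20]byte
}

// hashObject returns SHA-1("<kind> <size>\x00<body>").
func hashObject(kind string, body []byte) [20]byte {
	header := fmt.Sprintf("%s %d\x00", kind, len(body))
	return sha1.Sum(append([]byte(header), body...))
}

// treeHash serializes entries as "<mode> <name>\x00<binary_hash>" and hashes the result.
// Entries are assumed to be sorted by name already.
func treeHash(entries []entry) [20]byte {
	var body []byte
	for _, e := range entries {
		body = append(body, e.mode+" "+e.name+"\x00"...)
		body = append(body, e.hash[:]...)
	}
	return hashObject("tree", body)
}

// rootFor builds root -> src/ -> main.go next to a fixed README,
// and returns the hashes of the root tree and the src tree.
func rootFor(mainGo string) (root, src [20]byte) {
	src = treeHash([]entry{{"100644", "main.go", hashObject("blob", []byte(mainGo))}})
	root = treeHash([]entry{
		{"100644", "README", hashObject("blob", []byte("docs\n"))},
		{"40000", "src", src},
	})
	return root, src
}

func main() {
	r1, s1 := rootFor("package main\n")
	r2, s2 := rootFor("package main // edited\n")
	fmt.Println("src tree changed: ", s1 != s2)
	fmt.Println("root tree changed:", r1 != r2)
}
```

Only main.go differs between the two calls, yet both the src tree and the root tree get new hashes, while the README blob keeps its hash.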
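Trunk manages four distinct data types: three object types stored in .git/objects (Blob, Tree, Commit) and References, which are plain files under .git/refs rather than objects: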
- Purpose: Stores raw file content.
- Format: blob <size>\x00<content>
- Compression: Zlib.
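As a rough illustration of the Blob format and its compression, the sketch below encodes content with Go's standard compress/zlib package and then reverses the process. `encodeBlob` and `decodeObject` are hypothetical helper names, not Trunk's API.

```go
package main

import (
	"bytes"
	"compress/zlib"
	"fmt"
	"io"
)

// encodeBlob builds "blob <size>\x00<content>" and zlib-compresses it.
func encodeBlob(content []byte) ([]byte, error) {
	var buf bytes.Buffer
	w := zlib.NewWriter(&buf)
	if _, err := fmt.Fprintf(w, "blob %d\x00", len(content)); err != nil {
		return nil, err
	}
	if _, err := w.Write(content); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil { // Close flushes the remaining compressed data
		return nil, err
	}
	return buf.Bytes(), nil
}

// decodeObject inflates a stored object, splits the header from the content,
// and checks the declared size against the actual content length.
func decodeObject(stored []byte) (kind string, content []byte, err error) {
	r, err := zlib.NewReader(bytes.NewReader(stored))
	if err != nil {
		return "", nil, err
	}
	defer r.Close()
	raw, err := io.ReadAll(r)
	if err != nil {
		return "", nil, err
	}
	header, body, ok := bytes.Cut(raw, []byte{0})
	if !ok {
		return "", nil, fmt.Errorf("missing NUL byte after header")
	}
	var size int
	if _, err := fmt.Sscanf(string(header), "%s %d", &kind, &size); err != nil {
		return "", nil, err
	}
	if size != len(body) {
		return "", nil, fmt.Errorf("size mismatch: header says %d, body has %d", size, len(body))
	}
	return kind, body, nil
}

func main() {
	stored, err := encodeBlob([]byte("hello world\n"))
	if err != nil {
		panic(err)
	}
	kind, content, err := decodeObject(stored)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s %q\n", kind, content)
}
```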
- Purpose: Represents a directory. Stores a list of Blobs and other Trees (subdirectories).
- Format: tree <size>\x00<mode> <name>\x00<binary_hash>...
- Logic: Trees are constructed recursively from the bottom up.
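The bottom-up construction can be sketched as a recursive function over a flat map of paths. This is a simplified illustration: every file is assumed to have mode 100644, entries are sorted in plain byte order (stock Git orders directory names slightly differently in edge cases), and nothing is written to disk. `buildTree` and `hashObject` are hypothetical names.

```go
package main

import (
	"crypto/sha1"
	"fmt"
	"sort"
	"strings"
)

// hashObject returns SHA-1("<kind> <size>\x00<body>").
func hashObject(kind string, body []byte) [20]byte {
	header := fmt.Sprintf("%s %d\x00", kind, len(body))
	return sha1.Sum(append([]byte(header), body...))
}

// buildTree turns a flat map of relative path -> blob hash into nested trees
// and returns the hash of the tree at this level.
func buildTree(files map[string][20]byte) [20]byte {
	type child struct {
		mode string
		hash [20]byte
	}
	children := map[string]child{}
	subdirs := map[string]map[string][20]byte{}
	for path, h := range files {
		dir, rest, nested := strings.Cut(path, "/")
		if !nested {
			children[path] = child{"100644", h} // assumption: regular file mode
			continue
		}
		if subdirs[dir] == nil {
			subdirs[dir] = map[string][20]byte{}
		}
		subdirs[dir][rest] = h
	}
	// Recurse first: a directory's hash is only known once its contents are hashed.
	for name, sub := range subdirs {
		children[name] = child{"40000", buildTree(sub)}
	}
	names := make([]string, 0, len(children))
	for name := range children {
		names = append(names, name)
	}
	sort.Strings(names) // simplification of Git's tree ordering
	var body []byte
	for _, name := range names {
		c := children[name]
		body = append(body, c.mode+" "+name+"\x00"...)
		body = append(body, c.hash[:]...)
	}
	return hashObject("tree", body)
}

func main() {
	files := map[string][20]byte{
		"README":         hashObject("blob", []byte("docs\n")),
		"src/main.go":    hashObject("blob", []byte("package main\n")),
		"src/util/io.go": hashObject("blob", []byte("package util\n")),
	}
	fmt.Printf("root tree: %x\n", buildTree(files))
}
```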
- Purpose: Snapshots a specific Tree in time and provides context.
- Format:
tree <tree_hash>
parent <parent_hash> (Optional)
author <name> <timestamp>
committer <name> <timestamp>
<message>
- Logic: Commits form a Linked List (or DAG) pointing backwards in history.
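A minimal sketch of serializing the Commit format above. The `commit` type and its methods are illustrative names. Two details are assumptions about stock Git rather than statements from this document: the identity lines carry an email and a timezone (`Name <email> <unix-seconds> <tz>`), and a blank line separates the headers from the message.

```go
package main

import (
	"crypto/sha1"
	"fmt"
	"strings"
)

// commit holds the fields of a Commit object. parent is empty for a root commit.
type commit struct {
	tree      string
	parent    string
	author    string // pre-formatted identity line, e.g. "Ada <ada@example.com> 1700000000 +0000"
	committer string
	message   string
}

// encode renders the commit body: headers, a blank line, then the message.
func (c commit) encode() []byte {
	var b strings.Builder
	fmt.Fprintf(&b, "tree %s\n", c.tree)
	if c.parent != "" {
		fmt.Fprintf(&b, "parent %s\n", c.parent)
	}
	fmt.Fprintf(&b, "author %s\n", c.author)
	fmt.Fprintf(&b, "committer %s\n", c.committer)
	b.WriteString("\n" + c.message + "\n")
	return []byte(b.String())
}

// id returns the hex SHA-1 of "commit <size>\x00<body>".
func (c commit) id() string {
	body := c.encode()
	header := fmt.Sprintf("commit %d\x00", len(body))
	return fmt.Sprintf("%x", sha1.Sum(append([]byte(header), body...)))
}

func main() {
	who := "Ada <ada@example.com> 1700000000 +0000" // hypothetical identity
	root := commit{
		tree:      "9618621b128ce3b485d3c204a21623a400e83bff",
		author:    who,
		committer: who,
		message:   "Initial commit",
	}
	child := root
	child.parent = root.id() // the backward pointer that forms the history chain
	child.message = "Second commit"
	fmt.Print(string(child.encode()))
	fmt.Println("id:", child.id())
}
```

Because the parent hash is part of the hashed body, rewriting any ancestor changes the ID of every descendant.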
- Purpose: Human-readable pointers to specific commit hashes.
- Location: .git/refs/heads/master
- HEAD: A symbolic reference pointing to the current active branch (e.g., ref: refs/heads/master).
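Resolving HEAD to a commit hash might look like the sketch below. `parseHead` and `resolveHead` are hypothetical helpers; the demo works in a temporary directory rather than a real repository, and a raw hash in HEAD (a detached HEAD) is returned unchanged.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// parseHead interprets the contents of a HEAD file. It returns the referenced
// path and true for a symbolic ref, or the raw value and false otherwise.
func parseHead(data string) (target string, symbolic bool) {
	data = strings.TrimSpace(data)
	if strings.HasPrefix(data, "ref: ") {
		return strings.TrimPrefix(data, "ref: "), true
	}
	return data, false
}

// resolveHead follows HEAD to the commit hash stored in the branch file.
// It returns an error if the branch file does not exist yet (no commits).
func resolveHead(gitDir string) (string, error) {
	head, err := os.ReadFile(filepath.Join(gitDir, "HEAD"))
	if err != nil {
		return "", err
	}
	target, symbolic := parseHead(string(head))
	if !symbolic {
		return target, nil
	}
	ref, err := os.ReadFile(filepath.Join(gitDir, filepath.FromSlash(target)))
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(string(ref)), nil
}

func main() {
	gitDir, err := os.MkdirTemp("", "trunk-demo")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(gitDir)

	heads := filepath.Join(gitDir, "refs", "heads")
	if err := os.MkdirAll(heads, 0o755); err != nil {
		panic(err)
	}
	if err := os.WriteFile(filepath.Join(gitDir, "HEAD"), []byte("ref: refs/heads/master\n"), 0o644); err != nil {
		panic(err)
	}
	hash := "fa814d70f34d84094c2ec0de68e21e9f34e173fd\n"
	if err := os.WriteFile(filepath.Join(heads, "master"), []byte(hash), 0o644); err != nil {
		panic(err)
	}

	commit, err := resolveHead(gitDir)
	fmt.Println(commit, err)
}
```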
The Trunk binary exposes several subcommands categorized into low-level manipulation and high-level user commands.
These commands manipulate the internal database directly.
- hash-object <file>: Computes the SHA-1 hash of a file, compresses it, and stores it as a Blob in the object database.
- cat-file -p <hash>: Decompresses and prints the content of an object identified by its hash.
- update-index <file>: Adds a file to the Staging Area. This parses the existing binary index, inserts or updates the entry, sorts the index alphabetically, and writes the binary format back to disk.
- write-tree: Recursively transforms the flat Index list into a nested Tree structure. It writes the resulting Tree objects to the database and returns the hash of the Root Tree.
- commit-tree <tree-hash> -m <msg> [-p <parent>]: Creates a Commit object wrapper around a Tree. It requires a message and optionally accepts a parent commit hash to maintain history continuity.
- read-tree <hash>: Reads a tree object (diagnostic use).
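The insert-or-update step of update-index can be sketched as a sorted upsert. This sketch keeps the list sorted with a binary search instead of re-sorting the whole index, which gives the same end state; `indexEntry` and `upsert` are hypothetical names, and real entries carry far more metadata than a path and a hash.

```go
package main

import (
	"fmt"
	"sort"
)

// indexEntry is a simplified index row: just a path and a hex blob hash.
type indexEntry struct {
	path string
	hash string
}

// upsert replaces the entry with the same path, or inserts the new entry
// at the position that keeps the slice sorted by path.
func upsert(entries []indexEntry, e indexEntry) []indexEntry {
	i := sort.Search(len(entries), func(i int) bool { return entries[i].path >= e.path })
	if i < len(entries) && entries[i].path == e.path {
		entries[i] = e // same path: update in place
		return entries
	}
	entries = append(entries, indexEntry{}) // grow by one slot
	copy(entries[i+1:], entries[i:])        // shift the tail right
	entries[i] = e
	return entries
}

func main() {
	var idx []indexEntry
	idx = upsert(idx, indexEntry{"src/main.go", "aaaa"})
	idx = upsert(idx, indexEntry{"README", "bbbb"})
	idx = upsert(idx, indexEntry{"src/main.go", "cccc"}) // re-staging updates the hash
	for _, e := range idx {
		fmt.Println(e.path, e.hash)
	}
}
```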
These commands automate the workflow for the end-user.
- init: Initializes the repository structure (.git folder, objects, refs).
- log: Traverses the commit history starting from HEAD, following parent pointers, and displaying metadata.
- commit -m <msg>: Automates the snapshot process. It determines the current parent, writes the tree, creates the commit, and updates the branch reference.
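The traversal performed by log can be sketched over an in-memory map of commits. The `commitInfo` type and `history` function are hypothetical, and the sketch follows a single parent per commit, matching the format described earlier.

```go
package main

import "fmt"

// commitInfo is a simplified commit: its parent ID (empty for the root) and message.
type commitInfo struct {
	parent  string
	message string
}

// history walks parent pointers from head and returns commit IDs, newest first.
// It stops at the root, at an unknown ID, or if it meets an ID it has already visited.
func history(commits map[string]commitInfo, head string) []string {
	var out []string
	seen := map[string]bool{}
	for id := head; id != "" && !seen[id]; {
		c, ok := commits[id]
		if !ok {
			break
		}
		seen[id] = true
		out = append(out, id)
		id = c.parent
	}
	return out
}

func main() {
	commits := map[string]commitInfo{
		"c1": {parent: "", message: "Initial commit"},
		"c2": {parent: "c1", message: "Add parser"},
		"c3": {parent: "c2", message: "Fix bug"},
	}
	for _, id := range history(commits, "c3") {
		fmt.Println(id, commits[id].message)
	}
}
```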
There are two methods to persist changes using Trunk.
This method exposes the internal pipeline of Git. It requires the user to manually pass hash outputs from one step to the next.
- Stage the File: Compute the hash and update the index binary.
go run . update-index filename.txt
- Generate the Tree: Create the directory structure objects and obtain the Root Tree Hash.
go run . write-tree
# Output Example: 9618621b128ce3b485d3c204a21623a400e83bff
- Create the Commit Object: Manually link the new Tree to the previous Commit (Parent). You must know the previous commit hash (if one exists).
go run . commit-tree <TREE_HASH_FROM_STEP_2> -p <PREVIOUS_COMMIT_HASH> -m "Commit Message"
# Output Example: fa814d70f34d84094c2ec0de68e21e9f34e173fd
- Update the Reference: Manually update the branch pointer to the new commit.
echo <COMMIT_HASH_FROM_STEP_3> > .git/refs/heads/master
This method utilizes the high-level commit command to handle tree generation, parent resolution, and reference updates automatically.
- Stage the File: Add modified files to the index.
go run . update-index filename.txt
- Commit: Run the commit command. Trunk will automatically:
  - Read .git/HEAD to resolve the current branch.
  - Read the branch file to resolve the Parent Commit.
  - Execute write-tree.
  - Execute commit-tree, linking the Parent.
  - Overwrite the branch file with the new Commit Hash.
go run . commit -m "Automated commit message"
The .git/index file generated by Trunk adheres to the version 2 format:
- Header: 12 bytes (DIRC signature, version number, entry count).
- Entries: 62 bytes of fixed metadata (ctime, mtime, device, inode, mode, uid, gid, size, hash, flags) followed by the variable-length file path and 1-8 bytes of null padding.
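A sketch of the header layout and the padding arithmetic. It assumes big-endian (network byte order) integers, which is what the stock Git index uses, and that padding rounds each entry up to a multiple of 8 bytes while always leaving at least one NUL. `encodeHeader`, `parseHeader`, and `entrySize` are illustrative names.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// fixedEntryBytes is the size of the fixed metadata that precedes each path.
const fixedEntryBytes = 62

// encodeHeader writes the 12-byte header: "DIRC", version 2, entry count.
func encodeHeader(entries uint32) []byte {
	buf := make([]byte, 12)
	copy(buf, "DIRC")
	binary.BigEndian.PutUint32(buf[4:], 2)
	binary.BigEndian.PutUint32(buf[8:], entries)
	return buf
}

// parseHeader validates the signature and returns the version and entry count.
func parseHeader(b []byte) (version, entries uint32, err error) {
	if len(b) < 12 {
		return 0, 0, errors.New("index header is shorter than 12 bytes")
	}
	if string(b[:4]) != "DIRC" {
		return 0, 0, errors.New("missing DIRC signature")
	}
	return binary.BigEndian.Uint32(b[4:8]), binary.BigEndian.Uint32(b[8:12]), nil
}

// entrySize returns the on-disk size of one entry, including 1-8 NUL bytes
// of padding so that the total is a multiple of 8.
func entrySize(pathLen int) int {
	return (fixedEntryBytes + pathLen + 8) &^ 7
}

func main() {
	fmt.Printf("% x\n", encodeHeader(3))
	for _, n := range []int{1, 2, 10} {
		size := entrySize(n)
		fmt.Printf("path length %2d -> entry %d bytes, padding %d\n", n, size, size-fixedEntryBytes-n)
	}
}
```

For example, a 10-character path gives 62 + 10 = 72 bytes, already a multiple of 8, so a full 8 bytes of padding are added to guarantee the terminating NUL.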
Objects are stored in a sharded directory structure to prevent filesystem performance degradation.
- Directory: The first 2 characters of the hex hash.
- Filename: The remaining 38 characters of the hex hash.
- Content:
zlib_compress(type + space + size + null_byte + content)
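The sharding rule can be sketched as a function that splits a 40-character hex hash into a directory name and a file name. `shard` is a hypothetical helper, and the hash in the example is the write-tree sample output shown earlier.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"path/filepath"
)

// shard splits a 40-character hex hash into the 2-character directory name
// and the 38-character file name used under .git/objects.
func shard(hash string) (dir, file string, err error) {
	if len(hash) != 40 {
		return "", "", fmt.Errorf("want 40 hex characters, got %d", len(hash))
	}
	if _, err := hex.DecodeString(hash); err != nil {
		return "", "", fmt.Errorf("not a hex hash: %w", err)
	}
	return hash[:2], hash[2:], nil
}

func main() {
	dir, file, err := shard("9618621b128ce3b485d3c204a21623a400e83bff")
	if err != nil {
		panic(err)
	}
	fmt.Println(filepath.Join(".git", "objects", dir, file))
}
```

With 256 possible two-character prefixes, objects spread across up to 256 directories instead of accumulating in one.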