Deterministic generator for creating diverse file-system artifacts to test forensic tools (Autopsy, EnCase, FTK, etc.). Use YAML manifests for simple operations, playbooks for complex modus operandi simulation, or the bulk generator for a quick synthetic corpus with no YAML required.
- Deterministic output with `--seed`
- Manifest mode: simple, declarative file operations (create/update/append/delete/mace/rename/truncate/rotate/ads/motw)
- Playbook mode: complex timelines with actors, timed steps, and templating
- Bulk mode: super-simple synthetic corpus generation with `--bulk` and `--depth`
- Extensive file type support: documents, logs, archives, media, emails, Windows artifacts
- MACE (atime/mtime) timestamp control
- Windows-specific: NTFS ADS and Mark-of-the-Web (MoTW)
Build from source:

    go build -v -o fsagen.exe

fsagen can generate artifacts using a manifest, a playbook, or the bulk generator:

    fsagen [OPTIONS] <output-path>

Options:
- `--seed N` - PRNG seed for deterministic generation (default: 1)
- `--manifest FILE` - Execute a YAML manifest (simple file operations)
- `--playbook FILE` - Execute a YAML playbook (complex modus operandi)
- `--bulk N` - Super-simple bulk generation: N items per level (no YAML)
- `--depth D` - Bulk generation depth (default: 1)
- `--timeline FILE` - Generate forensic timeline after execution (formats: csv, txt, bodyfile, macb)
Examples:

Simple bulk generation with manifest:

    fsagen --seed 42 --manifest examples/manifest-bulk-simple.yaml ./output

Complex adversary simulation with playbook:

    fsagen --seed 100 --playbook examples/playbook-adversary-data-theft.yaml ./crime-scene

Generate artifacts with forensic timeline:

    fsagen --seed 42 --playbook examples/playbook-comprehensive-ransomware.yaml --timeline timeline.csv ./output

Quick synthetic corpus with bulk generator (no YAML):

    fsagen --seed 7 --bulk 3 --depth 2 ./quick-bulk

YAML with a sequence of operations:
- action: `create|update|append|truncate|rotate|delete|mace|rename|ads|motw`
- path: target path relative to output root
- type: `file|dir` (for create)
- ext: file extension to append if `path` has no extension
- content: literal content (optional)
- content_len: size of deterministic random content (fallback when content not provided)
- atime/mtime: RFC3339 timestamps for MACE control
- new_path: new location for `rename` or `rotate`
- stream: ADS stream name (for `ads` action, Windows-only)
- zone_id, host_url, referrer_url: for `motw` action (Windows-only)
Examples:
- `examples/manifest-basic.yaml` - Basic create/update/delete operations
- `examples/manifest-bulk-simple.yaml` - Quick bulk file generation across multiple types
YAML with a timeline and actors:
- start: RFC3339 or "now"
- variables: Global variables for templating (map of key-value pairs)
- actors: List of { name, base, variables }
  - name: Actor identifier
  - base: Base directory for this actor's files
  - variables: Actor-specific variables (override global variables)
- steps: Timeline steps
  - actor: Actor name
  - offset: time.Duration from start for first occurrence (e.g., 5m, 2h)
  - every: Repeat interval (optional)
  - repeat: Number of occurrences (default 1)
  - condition: Step-level conditional execution ("odd", "even", "first", "last")
  - batch_count: Generate N files in this step (multiplies actions)
  - actions: List of operations with extras:
    - offset: time.Duration relative to the step occurrence
    - condition: Action-level conditional execution
    - template: Predefined content template ("email", "log", "script", "doc")
    - All standard manifest fields (action, path, content, etc.)
Operations supported: create|update|append|truncate|rotate|delete|mace|rename|ads|motw (all operations work in both manifest and playbook). ads and motw are Windows-only. Timestamps are computed from the timeline unless explicitly provided in the action.
Playbook templating:
- `${SEQ}` - Monotonic sequence counter
- `${RND:N}` or `${RANDOM:N}` - Deterministic random string of length N
- `${DATE:layout}` - Current time formatted with Go layout (e.g., `${DATE:2006-01-02T15:04:05Z07:00}`)
- `${ACTOR}` - Current actor name
- `${VAR:name}` - Variable substitution (from global or actor-specific variables)
- `${UUID}` - Deterministic UUID based on sequence
- `${IP}` - Deterministic IP address (192.168.x.x range)
- `${HASH:N}` - Deterministic hash-like hex string of length N
- `${BATCH}` - Current batch index (when using batch_count)
- `${ITER}` - Current iteration index (when using repeat)
Advanced Playbook Features:
- Variables: Define reusable values at global and actor scope

      variables:
        campaign_id: "OP-2024-001"
        target_org: "ACME Corp"
      actors:
        - name: attacker
          base: users/victim/Downloads
          variables:
            ip_addr: "192.0.2.42"

- Conditional Execution: Control when steps/actions run

      steps:
        - actor: malware
          repeat: 10
          condition: even  # Only runs on even iterations (0, 2, 4, ...)
          actions:
            - action: create
              path: file-${ITER}.txt
              condition: odd  # Further filtering at action level

- Batch Operations: Generate multiple files in one step

      steps:
        - actor: ransomware
          batch_count: 100  # Creates 100 files
          actions:
            - action: create
              path: encrypted-${BATCH}.locked
              content_len: 2048

- Content Templates: Use predefined realistic content

      actions:
        - action: create
          path: message.eml
          template: email  # Generates realistic email structure

Available templates: email, log, script, doc
Example playbooks:
- `examples/playbook-basic.yaml` - Simple two-actor workflow
- `examples/playbook-adversary-data-theft.yaml` - Stages documents, archives, writes exfil logs, backdates
- `examples/playbook-log-tampering.yaml` - Creates baseline logs, injects tampered entries, backdates, deletes
- `examples/playbook-persistence-artifacts.yaml` - Drops startup-like files and .reg exports
- `examples/playbook-email-and-archive.yaml` - Creates emails/images, archives, deletes originals
- `examples/playbook-log-rotate-and-truncate.yaml` - Demonstrates log rotation and truncation
- `examples/playbook-windows-ads-motw.yaml` - Adds NTFS ADS and Mark-of-the-Web (Windows-only)
- `examples/playbook-comprehensive-ransomware.yaml` - Advanced: Full ransomware attack with variables, batching, conditions, and templates
- `examples/playbook-insider-threat-exfil.yaml` - Advanced: 7-day insider threat scenario with repeated access patterns
- `examples/playbook-malware-lifecycle.yaml` - Advanced: 48-hour malware infection lifecycle with beaconing and anti-forensics
- Sets mtime/atime via `os.Chtimes`. ctime is not directly settable on most systems and will reflect metadata change time.
- To emulate directory timestamp skew on deletion, `delete` can include `atime`/`mtime`, which will be applied to the parent directory after removal.
- All random values (names, synthetic content) come from a seeded PRNG. Use the same `--seed` to reproduce identical output on the same platform and file system.
- Race-condition free: concurrent operations use proper synchronization while maintaining determinism.
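The idea behind seeded determinism can be shown with a short Go sketch: two generators created from the same seed yield the same sequence of names. The alphabet, the name length, and the helpers `randomName` and `names` are assumptions made for illustration; fsagen's actual PRNG and naming scheme may differ.

```go
package main

import (
	"fmt"
	"math/rand"
)

// alphabet is an illustrative character set for generated names.
const alphabet = "abcdefghijklmnopqrstuvwxyz0123456789"

// randomName draws n characters from rng.
func randomName(rng *rand.Rand, n int) string {
	b := make([]byte, n)
	for i := range b {
		b[i] = alphabet[rng.Intn(len(alphabet))]
	}
	return string(b)
}

// names returns count names produced by a fresh generator seeded with seed.
func names(seed int64, count int) []string {
	rng := rand.New(rand.NewSource(seed))
	out := make([]string, count)
	for i := range out {
		out[i] = randomName(rng, 8)
	}
	return out
}

func main() {
	a := names(42, 3)
	b := names(42, 3)
	fmt.Println(a)
	fmt.Println(b)
	// Same seed, same sequence.
	fmt.Println("identical:", fmt.Sprint(a) == fmt.Sprint(b))
}
```

Using a dedicated `*rand.Rand` rather than the package-level functions keeps the sequence independent of any other code that draws random numbers.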
The generator can create artifacts with proper structure for:
- Documents: .txt, .md, .docx, .pdf
- Data: .csv, .json, .jsonl, .xml, .html
- Logs: .log, .syslog, .jsonl
- Media: .png, .mp4
- Archives: .zip
- Email: .eml, .mbox
- Windows: .reg, .exe, NTFS ADS, MoTW
After generating artifacts, fsagen can automatically create forensic timelines for analysis:

    fsagen --playbook scenario.yaml --timeline output.csv ./artifacts

Timeline Formats:
- CSV (`.csv`): Structured data with all metadata (path, size, mode, timestamps, MD5, type, ADS)
- TXT (`.txt`): Human-readable format with detailed file information
- Bodyfile (`.bodyfile`): Compatible with The Sleuth Kit's mactime tool
- MACB (`.macb`): Modified/Accessed/Changed/Birth timeline showing all timestamp events separately
Timeline Features:
- MD5 hash calculation for all files (except files > 100MB)
- Full timestamp capture (access, modify, change/create times)
- NTFS Alternate Data Stream detection (Windows)
- Chronologically sorted by modification time
- Deterministic output (same seed = same timeline)
- Timeline-only mode: Generate timelines from existing artifacts without regenerating them
Example workflows:

    # Generate ransomware scenario with CSV timeline
    fsagen --seed 999 --playbook examples/playbook-comprehensive-ransomware.yaml --timeline ransomware.csv ./scene

    # Create timeline compatible with mactime
    fsagen --playbook examples/playbook-malware-lifecycle.yaml --timeline evidence.bodyfile ./analysis
    mactime -b evidence.bodyfile -d > detailed-timeline.txt

    # Generate MACB timeline for temporal analysis
    fsagen --playbook examples/playbook-insider-threat-exfil.yaml --timeline investigation.macb ./case

    # Timeline-only mode: generate timeline from existing artifacts (no regeneration)
    fsagen --timeline existing-timeline.csv ./already-generated-folder

Timeline-only mode:
If you've already generated artifacts but forgot to create a timeline, you can generate one later without regenerating the artifacts:

    # Generate timeline from existing folder
    fsagen --timeline my-timeline.csv ./existing-artifacts

    # Different formats
    fsagen --timeline analysis.txt ./crime-scene
    fsagen --timeline evidence.bodyfile ./investigation
    fsagen --timeline temporal.macb ./case-folder

This scans the folder, collects all file metadata, calculates MD5 hashes, and outputs the timeline in your chosen format, with no artifact regeneration needed.
See examples/TIMELINE_EXAMPLES.md for more timeline generation examples.
Use the bulk generator when you need a fast, synthetic corpus without describing a scenario:

    # N items per level, depth D directories deep
    fsagen --seed 7 --bulk 3 --depth 2 ./quick-bulk

    # You can also emit a timeline for bulk output
    fsagen --seed 7 --bulk 3 --depth 2 --timeline timeline.csv ./quick-bulk

What it does:
- Creates a directory fan-out up to `--depth` with `--bulk` sub-branches per level
- Populates each level with many file types (txt, docx, png, pdf, mp4, csv, json, xml, html, log, reg, zip, exe, jsonl, syslog, md, eml, mbox)
- Deterministic names and contents from `--seed`
Intended use:
- Quickly produce a sizeable, diverse dataset for tool demos, performance tests, or classroom exercises
- Warm-up data for timeline/report pipelines when a complex MO isn’t needed
Notes:
- Output size grows quickly with `--bulk` and `--depth`. Start small (e.g., `--bulk 2 --depth 1` or `--bulk 3 --depth 2`).
- Bulk mode is structure/content focused; if you need precise timelines, actors, or conditions, prefer Playbooks.
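To get a feel for how quickly output grows, the Go sketch below counts directories under one plausible reading of the fan-out: every directory at each level gets `--bulk` subdirectories, down to `--depth` levels, giving N + N^2 + ... + N^D directories in total. This geometry is an assumption, not a statement of what fsagen actually creates, and `dirCount` is a hypothetical helper; file counts per directory are not modelled.

```go
package main

import "fmt"

// dirCount returns bulk + bulk^2 + ... + bulk^depth, the number of
// directories in a tree where each directory has `bulk` children and
// the tree is `depth` levels deep (an assumed layout).
func dirCount(bulk, depth int) int {
	total, level := 0, 1
	for d := 0; d < depth; d++ {
		level *= bulk
		total += level
	}
	return total
}

func main() {
	for _, c := range [][2]int{{2, 1}, {3, 2}, {4, 3}, {10, 4}} {
		fmt.Printf("--bulk %d --depth %d -> %d directories\n", c[0], c[1], dirCount(c[0], c[1]))
	}
}
```

Under this reading, `--bulk 3 --depth 2` yields 12 directories while `--bulk 10 --depth 4` already yields 11110, which is why starting small is sensible.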
https://link.springer.com/chapter/10.1007/978-981-96-9443-3_17
    @InProceedings{10.1007/978-981-96-9443-3_17,
      author="Gogia, Gaurav and Rughani, Parag",
      editor="Gohil, Bhavesh N. and Patel, Sankita J. and Chaudhary, Naveen Kumar and Iyengar, S. S. and Modi, Chirag and Padhya, Mukti",
      title="File System Artefacts Generator (FSAGen): Towards Faster Forensic Tool Testing",
      booktitle="Information Security, Privacy and Digital Forensics",
      year="2026",
      publisher="Springer Nature Singapore",
      address="Singapore",
      pages="239--248",
      abstract="Software testing is one of the most fundamental steps in any software development lifecycle. The larger the scale, the more testing is required to ensure the correctness and reliability of the software. In the case of digital forensics, one of the main problems that researchers face is the availability of datasets for testing the reliability of the product they are evaluating. Different forensic tools with similar features may present different results even with similar inputs. This makes it extremely important to have standardised and reproducible datasets. This research explores synthetic dataset generators and introduces a novel command-line interface (CLI) tool for generating file system artefacts. The tool aims to facilitate the quick and convenient creation of synthetic datasets to aid in the validation of file system forensic tools. By offering a simplified and cross-platform solution, this tool addresses the need for standardised datasets in digital forensics research and enhances the reliability and accuracy of forensic tool evaluations.",
      isbn="978-981-96-9443-3"
    }