Releases: ctrlbreak/filematcher

v1.5.2

02 Feb 17:35

What's Changed

Fixed

  • Auto-confirm display bug — when pressing 'a' to process all remaining groups, output was corrupted due to incorrect cursor positioning

Changed

  • Simplified README — reduced from 571 to 214 lines, extracted JSON schema to separate file
  • Improved documentation — clearer examples using master_dir/other_dir naming, expanded jq examples

Added

  • Animated demo GIF — shows interactive hardlink execution with colored output
  • Demo regeneration script - create_demo.sh for updating the GIF
  • JSON schema reference - JSON_SCHEMA.md with full schema documentation

v1.5.1 - Code Quality Improvements

02 Feb 09:30

What's Changed

This patch release documents internal code quality improvements made since v1.5.0.

Code Quality Improvements

  • Reduced public API surface from 89 to 18 exports in __init__.py - cleaner, more focused public interface
  • Fixed exit code inconsistency - partial failures now return exit code 2 on every code path
  • Reduced main() complexity from 425 to 145 lines through extraction of 6 helper functions:
    • _validate_args() - argument validation
    • _setup_logging() - audit log initialization
    • _build_master_results() - master directory result assembly
    • _dispatch_compare_mode() - compare mode handling
    • _dispatch_execute_mode() - execute mode handling
    • _dispatch_preview_mode() - preview mode handling

Internal Changes

These changes improve maintainability without affecting user-facing behavior:

  • Cleaner separation of concerns in CLI module
  • Better testability through smaller functions
  • Consistent exit codes across all code paths

Full Changelog: https://github.com/PatrickKoss/filematcher/compare/v1.5.0...v1.5.1

v1.5.0 - Interactive Execute

31 Jan 02:17

What's New

Interactive Execute Mode - Execute mode now prompts for confirmation on each file, giving you full control over which duplicates are processed.

Features

  • Per-file confirmation (y/n/a/q) - Review each duplicate before action
    • y - Yes, process this file
    • n - No, skip this file
    • a - All, process remaining without prompts
    • q - Quit, stop and show summary
  • JSON schema v2.0 - Restructured output with unified header object
    • Version, timestamp, mode, action, hash algorithm in header
    • Consistent master/duplicate directory naming
  • Fail-fast validation - Invalid flag combinations caught before file scanning
  • Enhanced summaries - Shows confirmed/skipped/failed counts with space savings
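
The y/n/a/q flow described above can be sketched as a small loop. This is a minimal illustration, not the project's implementation: the function name `confirm_each`, the prompt text, and the returned dictionary are assumptions made for the example.

```python
from typing import Callable, Dict, Iterable, List, Union


def confirm_each(
    files: Iterable[str],
    ask: Callable[[str], str] = input,
) -> Dict[str, Union[List[str], bool]]:
    """Ask y/n/a/q for each file (hypothetical helper, not the real API)."""
    confirmed: List[str] = []
    skipped: List[str] = []
    auto = False        # becomes True once the user answers 'a'
    user_quit = False   # becomes True once the user answers 'q'

    for path in files:
        if not auto:
            answer = ""
            # Re-prompt until one of the four documented answers is given.
            while answer not in ("y", "n", "a", "q"):
                answer = ask(f"Process {path}? [y/n/a/q] ").strip().lower()
            if answer == "q":
                user_quit = True
                break
            if answer == "n":
                skipped.append(path)
                continue
            if answer == "a":
                auto = True
        confirmed.append(path)

    return {"confirmed": confirmed, "skipped": skipped, "quit": user_quit}
```

Passing `ask` as a parameter keeps the loop testable without a terminal; a caller would normally leave it at the default `input`.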

Breaking Changes

  • JSON output schema changed to v2.0 (header is now an object)
  • --json --execute now requires --yes flag
  • --quiet --execute now requires --yes flag
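
The two flag rules above, together with the fail-fast validation listed under Features, could be checked before any file scanning starts. The sketch below uses argparse and only the four flags named in these notes; the parser name, the `validate_flags` helper, and the error wording are assumptions, and the real CLI has more options than shown here.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    # Only the flags mentioned in the breaking-changes list are modelled.
    parser = argparse.ArgumentParser(prog="filematcher-sketch")
    parser.add_argument("--json", action="store_true")
    parser.add_argument("--quiet", action="store_true")
    parser.add_argument("--execute", action="store_true")
    parser.add_argument("--yes", action="store_true")
    return parser


def validate_flags(args: argparse.Namespace) -> List[str]:
    """Return a list of problems; an empty list means the flags are valid."""
    errors: List[str] = []
    if args.execute and not args.yes:
        if args.json:
            errors.append("--json --execute requires --yes")
        if args.quiet:
            errors.append("--quiet --execute requires --yes")
    return errors
```

A caller would run `validate_flags(build_parser().parse_args())` first and exit with an error if the list is non-empty, so an invalid combination never reaches the scanning stage.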

Exit Codes

  • 0 - Success
  • 1 - Error
  • 2 - Partial failure (some operations failed)
  • 130 - User quit (Ctrl+C or q)
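
For scripted use, the exit codes listed above can be turned into readable outcomes. The following is a hedged sketch of a wrapper: `describe_exit` and `run_and_describe` are hypothetical names, and the command passed in is whatever invocation the caller already uses.

```python
import subprocess
from typing import List, Tuple

# Meanings copied from the v1.5.0 exit code list.
EXIT_MEANINGS = {
    0: "success",
    1: "error",
    2: "partial failure",
    130: "user quit",
}


def describe_exit(code: int) -> str:
    """Map an exit code to its documented meaning."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_and_describe(argv: List[str]) -> Tuple[int, str]:
    """Run a command and return its exit code with a description."""
    completed = subprocess.run(argv)
    return completed.returncode, describe_exit(completed.returncode)
```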

Full Changelog

https://github.com/pchaganti/filematcher/compare/v1.4.0...v1.5.0

v1.4.0

28 Jan 00:21

File Matcher v1.4.0 Release Notes

Release Date: 2026-01-28

Overview

File Matcher v1.4.0 refactors the codebase from a single-file implementation to a proper Python package structure. This improves code navigation, IDE support, and maintainability while preserving full backward compatibility.

What's New

Package Structure

The codebase has been reorganized into a filematcher/ package with focused modules:

filematcher/
├── __init__.py      # Package exports
├── __main__.py      # python -m filematcher support
├── cli.py           # Command-line interface and main()
├── colors.py        # TTY-aware color output
├── hashing.py       # MD5/SHA-256 content hashing
├── filesystem.py    # Filesystem helpers (hardlink detection, etc.)
├── actions.py       # Action execution and audit logging
├── formatters.py    # Text and JSON output formatters
└── directory.py     # Directory indexing and matching

Installation Options

# Install via pip (recommended)
pip install .
filematcher dir1 dir2

# Or run directly (still works)
python file_matcher.py dir1 dir2

Full Backward Compatibility

All existing usage patterns continue to work:

# CLI usage unchanged
python file_matcher.py dir1 dir2
filematcher dir1 dir2

# Imports still work
from file_matcher import get_file_hash, find_matching_files
from filematcher import get_file_hash, find_matching_files  # New preferred style

Technical Details

  • 7 modules in filematcher/ package
  • file_matcher.py is now a thin wrapper (re-exports from package)
  • 218 unit tests (217 original + 1 circular import verification test)
  • No circular imports (verified via subprocess test)
  • Pure Python standard library (no external dependencies)
  • Python 3.9+ required
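
A circular-import check of the kind mentioned above is usually done by importing the package in a fresh interpreter, so modules already loaded by the test runner cannot hide the problem. This is a minimal sketch of that idea, not the project's actual test; `imports_cleanly` is a hypothetical helper.

```python
import subprocess
import sys


def imports_cleanly(module_name: str) -> bool:
    """Return True if `import module_name` succeeds in a new interpreter."""
    result = subprocess.run(
        [sys.executable, "-c", f"import {module_name}"],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0
```

In a test suite this would be called with the package name, for example `imports_cleanly("filematcher")`, assuming the package is importable from the test environment.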

Module Organization

Module          Purpose
cli.py          Argument parsing, main() entry point, logging setup
colors.py       ColorConfig, ColorMode, ANSI helpers (green, red, etc.)
hashing.py      get_file_hash(), get_sparse_hash() for content hashing
filesystem.py   is_hardlink_to(), is_symlink_to(), check_cross_filesystem()
actions.py      execute_action(), safe_replace_with_link(), audit logging
formatters.py   ActionFormatter ABC, TextActionFormatter, JsonActionFormatter
directory.py    find_matching_files(), index_directory()

Upgrading from v1.3.0

v1.4.0 is fully backward compatible. No changes required to existing scripts or workflows.

Optional migration: Update imports from file_matcher to filematcher:

# Old (still works)
from file_matcher import get_file_hash, find_matching_files

# New (preferred)
from filematcher import get_file_hash, find_matching_files

Why Package Structure?

  1. Better IDE Support: Jump-to-definition works across modules
  2. Easier Navigation: Find code by module name instead of scrolling 2000+ lines
  3. AI Tooling: LLMs can read focused modules instead of entire codebase
  4. Maintainability: Changes to one module don't affect others
  5. Testing: Easier to test individual modules in isolation

v1.1.0 Deduplication

20 Jan 01:07

File Matcher v1.1.0 Release Notes

Release Date: 2026-01-20

Overview

File Matcher v1.1.0 adds full file deduplication capability. You can now replace duplicate files with hard links, symbolic links, or delete them entirely—all with preview-by-default safety and comprehensive audit logging.

New Features

Master Directory Protection

Designate one directory as "master" using --master. Files in the master directory are never modified or deleted.

filematcher dir1 dir2 --master dir1 --action hardlink

Deduplication Actions

Three ways to handle duplicates:

  • hardlink — Replace duplicate with hard link to master (same inode, saves space)
  • symlink — Replace duplicate with symbolic link to master
  • delete — Remove duplicate file entirely (irreversible)

Preview-by-Default Safety

Running with --action shows a preview of what would happen. No files are modified until you add --execute.

# Preview only (safe)
filematcher dir1 dir2 --master dir1 --action hardlink

# Actually execute
filematcher dir1 dir2 --master dir1 --action hardlink --execute

Confirmation Prompt

Before execution, you'll see a confirmation prompt. Use --yes to skip for scripted use.

Audit Logging

Every modification is logged with timestamps. Use --log for a custom log path.

filematcher dir1 dir2 --master dir1 --action hardlink --execute --log changes.log

Cross-Filesystem Support

Hard links can't span filesystems. Use --fallback-symlink to automatically use symbolic links when hard links fail.
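
The fallback behaviour can be illustrated with the standard library. The sketch below assumes that "same filesystem" can be detected by comparing `st_dev` values and that any `OSError` from `os.link` triggers the fallback; the function names are hypothetical and the tool's own logic may differ.

```python
import os


def same_filesystem(path_a: str, path_b: str) -> bool:
    """Two paths on the same device can normally be hard-linked."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def link_with_fallback(master: str, link_path: str, fallback_symlink: bool = False) -> str:
    """Create link_path pointing at master; return the kind of link made."""
    try:
        os.link(master, link_path)
        return "hardlink"
    except OSError:
        # Hard link failed, for example because the paths are on different
        # filesystems. Only fall back when explicitly asked to.
        if not fallback_symlink:
            raise
        os.symlink(os.path.abspath(master), link_path)
        return "symlink"
```

Using an absolute target for the symbolic link keeps it valid regardless of the directory the link itself lives in.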

New Command-Line Options

Option              Description
--master, -m        Master directory (files never modified)
--action, -a        Action: hardlink, symlink, or delete
--execute           Execute changes (default: preview only)
--yes, -y           Skip confirmation prompt
--log, -l           Custom audit log path
--fallback-symlink  Use symlink if hardlink fails

Exit Codes

Code  Meaning
0     Success (or user aborted)
1     All operations failed
2     Invalid arguments
3     Partial success (some failed)

Example Workflow

# 1. Find duplicates
filematcher photos_backup photos_main

# 2. Preview deduplication
filematcher photos_backup photos_main --master photos_main --action hardlink

# 3. Execute with logging
filematcher photos_backup photos_main --master photos_main --action hardlink --execute --log dedup.log

Technical Details

  • Atomic operations using temp-rename pattern (no data loss on failure)
  • 1,374 lines of Python
  • 114 unit tests, all passing
  • Pure Python standard library (no external dependencies)
  • Python 3.9+ required
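
One common form of the temp-rename pattern mentioned above is to create the new link under a temporary name and then rename it over the duplicate, so the duplicate is either fully replaced or left untouched. This is a sketch under that assumption; `replace_with_hardlink` is a hypothetical name and the project's own implementation may order the steps differently.

```python
import os
import uuid


def replace_with_hardlink(master: str, duplicate: str) -> None:
    """Replace `duplicate` with a hard link to `master` (hypothetical helper)."""
    if os.path.samefile(master, duplicate):
        return  # already the same inode, nothing to do

    directory = os.path.dirname(os.path.abspath(duplicate))
    temp_path = os.path.join(directory, f".{uuid.uuid4().hex}.tmp")

    # Step 1: create the new link under a temporary name. If this fails,
    # the duplicate has not been touched.
    os.link(master, temp_path)
    try:
        # Step 2: rename the temporary link over the duplicate.
        os.replace(temp_path, duplicate)
    except OSError:
        # Clean up the temporary link and leave the duplicate as it was.
        os.unlink(temp_path)
        raise
```

The early `samefile` check matters: renaming one hard link over another link to the same file is a no-op on POSIX systems, which would otherwise leave the temporary name behind.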

Upgrading from v1.0.0

v1.1.0 is fully backward compatible. All v1.0.0 commands work unchanged. The new deduplication features are opt-in via --master and --action flags.

v1.0.0

20 Aug 18:17

File Matcher v1.0.0 Release Notes

🎉 First Stable Release!

This is the first stable release of File Matcher, a powerful tool for finding duplicate files across different directory structures.

✨ What's New

Core Features

  • File Matching: Find files with identical content using content hashing
  • Multiple Hash Algorithms: Support for both MD5 (fast) and SHA-256 (secure)
  • Directory Traversal: Recursively scan nested directory structures
  • Cross-Platform: Works on Windows, macOS, and Linux
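
Content-based matching across two directory trees can be sketched in a few lines. This is an illustration of the general technique rather than the tool's code: the function names are hypothetical, and SHA-256 is used as the default here only so the example is self-contained.

```python
import hashlib
import os
from collections import defaultdict
from typing import Dict, List, Tuple


def hash_file(path: str, algorithm: str = "sha256") -> str:
    """Hash a file's content, reading it in small chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(4096), b""):
            digest.update(chunk)
    return digest.hexdigest()


def index_by_hash(root: str, algorithm: str = "sha256") -> Dict[str, List[str]]:
    """Walk `root` recursively and group file paths by content hash."""
    index: Dict[str, List[str]] = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            index[hash_file(path, algorithm)].append(path)
    return index


def matching_files(dir1: str, dir2: str, algorithm: str = "sha256") -> Dict[str, Tuple[List[str], List[str]]]:
    """Return hashes present in both trees, with the paths from each side."""
    first = index_by_hash(dir1, algorithm)
    second = index_by_hash(dir2, algorithm)
    common = first.keys() & second.keys()
    return {h: (sorted(first[h]), sorted(second[h])) for h in common}
```

Because files are grouped by hash rather than by name, matches are found even when the two trees use different file names or directory layouts.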

Advanced Features

  • Fast Mode: Efficient comparison of large files (>100MB) using sparse sampling
  • Summary Mode: Quick overview showing match counts and unmatched files
  • Detailed Mode: Full listing of all file paths and matches
  • Large File Support: Efficient chunked reading (4KB chunks) for memory management
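
Sparse sampling trades certainty for speed: instead of hashing every byte, only the file size and a few fixed regions are hashed. The sketch below samples the start, middle, and end of the file; the sample positions, the default sample size, and the name `sparse_hash` are assumptions for illustration and are not taken from the project.

```python
import hashlib
import os


def sparse_hash(path: str, sample_size: int = 1024 * 1024, algorithm: str = "sha256") -> str:
    """Hash the file size plus three samples (start, middle, end)."""
    size = os.path.getsize(path)
    digest = hashlib.new(algorithm)
    # Include the size so files of different length never collide by sampling.
    digest.update(str(size).encode("ascii"))
    with open(path, "rb") as handle:
        if size <= 3 * sample_size:
            # Small file: hash everything, reading in 4KB chunks.
            for chunk in iter(lambda: handle.read(4096), b""):
                digest.update(chunk)
        else:
            for offset in (0, (size - sample_size) // 2, size - sample_size):
                handle.seek(offset)
                digest.update(handle.read(sample_size))
    return digest.hexdigest()
```

Two files of equal size that differ only outside the sampled regions produce the same sparse hash, so a scheme like this is a fast pre-filter and not proof of identical content.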

Developer Experience

  • Comprehensive Test Suite: 16 unit tests covering all functionality
  • No Dependencies: Uses only Python standard library
  • Well Documented: Complete README with examples and usage instructions

🚀 Getting Started

Quick Install

# Download and extract the release package
# No installation required - just Python 3.6+

# Run tests to verify
python3 run_tests.py

# Basic usage
python3 file_matcher.py <directory1> <directory2>

Examples

# Compare two directories
python3 file_matcher.py test_dir1 test_dir2

# Use fast mode for large files
python3 file_matcher.py test_dir1 test_dir2 --fast

# Get summary only
python3 file_matcher.py test_dir1 test_dir2 --summary

# Use SHA-256 hashing
python3 file_matcher.py test_dir1 test_dir2 --hash sha256

🔧 Technical Details

  • Python Version: 3.6 or higher
  • Dependencies: None (standard library only)
  • File Size Limit: No practical limit (handles multi-GB files efficiently)
  • Memory Usage: Optimized with chunked reading and sparse sampling

📁 What's Included

  • file_matcher.py - Main script
  • README.md - Complete documentation
  • CHANGELOG.md - Version history
  • tests/ - Comprehensive test suite
  • test_dir1/, test_dir2/ - Example test directories
  • complex_test/ - Advanced test scenarios

🐛 Bug Reports & Feedback

If you find any issues or have suggestions, please open an issue on GitHub.

📄 License

This project is open source and available under the MIT License.


Download: Choose the appropriate archive for your platform:

  • filematcher-1.0.0.zip - Windows/macOS users
  • filematcher-1.0.0.tar.gz - Linux/Unix users