
Add smoke test suite as fast-fail CI gate#345

Merged
shaypal5 merged 2 commits into master from copilot/add-smoke-test-suite
Feb 28, 2026

Conversation

Contributor

Copilot AI commented Feb 21, 2026

Long CI matrix runs (60+ jobs across OS/Python/backend combinations) were spinning up regardless of trivial failures. This adds fast-fail gates — a smoke test job and a pre-commit lint job — that the full matrix must pass before starting.

Changes

  • tests/test_smoke.py — 11 new @pytest.mark.smoke tests covering core functionality: imports, pickle/memory caching, clear_cache, stale_after, allow_none, set_global_params, cache_dpath. No external service dependencies; runs in ~0.13s.
  • pyproject.toml — Registers the smoke pytest marker.
  • .github/workflows/ci-test.yml — Adds two new jobs that gate the full matrix:
    • smoke-test: single Python 3.12/Ubuntu run of pytest -m smoke tests/test_smoke.py
    • pre-commit: runs pre-commit run --all-files to enforce code quality before expensive jobs start
    • pytester now declares needs: [smoke-test, pre-commit]
    • testing-guardian updated to depend on all three jobs
    • Added permissions: contents: read to all new/modified jobs
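The gating described above can be sketched as a workflow fragment. The job names (smoke-test, pre-commit, pytester) come from the description; the concrete steps and action versions below are illustrative assumptions, not the exact contents of the PR's ci-test.yml:

```yaml
# Illustrative sketch of the fast-fail gating; actual steps may differ.
jobs:
  smoke-test:
    runs-on: ubuntu-latest
    permissions:
      contents: read
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -e ".[test]"
      - run: pytest -m smoke tests/test_smoke.py

  pre-commit:
    runs-on: ubuntu-latest
    permissions:
      contents: read
    steps:
      - uses: actions/checkout@v4
      - run: pipx run pre-commit run --all-files

  pytester:
    # The full 60+-job matrix starts only after both gates pass.
    needs: [smoke-test, pre-commit]
```

With this layout, a trivial import error or lint failure cancels the expensive matrix before any of its jobs are scheduled.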
Original prompt

This section details the original issue you should resolve

<issue_title>Add a smoke test suite</issue_title>
<issue_description>Add a smoke test suite that the full test suite workflow will depend on, so that long CI runs that are going to fail can fail fast.

The full test suite should also depend on the pre-commit.ci flow.</issue_description>

Comments on the Issue (you are @copilot in this section)



Copilot AI changed the title [WIP] Add smoke test suite for CI workflow Add smoke test suite as fast-fail CI gate Feb 21, 2026
Copilot AI requested a review from shaypal5 February 21, 2026 22:20
@shaypal5
Member

@copilot analyze, explain the cause and suggest a plan to fix the failing test test_pickle_core which fails for Python 3.10 on Windows - writing your insights and recommendations here, as a comment on this PR - but do not implement it yet.

Contributor Author

Copilot AI commented Feb 28, 2026

@copilot analyze, explain the cause and suggest a plan to fix the failing test test_pickle_core which fails for Python 3.10 on Windows - writing your insights and recommendations h...

Analysis of test_bad_cache_file[True] failure on Python 3.10 Windows

This is a pre-existing issue unrelated to the smoke-test changes (none of the files I modified touch tests/pickle_tests/test_pickle_core.py or src/cachier/cores/pickle.py).

Root cause

Where it fails: src/cachier/cores/pickle.py, line 148 — _clear_all_cache_files():

def _clear_all_cache_files(self) -> None:
    path, name = os.path.split(self.cache_fpath)
    for subpath in os.listdir(path):
        if subpath.startswith(f"{name}_"):
            os.remove(os.path.join(path, subpath))  # <-- PermissionError here on Windows

Error: PermissionError: [WinError 32] The process cannot access the file because it is being used by another process

Why it happens on Windows only: Windows implements mandatory file locking (unlike Linux/macOS which use advisory locking). portalocker uses LockFileEx on Windows, which means while any thread holds a lock on a file, os.remove() on that file raises PermissionError.

The sequence that triggers it

  1. test_bad_cache_file[True] loops through sleep_times * 2, calling _helper_bad_cache_file() each iteration.
  2. Each call to _helper_bad_cache_file() starts two daemon threads and joins them with timeout=2.
  3. _bad_cache (the underlying function) sleeps for 1 second and internally holds a portalocker exclusive lock on the separate-files cache file while computing.
  4. If thread1.join(timeout=2) returns due to timeout before thread1 finishes, thread1 is still running and still holds the portalocker lock.
  5. The next iteration immediately calls _bad_cache_decorated.clear_cache() → _clear_all_cache_files() → os.remove(), which fails because thread1 from the previous iteration is still holding the lock.
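The core of step 4 — Thread.join(timeout=...) returning while the thread is still running — can be demonstrated in isolation (slow_worker here is a stand-in for _bad_cache, not cachier code):

```python
import threading
import time

# Stand-in for _bad_cache: holds its "work" longer than the join timeout.
def slow_worker():
    time.sleep(0.5)

t = threading.Thread(target=slow_worker, daemon=True)
t.start()
t.join(timeout=0.1)       # returns after ~0.1 s; the worker keeps running
still_running = t.is_alive()
print(still_running)      # True: a timed-out join() does not stop the thread
t.join()                  # wait for the worker so the example exits cleanly
```

This is why the previous iteration's thread can still hold the portalocker lock when the next iteration tries to delete the cache file.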

Suggested fix plan

The fix belongs in _clear_all_cache_files() in src/cachier/cores/pickle.py. The os.remove() call should be made resilient to transient Windows file-lock errors by adding a small retry loop:

def _clear_all_cache_files(self) -> None:
    # requires `import time` alongside `import os` at the top of pickle.py
    path, name = os.path.split(self.cache_fpath)
    for subpath in os.listdir(path):
        if subpath.startswith(f"{name}_"):
            fpath = os.path.join(path, subpath)
            for attempt in range(3):
                try:
                    os.remove(fpath)
                    break
                except PermissionError:
                    if attempt < 2:
                        time.sleep(0.1 * (attempt + 1))

This is the right layer to fix because clear_cache() is a public API that users also call, so it should be resilient to this Windows behaviour in production too — not just in tests.
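The retry pattern can be exercised without Windows or real file locks by injecting the remove call. remove_with_retry and flaky_remove are hypothetical names for illustration only, not part of cachier:

```python
import time

def remove_with_retry(remove_fn, fpath, attempts=3, base_delay=0.01):
    """Call remove_fn(fpath), retrying on PermissionError with linear backoff."""
    for attempt in range(attempts):
        try:
            remove_fn(fpath)
            return True
        except PermissionError:
            if attempt < attempts - 1:
                time.sleep(base_delay * (attempt + 1))
    return False  # still locked after all attempts

# Simulate a Windows-style lock that is released before the third attempt.
calls = {"n": 0}
def flaky_remove(fpath):
    calls["n"] += 1
    if calls["n"] < 3:
        raise PermissionError("[WinError 32] file in use")

removed = remove_with_retry(flaky_remove, "dummy.pkl")
print(removed, calls["n"])   # True 3
```

The linear backoff (0.1 s, then 0.2 s in the suggested fix) gives the previous iteration's thread time to finish its 1-second computation and release the lock.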

Copilot AI requested a review from shaypal5 February 28, 2026 19:03
@shaypal5 shaypal5 merged commit a599cf1 into master Feb 28, 2026
62 of 64 checks passed
@shaypal5 shaypal5 deleted the copilot/add-smoke-test-suite branch February 28, 2026 19:05