
Add smoke test suite as fast-fail CI gate#345

Merged
shaypal5 merged 2 commits into master from copilot/add-smoke-test-suite
Feb 28, 2026

Conversation

Contributor

Copilot AI commented Feb 21, 2026

Long CI matrix runs (60+ jobs across OS/Python/backend combinations) were spinning up regardless of trivial failures. This adds fast-fail gates — a smoke test job and a pre-commit lint job — that the full matrix must pass before starting.

Changes

  • tests/test_smoke.py — 11 new @pytest.mark.smoke tests covering core functionality: imports, pickle/memory caching, clear_cache, stale_after, allow_none, set_global_params, cache_dpath. No external service dependencies; runs in ~0.13s.
  • pyproject.toml — Registers the smoke pytest marker.
  • .github/workflows/ci-test.yml — Adds two new jobs that gate the full matrix:
    • smoke-test: single Python 3.12/Ubuntu run of pytest -m smoke tests/test_smoke.py
    • pre-commit: runs pre-commit run --all-files to enforce code quality before expensive jobs start
    • pytester now declares needs: [smoke-test, pre-commit]
    • testing-guardian updated to depend on all three jobs
    • Added permissions: contents: read to all new/modified jobs
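The gating described above can be sketched as a workflow fragment. The job names (smoke-test, pre-commit, pytester) come from the description; the concrete steps and action versions below are illustrative assumptions, not the exact contents of the PR's ci-test.yml:

```yaml
# Illustrative sketch of the fast-fail gating; actual steps may differ.
jobs:
  smoke-test:
    runs-on: ubuntu-latest
    permissions:
      contents: read
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -e ".[test]"
      - run: pytest -m smoke tests/test_smoke.py

  pre-commit:
    runs-on: ubuntu-latest
    permissions:
      contents: read
    steps:
      - uses: actions/checkout@v4
      - run: pipx run pre-commit run --all-files

  pytester:
    # The full 60+-job matrix starts only after both gates pass.
    needs: [smoke-test, pre-commit]
```

With this layout, a trivial import error or lint failure cancels the expensive matrix before any of its jobs are scheduled.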
Original prompt

This section details the original issue you should resolve

<issue_title>Add a smoke test suite</issue_title>
<issue_description>Add a smoke test suite that the full test suite workflow will depend on, so that long CI runs that are going to fail can fail fast.

The full test suite should also depend on the pre-commit.ci flow.</issue_description>

Comments on the Issue (you are @copilot in this section)



Copilot AI changed the title [WIP] Add smoke test suite for CI workflow Add smoke test suite as fast-fail CI gate Feb 21, 2026
Copilot AI requested a review from shaypal5 February 21, 2026 22:20
@shaypal5
Member

@copilot analyze, explain the cause and suggest a plan to fix the failing test test_pickle_core which fails for Python 3.10 on Windows - writing your insights and recommendations here, as a comment on this PR - but do not implement it yet.

Contributor Author

Copilot AI commented Feb 28, 2026

@copilot analyze, explain the cause and suggest a plan to fix the failing test test_pickle_core which fails for Python 3.10 on Windows - writing your insights and recommendations h...

Analysis of test_bad_cache_file[True] failure on Python 3.10 Windows

This is a pre-existing issue unrelated to the smoke-test changes (none of the files I modified touch tests/pickle_tests/test_pickle_core.py or src/cachier/cores/pickle.py).

Root cause

Where it fails: src/cachier/cores/pickle.py, line 148 — _clear_all_cache_files():

def _clear_all_cache_files(self) -> None:
    path, name = os.path.split(self.cache_fpath)
    for subpath in os.listdir(path):
        if subpath.startswith(f"{name}_"):
            os.remove(os.path.join(path, subpath))  # <-- PermissionError here on Windows

Error: PermissionError: [WinError 32] The process cannot access the file because it is being used by another process

Why it happens on Windows only: Windows implements mandatory file locking (unlike Linux/macOS which use advisory locking). portalocker uses LockFileEx on Windows, which means while any thread holds a lock on a file, os.remove() on that file raises PermissionError.

The sequence that triggers it

  1. test_bad_cache_file[True] loops through sleep_times * 2, calling _helper_bad_cache_file() each iteration.
  2. Each call to _helper_bad_cache_file() starts two daemon threads and joins them with timeout=2.
  3. _bad_cache (the underlying function) sleeps for 1 second and internally holds a portalocker exclusive lock on the separate-files cache file while computing.
  4. If thread1.join(timeout=2) returns due to timeout before thread1 finishes, thread1 is still running and still holds the portalocker lock.
  5. The next iteration immediately calls _bad_cache_decorated.clear_cache() → _clear_all_cache_files() → os.remove(), which fails because thread1 from the previous iteration is still holding the lock.
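The core of step 4 — Thread.join(timeout=...) returning while the thread is still running — can be demonstrated in isolation (slow_worker here is a stand-in for _bad_cache, not cachier code):

```python
import threading
import time

# Stand-in for _bad_cache: holds its "work" longer than the join timeout.
def slow_worker():
    time.sleep(0.5)

t = threading.Thread(target=slow_worker, daemon=True)
t.start()
t.join(timeout=0.1)       # returns after ~0.1 s; the worker keeps running
still_running = t.is_alive()
print(still_running)      # True: a timed-out join() does not stop the thread
t.join()                  # wait for the worker so the example exits cleanly
```

This is why the previous iteration's thread can still hold the portalocker lock when the next iteration tries to delete the cache file.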

Suggested fix plan

The fix belongs in _clear_all_cache_files() in src/cachier/cores/pickle.py. The os.remove() call should be made resilient to transient Windows file-lock errors by adding a small retry loop:

def _clear_all_cache_files(self) -> None:
    # requires `import time` alongside `import os` at the top of pickle.py
    path, name = os.path.split(self.cache_fpath)
    for subpath in os.listdir(path):
        if subpath.startswith(f"{name}_"):
            fpath = os.path.join(path, subpath)
            for attempt in range(3):
                try:
                    os.remove(fpath)
                    break
                except PermissionError:
                    if attempt < 2:
                        time.sleep(0.1 * (attempt + 1))

This is the right layer to fix because clear_cache() is a public API that users also call, so it should be resilient to this Windows behaviour in production too — not just in tests.
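The retry pattern can be exercised without Windows or real file locks by injecting the remove call. remove_with_retry and flaky_remove are hypothetical names for illustration only, not part of cachier:

```python
import time

def remove_with_retry(remove_fn, fpath, attempts=3, base_delay=0.01):
    """Call remove_fn(fpath), retrying on PermissionError with linear backoff."""
    for attempt in range(attempts):
        try:
            remove_fn(fpath)
            return True
        except PermissionError:
            if attempt < attempts - 1:
                time.sleep(base_delay * (attempt + 1))
    return False  # still locked after all attempts

# Simulate a Windows-style lock that is released before the third attempt.
calls = {"n": 0}
def flaky_remove(fpath):
    calls["n"] += 1
    if calls["n"] < 3:
        raise PermissionError("[WinError 32] file in use")

removed = remove_with_retry(flaky_remove, "dummy.pkl")
print(removed, calls["n"])   # True 3
```

The linear backoff (0.1 s, then 0.2 s in the suggested fix) gives the previous iteration's thread time to finish its 1-second computation and release the lock.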

Copilot AI requested a review from shaypal5 February 28, 2026 19:03
@shaypal5 shaypal5 merged commit a599cf1 into master Feb 28, 2026
62 of 64 checks passed
@shaypal5 shaypal5 deleted the copilot/add-smoke-test-suite branch February 28, 2026 19:05