Mutation testing tells you something coverage numbers can't: whether your tests would actually catch a bug. It works by introducing small deliberate changes into your code — flipping a + to a -, removing a condition — and checking whether your tests fail. If they don't, the mutation survived, and that's a gap worth knowing about.
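As a small illustration of the idea, here is a hand-rolled sketch. The mutant is written by hand rather than generated, and `add`, `add_mutant`, the two checks and `survives` are hypothetical names for this example only, not part of mutation.

```python
def add(a, b):
    return a + b


def add_mutant(a, b):
    # The same function with "+" flipped to "-", written by hand.
    return a - b


def weak_check(fn):
    # Passes for both versions because 0 is neutral for + and -.
    assert fn(3, 0) == 3


def strong_check(fn):
    # Fails for the mutant, which returns 1 instead of 5.
    assert fn(3, 2) == 5


def survives(check, fn):
    """Return True when the check still passes, i.e. the change went unnoticed."""
    try:
        check(fn)
    except AssertionError:
        return False
    return True


print(survives(weak_check, add_mutant))    # True: the mutation survived
print(survives(strong_check, add_mutant))  # False: the mutation was caught
```

Both checks execute every line of `add`, so a coverage report would not tell them apart.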
mutation is built around three ideas:
Fast. Mutations run in parallel. Most tools write mutated code to disk and run one test at a time — mutation doesn't, so you get results in minutes rather than hours.
Interactive. mutation replay is a guided workflow, not a report. It walks you through each surviving mutation one by one: you inspect it, fix your tests, verify they're green, commit, and move on to the next. Less like a dashboard, more like an interactive rebase.
Light. A single Python file. No Rust compiler, no configuration ceremony. Results stored in a local .mutation.db SQLite file. Source code you can actually read and understand — which matters when you're trusting a tool to tell you the truth about your tests.
mutation runs your tests with pytest. The -- PYTEST-COMMAND option lets you pass any pytest arguments — specific paths, flags, plugins — giving you full control over how the test suite runs.
pip install mutation
To install the latest development version directly from the dev branch instead:
uv tool install git+https://github.com/amirouche/mutation.py@dev
Run the mutations against your test suite:
mutation play tests.py --include=foobar/ex.py --include=foobar/__init__.py --exclude=tests.py
Then work through the results:
mutation replay
mutation play [--verbose] [--include=<glob>]... [--exclude=<glob>]...
              [--sampling=<s>] [--randomly-seed=<n>] [--max-workers=<n>]
              [--only-deadcode-detection] [--without-exception-injection]
              [<file-or-directory> ...] [-- PYTEST-COMMAND ...]
mutation replay [--verbose] [--max-workers=<n>]
mutation list
mutation show MUTATION
mutation apply MUTATION
mutation summary
mutation gc
mutation (-h | --help)
mutation --version
mutation only mutates code with test coverage, so it works best when coverage is high.
mutation detects whether tests can run in parallel — making your test suite parallel-safe will significantly speed things up.
--include=<glob> and --exclude=<glob>
Glob patterns matched against relative file paths. Repeat the flag to supply multiple patterns.
# Mutate only specific modules, exclude both test files and migrations
mutation play tests.py --include="src/*.py" --include="lib/*.py" --exclude=tests.py --exclude="migrations/*.py"
Default --include is *.py (all Python files). Default --exclude is *test* (any file whose relative path contains "test"). The patterns are applied before the coverage filter, so files with no coverage are always skipped, whatever the patterns say.
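To see how the defaults interact, here is a rough sketch of the include/exclude rule. It assumes fnmatch-style matching, which may differ from what mutation actually uses. `is_selected` is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def is_selected(path, include=("*.py",), exclude=("*test*",)):
    """Keep a relative path that matches any include pattern and no exclude pattern."""
    included = any(fnmatchcase(path, pattern) for pattern in include)
    excluded = any(fnmatchcase(path, pattern) for pattern in exclude)
    return included and not excluded


print(is_selected("foobar/ex.py"))       # True
print(is_selected("tests/test_ex.py"))   # False
print(is_selected("foobar/contest.py"))  # False: "contest" contains "test"
print(is_selected("foobar/contest.py", exclude=("tests.py",)))  # True
```

Under this assumption the default exclude is broad, so passing an explicit --exclude is the safer choice when module names happen to contain "test".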
--sampling=<s>
Limit how many mutations are actually tested — useful for a quick sanity check before a full run.
--sampling=100: test only the first 100 mutations (deterministic order)
--sampling=10%: test a random 10% of all mutations (probability-based; set --randomly-seed for reproducibility)
Default: all mutations are tested.
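The two sampling modes can be pictured with the sketch below. It is an assumption about the general technique, not mutation's actual code. `sample_mutations` is a hypothetical name, and the candidates are plain integers standing in for mutations.

```python
import random
from itertools import islice


def sample_mutations(mutations, sampling=None, seed=0):
    """Select mutations by count ("100") or by probability ("10%")."""
    if sampling is None:
        return list(mutations)  # default: test everything
    if sampling.endswith("%"):
        probability = float(sampling[:-1]) / 100
        rng = random.Random(seed)  # a fixed seed makes the selection repeatable
        return [m for m in mutations if rng.random() < probability]
    return list(islice(mutations, int(sampling)))  # first N, in order


candidates = range(1000)
first = sample_mutations(candidates, "100")
run_a = sample_mutations(candidates, "10%", seed=12345)
run_b = sample_mutations(candidates, "10%", seed=12345)
print(first == list(range(100)))  # True
print(run_a == run_b)             # True
```

Because the percentage mode keeps each item with a given probability, the number selected is only approximately 10% of the total.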
--randomly-seed=<n>
Integer seed that controls three things at once: the order pytest-randomly uses to shuffle your tests, the random values injected by numeric mutations (MutateNumber), and which mutations are selected when using --sampling=N%. Setting a fixed seed makes any of these behaviors reproducible across runs.
Default: current Unix timestamp (a different seed each run).
mutation play tests.py --randomly-seed=12345 --sampling=20%
-- PYTEST-COMMAND
A full pytest invocation to run instead of the built-in default (pytest --exitfirst --no-header --tb=no --quiet --assert=plain). Useful when you need specific pytest flags, plugins, or a subset of tests.
mutation always appends --mutation=<uid> to whatever command you supply — this flag is how it injects each mutation in-process without touching files on disk. Because of this, the command must be a pytest invocation; other test runners are not supported. Coverage flags (--cov, etc.) are added automatically during the baseline run.
-- PYTEST-COMMAND and <file-or-directory> are mutually exclusive.
# Run only the unit tests, with verbose output
mutation play --include="src/*.py" -- pytest -x -v tests/unit/
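As a rough picture of what each mutation run executes, the sketch below appends the flag to either the default command or a supplied one. How mutation assembles the command internally is an assumption. `build_command` and the uid `abc123` are hypothetical.

```python
import shlex

# Built-in default quoted in the text above.
DEFAULT_COMMAND = "pytest --exitfirst --no-header --tb=no --quiet --assert=plain"


def build_command(uid, pytest_command=None):
    """Return the argv list for one mutation run, with --mutation=<uid> appended."""
    base = shlex.split(pytest_command or DEFAULT_COMMAND)
    return base + [f"--mutation={uid}"]


print(build_command("abc123", "pytest -x -v tests/unit/"))
# ['pytest', '-x', '-v', 'tests/unit/', '--mutation=abc123']
```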
AugAssignToAssign — convert augmented assignment to plain assignment
Convert an augmented assignment (x += v) to a plain assignment (x = v), dropping the accumulation, verifying that the update operator is tested.
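A hand-written sketch of which tests notice this change. `total_original` and `total_mutant` are hypothetical, and the mutant is typed out by hand rather than generated.

```python
def total_original(amounts):
    total = 0
    for amount in amounts:
        total += amount
    return total


def total_mutant(amounts):
    total = 0
    for amount in amounts:
        total = amount  # mutated: the accumulation is dropped
    return total


# A single-element input cannot tell the two apart, so the mutant survives.
print(total_original([5]), total_mutant([5]))        # 5 5
# Two elements expose it: only the last amount is kept.
print(total_original([2, 3]), total_mutant([2, 3]))  # 5 3
```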
# before
total += amount
# after
total = amount
BreakToReturn — replace break with return
Replace break with return, exiting the enclosing function instead of just the loop, verifying that the loop's exit path is tested.
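A sketch of what a test has to observe to catch this, using a hypothetical `mark_first_done` helper and a hand-written mutant.

```python
def mark_first_done_original(items, log):
    for item in items:
        if item["done"]:
            log.append(item["name"])
            break
    log.append("end")  # runs after the loop


def mark_first_done_mutant(items, log):
    for item in items:
        if item["done"]:
            log.append(item["name"])
            return  # mutated: the code after the loop is skipped
    log.append("end")


items = [{"name": "a", "done": False}, {"name": "b", "done": True}]
log_original, log_mutant = [], []
mark_first_done_original(items, log_original)
mark_first_done_mutant(items, log_mutant)
print(log_original)  # ['b', 'end']
print(log_mutant)    # ['b']
```

A test that only inspects the first log entry passes in both cases. Asserting on the whole log catches the mutant.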
# before
for item in items:
    if item.done:
        break
# after
for item in items:
    if item.done:
        return
Comparison — negate comparison expressions
Negate a comparison expression by wrapping it with not (...), verifying that the direction of every comparison is tested.
# before
if x > 0:
    process(x)
# after
if not (x > 0):
    process(x)
DefinitionDrop — remove function or class definitions
Remove a function or class definition entirely (only when others remain in the same body), surfacing unreferenced definitions.
# before
def helper():
    return 42
def main():
    return helper()
# after
def main():
    return helper()
ForceConditional — force conditions to True or False
Force the test of an if/while/assert/ternary to always be True or always False, verifying that both branches are meaningfully exercised.
# before
if is_valid(x):
    save(x)
# after
if True:
    save(x)
InjectException — replace expressions with the exception they raise
Replace expressions that have well-known failure modes with a raise of the exception they can produce. This targets error-handling paths that pass on the happy path but silently break when the environment misbehaves.
The contracts are intentionally narrow — stdlib only, no inference:
| Expression | Injected mutation |
|---|---|
| d[key] (string key) | raise KeyError(key) |
| lst[i] (integer index) | raise IndexError(i) |
| d[k] (ambiguous) | both raise KeyError(k) and raise IndexError(k) |
| int(x), float(x) | raise ValueError(x) |
| open(path) | raise FileNotFoundError(path) |
| next(it) | raise StopIteration |
| x / y, x // y, x % y | raise ZeroDivisionError |
| obj.attr | raise AttributeError('attr') |
| for x in iterable | raise StopIteration |
Mutations are skipped when the expression is already inside a try/except that handles the relevant exception, and never injected inside except blocks.
# before
value = data[key]
# after
raise KeyError(key)
Use --without-exception-injection to skip all InjectException mutations when error-handling paths are intentionally untested or produce too much noise.
MutateAssignment — replace assignment values with None
Replace the right-hand side of a plain assignment with None, verifying that the assigned value is not silently ignored.
# before
result = compute()
# after
result = None
MutateCallArgs — replace or drop function arguments
Replace each positional call argument with None, and drop one argument at a time from multi-argument calls, verifying that every argument is actually used.
# before
result = process(data, config)
# after
result = process(None, config)
MutateContainment — swap in and not in operators
Swap in ↔ not in in membership tests, verifying that the expected membership relationship is directly tested.
# before
if key in cache:
    return cache[key]
# after
if key not in cache:
    return cache[key]
MutateContextManager — strip context managers from with blocks
Strip context managers from a with statement one at a time, keeping the body, verifying that each manager's effect is tested.
# before
with lock:
    update_shared_state()
# after
update_shared_state()
MutateDefaultArgument — remove default argument values
Remove leading default argument values one at a time, making parameters required, verifying that callers always supply them explicitly.
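In practice this mutant is only caught by a test that relies on the default. A hand-written sketch with hypothetical `connect_original` and `connect_mutant` functions:

```python
def connect_original(host, port=8080, timeout=30):
    return (host, port, timeout)


def connect_mutant(host, port, timeout=30):
    # Mutated by hand: port lost its default value.
    return (host, port, timeout)


# Explicit port: both versions behave identically, so the mutant survives.
assert connect_original("db", 5432) == connect_mutant("db", 5432)

# Relying on the default: only the original works.
assert connect_original("db") == ("db", 8080, 30)
try:
    connect_mutant("db")
except TypeError:
    print("mutant killed: port is now required")
```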
# before
def connect(host, port=8080, timeout=30):
    ...
# after
def connect(host, port, timeout=30):
    ...
MutateExceptionHandler — replace exception types with Exception
Replace the specific exception type in an except clause with the generic Exception, verifying that the handler is tested for the right error kind.
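To catch the widened handler, a test has to show that an unrelated error still propagates. A hand-written sketch in which `fetch_original`, `fetch_mutant` and the three helpers are all hypothetical:

```python
def fetch_original(connect, retry):
    try:
        return connect()
    except ConnectionError:
        return retry()


def fetch_mutant(connect, retry):
    try:
        return connect()
    except Exception:  # mutated: the handler now catches everything
        return retry()


def refuse():
    raise ConnectionError("refused")


def misconfigured():
    raise ValueError("bad port")


def retry():
    return "retried"


# The expected error is handled the same way by both versions.
print(fetch_original(refuse, retry), fetch_mutant(refuse, retry))  # retried retried
# An unrelated error is swallowed only by the mutant.
print(fetch_mutant(misconfigured, retry))  # retried
try:
    fetch_original(misconfigured, retry)
except ValueError as error:
    print("original propagates:", error)  # original propagates: bad port
```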
# before
try:
    connect()
except ConnectionError:
    retry()
# after
try:
    connect()
except Exception:
    retry()
MutateFString — replace f-string interpolations with empty strings
Replace each interpolated expression in an f-string with an empty string, verifying that callers check the formatted content rather than just the surrounding template.
# before
msg = f"expected {actual}, got {result}"
# after
msg = f"expected , got {result}"
MutateGlobal — remove global and nonlocal declarations
Remove a global or nonlocal declaration entirely, causing assignments to bind a local variable instead, verifying that the scoping is exercised by tests.
# before
def increment():
    global counter
    counter += 1
# after
def increment():
    counter += 1
MutateIdentity — swap is and is not operators
Swap is ↔ is not in identity comparisons, verifying that the expected identity relationship is directly tested.
# before
if obj is None:
    init()
# after
if obj is not None:
    init()
MutateIterator — wrap iterables in reversed()
Wrap a for-loop's iterable in reversed(), verifying that iteration order assumptions are tested.
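An order-insensitive assertion lets this mutant through. A hand-written sketch with hypothetical `drain_original` and `drain_mutant` functions:

```python
def drain_original(queue):
    processed = []
    for item in queue:
        processed.append(item)
    return processed


def drain_mutant(queue):
    processed = []
    for item in reversed(queue):  # mutated: iteration order is flipped
        processed.append(item)
    return processed


queue = [3, 1, 2]
# Comparing sorted results ignores order, so the mutant survives.
print(sorted(drain_original(queue)) == sorted(drain_mutant(queue)))  # True
# Comparing the exact sequence catches it.
print(drain_original(queue), drain_mutant(queue))  # [3, 1, 2] [2, 1, 3]
```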
# before
for item in queue:
    process(item)
# after
for item in reversed(queue):
    process(item)
MutateKeyword — rotate flow and boolean keywords
Rotate flow keywords (break/continue/pass), swap boolean constants (True/False/None), and flip boolean operators (and/or).
# before
while True:
    if done:
        break
# after
while True:
    if done:
        continue
MutateLambda — replace lambda bodies with None
Replace the body of a lambda with None (or 0 when the body is already None), verifying that the lambda's computation is actually used.
# before
transform = lambda x: x * 2
# after
transform = lambda x: None
MutateMatchCase — remove match case branches
Remove one case branch at a time from a match statement (Python 3.10+ only), verifying that each branch is exercised by the test suite.
# before
match command:
    case "quit":
        quit()
    case "go":
        go()
# after
match command:
    case "go":
        go()
MutateNumber — replace numeric literals with random values
Replace an integer or float literal with a random value in the same bit-range, verifying that the exact numeric value is tested.
# before
TIMEOUT = 30
# after
TIMEOUT = 17
MutateOperator — replace arithmetic and comparison operators
Replace an arithmetic, bitwise, shift, or comparison operator with another in the same group, verifying the exact operator matters.
# before
result = a + b
# after
result = a - b
MutateReturn — replace return values with defaults
Replace a return value with a type-appropriate default (None, 0, False, or ""), verifying that callers check what the function returns.
# before
def get_count():
    return len(items)
# after
def get_count():
    return 0
MutateSlice — drop slice bounds and negate steps
Drop the lower or upper bound of a slice (a[i:j] → a[:j] or a[i:]) and negate the step (a[::2] → a[::-2]), verifying that slice boundary conditions and direction are tested.
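A test whose slice starts at 0 cannot see a dropped lower bound. A hand-written sketch with hypothetical `chunk_original` and `chunk_mutant` functions:

```python
def chunk_original(data, start, end):
    return data[start:end]


def chunk_mutant(data, start, end):
    return data[:end]  # mutated: the lower bound is dropped


data = list("abcdef")
# start=0 makes both slices identical, so the mutant survives.
print(chunk_original(data, 0, 3) == chunk_mutant(data, 0, 3))  # True
# A non-zero start exposes the missing bound.
print(chunk_original(data, 2, 4))  # ['c', 'd']
print(chunk_mutant(data, 2, 4))    # ['a', 'b', 'c', 'd']
```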
# before
chunk = data[start:end]
# after
chunk = data[:end]
MutateString — prepend prefixes to string literals
Prepend a fixed prefix to a string or bytes literal, verifying that callers check the actual content.
# before
label = "hello"
# after
label = "mutated string hello"
MutateStringMethod — swap symmetric string methods
Swap directionally symmetric string methods (lower↔upper, lstrip↔rstrip, find↔rfind, ljust↔rjust, removeprefix↔removesuffix, partition↔rpartition, split↔rsplit), verifying that the direction matters.
# before
name = text.lower()
# after
name = text.upper()
MutateYield — replace yield values with None
Replace the value of a yield expression with None, verifying that the yielded value is actually used by callers.
# before
def generate():
    yield compute()
# after
def generate():
    yield None
NegateCondition — wrap conditions with not
Wrap a bare (non-comparison) condition with not, inserting the logical inverse of the test, verifying that the truthiness of the value actually matters.
# before
if user.is_active:
    allow()
# after
if not user.is_active:
    allow()
RemoveDecorator — remove decorators from functions and classes
Remove one decorator at a time from a decorated function or class, verifying that each decorator's effect is covered by tests.
# before
@login_required
def dashboard(request):
    return render(request)
# after
def dashboard(request):
    return render(request)
RemoveUnaryOp — strip unary operators
Strip a unary operator (not, -, ~) and leave only the operand, verifying that the operator's effect is covered by tests.
# before
if not flag:
    skip()
# after
if flag:
    skip()
StatementDrop — replace statements with pass
Replace a covered statement with pass, verifying that no statement is inert dead code.
# before
x = compute()
validate(x)
# after
x = compute()
pass
SwapArguments — swap function call arguments
Swap each pair of positional call arguments, verifying that argument order is tested.
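Tests that pass equal values for both arguments cannot detect a swap. A hand-written sketch in which `ratio`, `rate_original` and `rate_mutant` are hypothetical:

```python
def ratio(done, total):
    return done / total


def rate_original(done, total):
    return ratio(done, total)


def rate_mutant(done, total):
    return ratio(total, done)  # mutated: arguments swapped


# Equal arguments hide the swap, so the mutant survives.
print(rate_original(4, 4), rate_mutant(4, 4))  # 1.0 1.0
# Distinct arguments reveal it.
print(rate_original(1, 4), rate_mutant(1, 4))  # 0.25 4.0
```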
# before
result = process(source, dest)
# after
result = process(dest, source)
ZeroIteration — replace iterables with empty lists
Replace a for-loop's iterable with an empty list, forcing the body to never execute, verifying that callers handle empty-collection cases.
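Only a test with a non-empty input can tell the mutant from the original. A hand-written sketch with hypothetical `double_all_original` and `double_all_mutant` functions:

```python
def double_all_original(items):
    out = []
    for item in items:
        out.append(item * 2)
    return out


def double_all_mutant(items):
    out = []
    for item in []:  # mutated: the body never runs
        out.append(item * 2)
    return out


# An empty input gives the same result either way, so the mutant survives.
print(double_all_original([]) == double_all_mutant([]))  # True
# A non-empty input shows that the loop body matters.
print(double_all_original([1, 2]), double_all_mutant([1, 2]))  # [2, 4] []
```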
# before
for item in items:
    process(item)
# after
for item in []:
    process(item)
Early stage. Things may break. Bug reports and questions welcome at amirouche.boubekki@gmail.com.