
First commit of the library; structure, purpose, etc. is defined here #1

Open
ypriverol wants to merge 33 commits into main from dev

Conversation


@ypriverol ypriverol commented Mar 11, 2026

This pull request introduces the initial implementation of the pyprideap Python library for handling affinity proteomics datasets from the PRIDE Affinity Archive (PAD). It includes core data structures, file readers for multiple platforms and formats, validation logic, statistics computation, and CI setup. The changes establish the foundation for reading, validating, and analyzing Olink and SomaScan datasets, and provide development tooling for linting, testing, and type checking.

Core library structure and data model:

  • Introduced core data classes (AffinityDataset, Platform, Level, ValidationResult) in src/pyprideap/core.py to represent datasets and validation results.
  • Added main package initializer src/pyprideap/__init__.py to expose top-level API including reading, validation, statistics, and PRIDE client functionality.

File readers for supported platforms and formats:

  • Implemented file readers for Olink and SomaScan platforms: CSV, Parquet, XLSX, ADAT, and SomaScan CSV, each producing an AffinityDataset.
  • Added registry logic in src/pyprideap/readers/registry.py for automatic format detection and unified dataset reading.
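The registry pattern described above (automatic format detection plus a unified read dispatcher) can be sketched as a stand-alone toy. This is an illustrative sketch only, not the library's code: the marker columns and format labels mirror the PR text, but the decorator, the `register`/`read` helpers, and the toy readers (which just return labels) are assumptions for demonstration.

```python
# Illustrative sketch of a reader registry with auto-detection.
# Format names and marker columns follow the PR description; the
# implementation itself is a stand-alone toy, not pyprideap's code.
import csv
from pathlib import Path
from typing import Callable

_OLINK_MARKER_COLS = {"SampleID", "OlinkID", "NPX"}

# Registry mapping a format label to its reader callable.
_READERS: dict[str, Callable[[Path], str]] = {}


def register(fmt: str):
    """Decorator that records a reader under a format label."""
    def deco(fn):
        _READERS[fmt] = fn
        return fn
    return deco


@register("olink_csv")
def read_olink_csv(path: Path) -> str:
    return f"olink dataset from {path.name}"


@register("somascan_csv")
def read_somascan_csv(path: Path) -> str:
    return f"somascan dataset from {path.name}"


def detect_format(path: Path) -> str:
    """Detect the format from the file suffix plus header columns."""
    if path.suffix == ".csv":
        with open(path, newline="") as fh:
            cols = set(next(csv.reader(fh)))
        if any(c.startswith("SeqId.") for c in cols):
            return "somascan_csv"
        # Require *all* Olink marker columns (issubset), so a lone
        # shared column like SampleID cannot cause mis-routing.
        if _OLINK_MARKER_COLS.issubset(cols):
            return "olink_csv"
    raise ValueError(f"cannot detect format of {path}")


def read(path: Path) -> str:
    """Unified entry point: detect, then dispatch to the reader."""
    return _READERS[detect_format(path)](path)
```

Note the `issubset` check: as the Qodo review below points out, requiring the full marker-column set is stricter and safer than accepting any single overlapping column.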

Validation and statistics:

  • Added platform-specific validators and a unified validate function to check datasets against PRIDE guidelines.
  • Implemented dataset statistics computation in src/pyprideap/stats.py with summary metrics for samples, features, detection rate, QC, and value distribution.
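The kind of summary statistics described above (sample/feature counts, detection rate, value distribution) can be sketched in a few lines. The dataclass fields and function name echo the PR's `DatasetStats`/`compute_stats`, but the signature and computation here are simplified assumptions, not the actual implementation in src/pyprideap/stats.py.

```python
# Hypothetical sketch of dataset summary statistics; field names echo
# the PR's DatasetStats but the computation here is illustrative.
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class DatasetStats:
    n_samples: int
    n_features: int
    detection_rate: float  # fraction of non-missing measurements
    mean_value: float


def compute_stats(expression: list[list[float | None]]) -> DatasetStats:
    """Summarize a samples-by-features expression matrix."""
    n_samples = len(expression)
    n_features = len(expression[0]) if expression else 0
    # Treat None/NaN cells as "not detected".
    values = [
        v
        for row in expression
        for v in row
        if v is not None and not math.isnan(v)
    ]
    total = n_samples * n_features
    return DatasetStats(
        n_samples=n_samples,
        n_features=n_features,
        detection_rate=len(values) / total if total else 0.0,
        mean_value=sum(values) / len(values) if values else 0.0,
    )
```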

PRIDE Archive integration and development tooling:

  • Introduced PrideClient class in src/pyprideap/pride.py for fetching project metadata and files from the PRIDE Archive API.
  • Added CI workflow (.github/workflows/ci.yml) for linting, testing, and type checking, and defined project metadata and dev dependencies in pyproject.toml.

Documentation:

  • Overhauled README.md to document installation, usage, supported formats, validation rules, data model, and development instructions.

Summary by CodeRabbit

  • New Features

    • Multi-format readers for Olink and SomaScan; unified AffinityDataset top-level API and version metadata
    • Platform-aware validation, dataset statistics, LOD analysis, filtering utilities, PRIDE archive client
    • QC toolkit: compute/render functions and a self-contained HTML QC report generator
  • Documentation

    • Expanded README with quick start, examples, supported formats, validation rules, and developer guidance
  • Chores

    • Project packaging, CI workflow, .gitignore updates, comprehensive tests, fixtures, and test helpers

ypriverol and others added 21 commits March 10, 2026 20:47
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Fix ruff lint errors (unused imports, import ordering)
- Apply ruff formatting across all source and test files
- Fix mypy type errors in validators, stats, and pride modules
- Add types-requests to dev dependencies

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Tests the full read -> validate -> stats pipeline for all four
supported formats (Olink CSV, Parquet, SomaScan ADAT, SomaScan CSV).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@qodo-code-review

Review Summary by Qodo

Initial release of pyprideap library for PRIDE Affinity Proteomics data

✨ Enhancement


Walkthroughs

Description
• Initial library release with complete project scaffolding
• Multi-format readers for Olink (CSV, Parquet, XLSX) and SomaScan (ADAT, CSV)
• Comprehensive validation framework with platform-specific validators
• Statistics computation and PRIDE API client integration
• Full test suite with CI/CD pipeline and documentation
Diagram
flowchart LR
  A["Input Files<br/>CSV/Parquet/XLSX/ADAT"] -->|Auto-detect| B["Registry<br/>Format Detection"]
  B -->|Route| C["Platform Readers<br/>Olink/SomaScan"]
  C -->|Parse| D["AffinityDataset<br/>Core Data Model"]
  D -->|Validate| E["Validators<br/>Schema/QC/Range"]
  D -->|Analyze| F["Statistics<br/>Detection Rate/QC"]
  D -->|Fetch| G["PRIDE Client<br/>Project Metadata"]
  E -->|Results| H["ValidationResult<br/>Error/Warning/Info"]
  F -->|Output| I["DatasetStats<br/>Summary"]


File Changes

1. pyproject.toml (⚙️ Configuration changes, +43/-0): Project configuration with dependencies and tools
2. README.md (📝 Documentation, +113/-2): Comprehensive documentation with usage examples
3. .github/workflows/ci.yml (⚙️ Configuration changes, +42/-0): GitHub Actions CI pipeline for lint, test, typecheck
4. src/pyprideap/__init__.py (✨ Enhancement, +24/-0): Public API exports and version management
5. src/pyprideap/core.py (✨ Enhancement, +36/-0): Core data structures and enums for library
6. src/pyprideap/pride.py (✨ Enhancement, +41/-0): PRIDE Archive API client for project metadata
7. src/pyprideap/readers/__init__.py (✨ Enhancement, +13/-0): Reader module exports and public interface
8. src/pyprideap/readers/olink_csv.py (✨ Enhancement, +42/-0): Olink NPX CSV long-format data parser
9. src/pyprideap/readers/olink_parquet.py (✨ Enhancement, +40/-0): Olink Explore HT Parquet format reader
10. src/pyprideap/readers/olink_xlsx.py (✨ Enhancement, +35/-0): Olink XLSX spreadsheet format reader
11. src/pyprideap/readers/somascan_adat.py (✨ Enhancement, +81/-0): SomaScan ADAT three-section format parser
12. src/pyprideap/readers/somascan_csv.py (✨ Enhancement, +32/-0): SomaScan wide-format CSV data reader
13. src/pyprideap/readers/registry.py (✨ Enhancement, +64/-0): Auto-detection registry for format routing
14. src/pyprideap/stats.py (✨ Enhancement, +78/-0): Dataset statistics computation and summary
15. src/pyprideap/validators/__init__.py (✨ Enhancement, +27/-0): Validator dispatch and public interface
16. src/pyprideap/validators/base.py (✨ Enhancement, +9/-0): BaseValidator protocol definition
17. src/pyprideap/validators/olink_explore.py (✨ Enhancement, +124/-0): Olink Explore schema and QC validation rules
18. src/pyprideap/validators/olink_target.py (✨ Enhancement, +20/-0): Olink Target validator extending Explore
19. src/pyprideap/validators/somascan.py (✨ Enhancement, +87/-0): SomaScan RFU and metadata validation rules
20. tests/conftest.py (🧪 Tests, +30/-0): Pytest fixtures for test data paths
21. tests/create_fixtures.py (🧪 Tests, +51/-0): Fixture generation script for test data
22. tests/data/olink_sample.parquet (🧪 Tests, +0/-0): Binary Parquet test fixture for Olink
23. tests/data/olink_sample.xlsx (🧪 Tests, +0/-0): Binary XLSX test fixture for Olink
24. tests/data/somascan_sample.adat (🧪 Tests, +16/-0): Text ADAT test fixture for SomaScan
25. tests/test_core.py (🧪 Tests, +42/-0): Unit tests for core data structures
26. tests/test_integration.py (🧪 Tests, +39/-0): End-to-end integration tests for API
27. tests/test_pride.py (🧪 Tests, +63/-0): Unit tests for PRIDE API client
28. tests/test_readers.py (🧪 Tests, +151/-0): Unit tests for all file format readers
29. tests/test_registry.py (🧪 Tests, +52/-0): Tests for format detection and routing
30. tests/test_stats.py (🧪 Tests, +83/-0): Unit tests for statistics computation
31. tests/test_validators.py (🧪 Tests, +240/-0): Unit tests for all validators
32. tests/__init__.py (Additional files, +0/-0): ...


@qodo-code-review

qodo-code-review bot commented Mar 11, 2026

Code Review by Qodo

🐞 Bugs (4) 📘 Rule violations (0) 📎 Requirement gaps (0)



Action required

1. SomaScan CSV always invalid (🐞 Bug, ✓ Correctness)
Description
read_somascan_csv() populates ds.features with only a SeqId column, but SomaScanValidator requires
UniProt/Target/Dilution as well, so validate() will always emit missing_feature_column errors for
any SomaScan CSV read through this API.
Code

src/pyprideap/readers/somascan_csv.py[R17-25]

+    seq_cols = [c for c in df.columns if c.startswith("SeqId.")]
+    meta_cols = [c for c in df.columns if c not in seq_cols]
+
+    samples = df[meta_cols].reset_index(drop=True)
+    expression = df[seq_cols].astype(float).reset_index(drop=True)
+
+    seq_ids = [c.replace("SeqId.", "") for c in seq_cols]
+    features = pd.DataFrame({"SeqId": seq_ids})
+
Evidence
The SomaScan CSV reader constructs a features DataFrame containing only SeqId, while the validator
enforces additional required feature columns (UniProt, Target, Dilution). The provided SomaScan CSV
fixture also lacks those columns, demonstrating that the current read→validate workflow is
guaranteed to fail schema validation for this format.

src/pyprideap/readers/somascan_csv.py[17-25]
src/pyprideap/validators/somascan.py[8-10]
src/pyprideap/validators/somascan.py[34-43]
tests/data/somascan_sample.csv[1-4]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`read_somascan_csv()` produces `AffinityDataset.features` with only `SeqId`, but `SomaScanValidator` requires `UniProt`, `Target`, and `Dilution`. This makes `pyprideap.read()` + `pyprideap.validate()` unusable for SomaScan CSV inputs.
## Issue Context
SomaScan ADAT parsing provides rich feature metadata, but the SomaScan CSV format in fixtures contains only `SeqId.*` expression columns and sample metadata. The validator is currently keyed only by `Platform.SOMASCAN`, so both ADAT and CSV go through the same strict schema checks.
## Fix Focus Areas
- src/pyprideap/readers/somascan_csv.py[17-32]
- src/pyprideap/validators/somascan.py[8-45]
- src/pyprideap/validators/__init__.py[7-19]
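One remediation shape consistent with the Fix Focus Areas above is to key the required feature columns on the source format, so a SomaScan CSV (which carries only SeqId) is not held to the full ADAT schema. This is a hypothetical sketch: the column sets mirror the review text, but the function name and structure are illustrative, not the library's code.

```python
# Hypothetical sketch: per-format required feature columns, so the
# SomaScan CSV reader's SeqId-only features pass schema validation
# while the richer ADAT metadata is still enforced. The column sets
# come from the review text; the function itself is illustrative.
REQUIRED_FEATURE_COLS: dict[str, set[str]] = {
    "somascan_adat": {"SeqId", "UniProt", "Target", "Dilution"},
    "somascan_csv": {"SeqId"},
}


def missing_feature_columns(source_format: str, feature_cols: set[str]) -> set[str]:
    """Return the required feature columns absent for this source format."""
    required = REQUIRED_FEATURE_COLS.get(source_format, set())
    return required - feature_cols
```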



2. CSV/XLSX mis-detection (🐞 Bug, ✓ Correctness)
Description
detect_format() uses set intersection for CSV/XLSX Olink detection, so any file containing just one
of {SampleID, OlinkID, NPX} can be misclassified as Olink and routed to the wrong reader, leading to
crashes or incorrect parsing.
Code

src/pyprideap/readers/registry.py[R37-50]

+    if suffix == ".csv":
+        df_head = pd.read_csv(path, nrows=1)
+        cols = set(df_head.columns)
+        has_seqid_cols = any(c.startswith("SeqId.") for c in cols)
+        if has_seqid_cols or _SOMASCAN_MARKER_COLS.intersection(cols):
+            return "somascan_csv"
+        if _OLINK_MARKER_COLS.intersection(cols):
+            return "olink_csv"
+
+    if suffix == ".xlsx":
+        df_head = pd.read_excel(path, nrows=1)
+        cols = set(df_head.columns)
+        if _OLINK_MARKER_COLS.intersection(cols):
+            return "olink_xlsx"
Evidence
Parquet detection requires all Olink marker columns via issubset, but CSV/XLSX detection only checks
for any overlap with marker columns. Because intersection is truthy for a single shared column
(e.g., 'SampleID'), non-Olink inputs can be misrouted to Olink readers.

src/pyprideap/readers/registry.py[27-33]
src/pyprideap/readers/registry.py[37-45]
src/pyprideap/readers/registry.py[46-50]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`detect_format()` uses `_OLINK_MARKER_COLS.intersection(cols)` for CSV/XLSX detection, which is true if *any* one marker column is present. This can mis-route non-Olink files (e.g., any dataset with `SampleID`) to an Olink reader.
## Issue Context
Parquet detection already uses `_OLINK_MARKER_COLS.issubset(cols)`, so CSV/XLSX behavior is inconsistent and weaker.
## Fix Focus Areas
- src/pyprideap/readers/registry.py[15-52]



3. Olink schema KeyErrors (🐞 Bug, ⛯ Reliability)
Description
The Olink readers hard-reference SampleID/OlinkID/NPX in drop_duplicates() and pivot_table() without
checking those columns exist, so malformed inputs fail with opaque pandas KeyErrors rather than a
clear, actionable schema error.
Code

src/pyprideap/readers/olink_csv.py[R20-32]

+    sample_cols = [c for c in df.columns if c in _SAMPLE_COLS]
+    samples = df[sample_cols].drop_duplicates(subset=["SampleID"]).reset_index(drop=True)
+
+    feature_cols = [c for c in df.columns if c in _FEATURE_COLS]
+    features = df[feature_cols].drop_duplicates(subset=["OlinkID"]).reset_index(drop=True)
+
+    expression = df.pivot_table(
+        index="SampleID",
+        columns="OlinkID",
+        values="NPX",
+        aggfunc="first",
+    )
+    expression = expression.reindex(samples["SampleID"].values).reset_index(drop=True)
Evidence
Each reader first builds sample_cols/feature_cols by filtering known columns, but then
unconditionally uses subset=["SampleID"], subset=["OlinkID"], and
values="NPX"/index="SampleID"/columns="OlinkID". If any of those required columns are missing,
pandas will raise at runtime.

src/pyprideap/readers/olink_csv.py[20-31]
src/pyprideap/readers/olink_parquet.py[20-33]
src/pyprideap/readers/olink_xlsx.py[20-27]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The Olink readers assume `SampleID`, `OlinkID`, and `NPX` exist and will throw pandas `KeyError` when inputs are malformed. This produces low-quality error messages and makes debugging/user experience poor.
## Issue Context
Each reader filters columns into `sample_cols`/`feature_cols`, but still hard-references required columns later.
## Fix Focus Areas
- src/pyprideap/readers/olink_csv.py[18-33]
- src/pyprideap/readers/olink_parquet.py[18-33]
- src/pyprideap/readers/olink_xlsx.py[18-28]




Remediation recommended

4. Version import crashes (🐞 Bug, ⛯ Reliability)
Description
Importing pyprideap from a source checkout can crash because __init__.py evaluates
importlib.metadata.version("pyprideap") at import time, which raises PackageNotFoundError when the
distribution metadata is unavailable.
Code

src/pyprideap/__init__.py[R3-12]

+from importlib.metadata import version
+
+from pyprideap.core import AffinityDataset, Level, Platform, ValidationResult
+from pyprideap.pride import PrideClient
+from pyprideap.readers.registry import read
+from pyprideap.stats import DatasetStats, compute_stats
+from pyprideap.validators import validate
+
+__version__ = version("pyprideap")
+
Evidence
The package computes __version__ via importlib.metadata.version() unconditionally during import,
which depends on installed distribution metadata rather than the source tree being on PYTHONPATH.

src/pyprideap/__init__.py[3-12]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`__version__ = version("pyprideap")` can raise `PackageNotFoundError` when importing from a source checkout without an installed distribution.
## Issue Context
CI installs the package (`pip install -e`), but developers/users may import directly from a checkout.
## Fix Focus Areas
- src/pyprideap/__init__.py[3-12]




@coderabbitai

coderabbitai bot commented Mar 11, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Adds a new pyprideap package surface: core datamodels, many format readers and registry, LOD analysis, filtering, stats, validators, QC compute/render/report, a PRIDE API client, top-level exports, CI config, tests, and test fixtures.

Changes

Cohort / File(s): Summary

  • Project config & docs (.github/workflows/ci.yml, .gitignore, pyproject.toml, README.md): Adds CI jobs (lint/test/typecheck), expands README, adds pyproject.toml with dependencies/tooling, and updates .gitignore entries.
  • Top-level API & core models (src/pyprideap/__init__.py, src/pyprideap/core.py): Exports version and many re-exports via __all__; adds Platform/Level enums, ValidationResult and AffinityDataset dataclasses.
  • PRIDE client (src/pyprideap/pride.py): New PrideClient with session management, project fetch, file listing, and download-URL extraction.
  • Readers & registry (src/pyprideap/readers/..., src/pyprideap/readers/registry.py): Adds readers for Olink (CSV, Parquet, XLSX) and SomaScan (ADAT, CSV) plus detect_format and read dispatcher.
  • Filtering & LOD analysis (src/pyprideap/filtering.py, src/pyprideap/lod.py): Adds filter_controls/filter_qc utilities and an extensive LOD API (NCLOD/FIXED/REPORTED), loaders, compute_lod_stats, get_valid_proteins, LodStats and LodMethod.
  • Statistics (src/pyprideap/stats.py): Adds DatasetStats dataclass and compute_stats() for dataset summaries and distributions.
  • Validators (src/pyprideap/validators/...): Introduces validator framework and platform validators (OlinkExplore, OlinkTarget, SomaScan) with a validate() dispatcher.
  • QC compute/render/report (src/pyprideap/qc/...): Adds QC compute dataclasses/functions, Plotly renderers, and qc_report HTML generator with integrated plots and metadata.
  • Tests, fixtures & data (tests/*, tests/data/*): Adds extensive unit and integration tests, fixture generator script, and sample data files covering readers, registry, validators, LOD, filtering, stats, PRIDE client, and QC code.

Sequence Diagram(s)

sequenceDiagram
    rect rgba(200,200,255,0.5)
    participant User
    participant Registry as readers.registry.read()
    participant Detect as readers.registry.detect_format()
    participant Reader as Reader (CSV/Parquet/ADAT/XLSX)
    end
    rect rgba(200,255,200,0.5)
    participant Dataset as AffinityDataset
    participant Validator as validators.validate()
    participant LOD as lod.compute_lod_stats()
    participant Stats as stats.compute_stats()
    end

    User->>Registry: read(path)
    Registry->>Detect: detect_format(path)
    Detect-->>Registry: format_type
    Registry->>Reader: call reader_func(path)
    Reader->>Reader: parse file, assemble samples/features/expression
    Reader-->>Dataset: return AffinityDataset
    Registry-->>User: dataset

    User->>Validator: validate(dataset)
    Validator->>Validator: run platform checks
    Validator-->>User: list[ValidationResult]

    User->>LOD: compute_lod_stats(dataset)
    LOD->>LOD: derive LODs & per-sample rates
    LOD-->>User: LodStats

    User->>Stats: compute_stats(dataset)
    Stats->>Stats: aggregate metrics
    Stats-->>User: DatasetStats

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I hopped through files with nimble paws,

Readers and LODs and validators' laws,
Stats counted carrots, QC sang along,
CI kept the tests marching strong,
A cheerful rabbit nibbles—code well done.

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 11.16%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Title check (❓ Inconclusive): The title is vague and contains a typo ('structure;l'); it does not clearly convey the main purpose of the changeset despite being an initial commit. Resolution: revise the title to be clear and specific, e.g. 'Initial library implementation with core readers, validators, and PRIDE client' or 'Set up pyprideap: add readers, validators, statistics, and PRIDE integration'.

✅ Passed checks (1 passed)

  • Description Check (✅ Passed): Check skipped because CodeRabbit's high-level summary is enabled.


@ypriverol ypriverol requested a review from chakrabandla March 11, 2026 10:29

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

🧹 Nitpick comments (12)
pyproject.toml (1)

5-18: Consider adding project metadata for PyPI publication.

The core configuration looks solid. For eventual PyPI publication, consider adding author information, classifiers, and project URLs:

authors = [
    {name = "Your Name", email = "email@example.com"}
]
classifiers = [
    "Development Status :: 3 - Alpha",
    "Programming Language :: Python :: 3.10",
    "Programming Language :: Python :: 3.11",
    "Programming Language :: Python :: 3.12",
]
[project.urls]
Homepage = "https://github.com/PRIDE-Archive/pyprideap"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pyproject.toml` around lines 5 - 18, Add PyPI metadata under the existing
[project] table: populate authors (e.g., authors = [{name = "Your Name", email =
"you@example.com"}]), add a classifiers array (include Development Status and
supported Python versions like "Programming Language :: Python ::
3.10"/3.11/3.12), and add a [project.urls] section with useful links (Homepage,
Source, Documentation) so the package metadata for pyprideap is complete for
PyPI publication; update the version/description if needed but keep the existing
dependencies intact.
src/pyprideap/pride.py (1)

11-15: Consider adding context manager support for proper session cleanup.

The requests.Session is created but never explicitly closed, which could lead to resource leaks in long-running applications.

💡 Optional: Add context manager support
 class PrideClient:
     def __init__(self, base_url: str = _BASE_URL, timeout: int = _DEFAULT_TIMEOUT):
         self.base_url = base_url
         self.timeout = timeout
         self._session = requests.Session()
+
+    def close(self) -> None:
+        self._session.close()
+
+    def __enter__(self) -> "PrideClient":
+        return self
+
+    def __exit__(self, *args: object) -> None:
+        self.close()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/pyprideap/pride.py` around lines 11 - 15, The PrideClient currently
creates a requests.Session in __init__ and never closes it; add explicit session
cleanup by implementing context manager support (define __enter__ to return self
and __exit__ to call self.close()) plus a close() method that calls
self._session.close(); ensure existing callers can also call close() directly
and consider adding a __del__ that calls close() as a fallback. Reference the
PrideClient class, its __init__, the self._session attribute, and add methods
__enter__, __exit__, and close to manage lifecycle.
.github/workflows/ci.yml (2)

10-19: Consider adding pip caching to speed up CI.

Adding pip caching can significantly reduce CI time:

💡 Optional: Add caching to workflow
       - uses: actions/setup-python@v5
         with:
           python-version: "3.12"
+          cache: 'pip'
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/ci.yml around lines 10 - 19, Add pip caching to the "lint"
job by caching pip's wheel/cache directory so subsequent runs skip
re-downloading ruff; modify the steps around uses: actions/setup-python@v5 and
the run: pip install ruff command to include an actions/cache step (e.g., cache
key based on runner, python-version and a static ruff-version or requirements
hash) that caches ~/.cache/pip (or the platform-appropriate pip cache path)
before installing, and restore that cache so the existing run: pip install ruff
becomes fast and CI time is reduced.

23-32: Consider adding fail-fast: false to ensure all Python versions complete their tests.

This allows each version to run independently, so you can see test results across all Python versions even if one fails:

Example change
     strategy:
+      fail-fast: false
       matrix:
         python-version: ["3.10", "3.11", "3.12", "3.13"]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/ci.yml around lines 23 - 32, Update the GitHub Actions job
strategy to disable fast-fail so all matrix entries run: under the existing
`strategy` -> `matrix` configuration referencing `python-version`, add
`fail-fast: false` (i.e., ensure `strategy` contains `matrix:` and `fail-fast:
false`) so tests for "python-version" 3.10–3.13 all complete even if one fails.
src/pyprideap/validators/olink_target.py (1)

7-20: Consider simplifying rule prefix handling.

The _rule_prefix class attribute is only used within validate(). Since the replacement logic is tightly coupled to this single method, it could be simplified.

However, the current approach works correctly and allows for potential subclass customization. The implementation properly rewrites all rule strings from the parent validator.

♻️ Optional: Simplify if subclass customization is not needed
 class OlinkTargetValidator(OlinkExploreValidator):
-    _rule_prefix = "olink_target"
-
     def validate(self, dataset: AffinityDataset) -> list[ValidationResult]:
         results = super().validate(dataset)
         return [
             ValidationResult(
                 level=r.level,
-                rule=r.rule.replace("olink.", f"{self._rule_prefix}.", 1),
+                rule=r.rule.replace("olink.", "olink_target.", 1),
                 message=r.message,
                 details=r.details,
             )
             for r in results
         ]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/pyprideap/validators/olink_target.py` around lines 7 - 20, The
class-level attribute _rule_prefix is only used inside
OlinkTargetValidator.validate; simplify by removing the _rule_prefix attribute
and use a local variable or inline string in validate instead: update
OlinkTargetValidator to delete the _rule_prefix declaration and change the
replace call in validate to r.rule.replace("olink.", "olink_target.", 1) (or
assign prefix = "olink_target" at the top of validate and use f"{prefix}.")
while keeping the existing mapping logic that constructs ValidationResult from
super().validate(dataset).
tests/test_validators.py (1)

147-176: Consider adding a test to verify rule prefix rewriting.

The tests validate that OlinkTargetValidator inherits validation behavior from OlinkExploreValidator, but they don't explicitly verify that rule names are rewritten from olink. to olink_target. prefix.

💚 Optional: Add test for rule prefix
def test_rule_prefix_rewriting(self):
    ds = _make_olink_target_dataset(
        features=pd.DataFrame({"OlinkID": ["OID1"]}),
    )
    results = OlinkTargetValidator().validate(ds)
    errors = [r for r in results if r.level == Level.ERROR]
    # Verify rules use olink_target prefix, not olink
    assert all(r.rule.startswith("olink_target.") for r in errors)
    assert not any(r.rule.startswith("olink.") for r in errors)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_validators.py` around lines 147 - 176, Add a unit test that
ensures OlinkTargetValidator rewrites inherited rule names from the olink.
prefix to olink_target.; create a new test method (e.g.,
test_rule_prefix_rewriting) inside TestOlinkTargetValidator that builds a
minimal dataset via _make_olink_target_dataset, runs
OlinkTargetValidator().validate(ds), filters error-level results using Level,
and asserts that every error.rule starts with "olink_target." and that none
start with "olink."; this verifies the prefix rewriting behavior for inherited
rules.
tests/test_registry.py (1)

35-52: Consider adding a test for reading SomaScan CSV files.

TestDetectFormat includes test_somascan_csv_detected, but TestAutoRead lacks a corresponding test_read_somascan_csv. For consistency and completeness, consider adding this test case to verify the auto-read functionality for SomaScan CSV format.

💡 Suggested test addition
     def test_read_xlsx(self, olink_xlsx_path):
         ds = read(olink_xlsx_path)
         assert isinstance(ds, AffinityDataset)
         assert ds.platform == Platform.OLINK_EXPLORE
+
+    def test_read_somascan_csv(self, somascan_csv_path):
+        ds = read(somascan_csv_path)
+        assert isinstance(ds, AffinityDataset)
+        assert ds.platform == Platform.SOMASCAN
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_registry.py` around lines 35 - 52, Add a new test method
test_read_somascan_csv to the TestAutoRead class that calls
read(somascan_csv_path), asserts the returned object is the expected dataset
type (e.g., AffinityDataset if applicable) and that ds.platform ==
Platform.SOMASCAN; reference the existing read() call pattern and the
somascan_csv_path fixture used in TestDetectFormat to mirror other tests (e.g.,
test_read_somascan_adat) for consistency.
tests/test_pride.py (1)

36-43: Consider using a more specific exception type in the mock.

The test mocks raise_for_status.side_effect with a generic Exception("404 Not Found"). The requests library raises requests.exceptions.HTTPError for failed status codes. Using the correct exception type would make the test more realistic.

💡 Suggested improvement
+from requests.exceptions import HTTPError
+
     def test_get_project_not_found(self):
         mock_response = MagicMock()
         mock_response.status_code = 404
-        mock_response.raise_for_status.side_effect = Exception("404 Not Found")
+        mock_response.raise_for_status.side_effect = HTTPError("404 Not Found")
         with patch("pyprideap.pride.requests.Session.get", return_value=mock_response):
             client = PrideClient()
-            with pytest.raises(Exception, match="404"):
+            with pytest.raises(HTTPError, match="404"):
                 client.get_project("PAD999999")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_pride.py` around lines 36 - 43, The test
test_get_project_not_found should mock the same exception type Requests would
raise: replace the generic Exception thrown from
mock_response.raise_for_status.side_effect with requests.exceptions.HTTPError so
the behavior matches real requests behavior; update the test to import requests
(or reference requests.exceptions.HTTPError) and set
mock_response.raise_for_status.side_effect = requests.exceptions.HTTPError("404
Not Found"), keeping the patch on pyprideap.pride.requests.Session.get and the
existing pytest.raises assertion targeting the HTTPError message.
src/pyprideap/readers/olink_csv.py (2)

18-32: Add validation for required columns before processing.

The code assumes columns SampleID, OlinkID, and NPX exist without validation. If any are missing, pivot_table or drop_duplicates will raise cryptic errors. Adding upfront validation would provide clearer error messages.

💡 Suggested validation
     df = pd.read_csv(path)
 
+    required_cols = {"SampleID", "OlinkID", "NPX"}
+    missing = required_cols - set(df.columns)
+    if missing:
+        raise ValueError(f"Missing required columns: {missing}")
+
     sample_cols = [c for c in df.columns if c in _SAMPLE_COLS]
     samples = df[sample_cols].drop_duplicates(subset=["SampleID"]).reset_index(drop=True)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/pyprideap/readers/olink_csv.py` around lines 18 - 32, After reading the
CSV into df, validate that the required columns "SampleID", "OlinkID", and "NPX"
are present before any downstream operations (before computing
sample_cols/feature_cols or calling df.pivot_table). If any are missing, raise a
clear ValueError listing the missing columns so callers get an actionable
message; place this check immediately after df = pd.read_csv(path) and before
usage of sample_cols, feature_cols, samples, features, or expression.

26-31: Consider logging or warning when aggfunc="first" silently resolves duplicates.

Using aggfunc="first" in pivot_table silently picks the first value when there are duplicate (SampleID, OlinkID) combinations. This could mask data quality issues. Consider either validating uniqueness beforehand or logging when duplicates are encountered.
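A sketch of what that check could look like. The toy data and the mean-based deduplication are illustrative choices, not the library's current behavior:

```python
import warnings

import pandas as pd

# Toy long-format table (hypothetical values) with one duplicated
# (SampleID, OlinkID) pair.
df = pd.DataFrame({
    "SampleID": ["S1", "S1", "S2"],
    "OlinkID": ["OID1", "OID1", "OID1"],
    "NPX": [1.0, 2.0, 3.0],
})

# Surface duplicates before pivoting instead of letting aggfunc="first" hide them.
dup_mask = df.duplicated(subset=["SampleID", "OlinkID"], keep=False)
if dup_mask.any():
    warnings.warn(
        f"{dup_mask.sum()} duplicate (SampleID, OlinkID) rows; "
        f"sample:\n{df[dup_mask].head()}"
    )

# Deduplicate explicitly (here: mean) so the resolution policy is visible in code.
expression = df.pivot_table(
    index="SampleID", columns="OlinkID", values="NPX", aggfunc="mean"
)
```

Whether to warn, raise, or aggregate is a policy decision for the reader API; the point is that the choice is made explicitly rather than implied by `aggfunc="first"`.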

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/pyprideap/readers/olink_csv.py` around lines 26 - 31, The pivot uses
df.pivot_table(..., aggfunc="first") which silently drops duplicate (SampleID,
OlinkID) rows; before calling df.pivot_table (the expression variable), detect
duplicates with df.duplicated(subset=["SampleID","OlinkID"], keep=False), and if
any are found emit a warning/log (include count and a small sample of offending
rows) or optionally raise; then either deduplicate explicitly (e.g., groupby +
aggregate) or proceed to pivot—update the code around df.pivot_table to perform
this check and logging so duplicates are visible instead of silently ignored.
tests/test_integration.py (1)

6-12: Consider adding assertions on validation results content.

The test checks that results is a list but doesn't verify its contents. Integration tests would be more valuable if they asserted on expected validation outcomes (e.g., no errors for valid test data, or specific expected warnings).

💡 Example enhancement
     results = pyprideap.validate(dataset)
     stats = pyprideap.compute_stats(dataset)
     assert stats.n_samples > 0
     assert stats.n_features > 0
     assert isinstance(results, list)
+    # Verify no critical validation errors for valid test data
+    from pyprideap.core import Level
+    critical_errors = [r for r in results if r.level == Level.ERROR]
+    assert len(critical_errors) == 0, f"Unexpected validation errors: {critical_errors}"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_integration.py` around lines 6 - 12, The test currently only
asserts that results is a list; enhance it to verify expected validation
outcomes by checking that results contains no errors for this valid fixture:
after calling pyprideap.validate(dataset) assert either that results is empty or
more robustly assert that no entry in results has severity 'error' (e.g., assert
not any(r.get('severity') == 'error' for r in results)). Also consider adding an
assertion on results structure (e.g., each item has keys like 'message' or
'severity') to ensure the validation output format from pyprideap.validate
remains stable.
src/pyprideap/readers/registry.py (1)

37-44: Potential fall-through issue: unrecognized CSV files silently fail.

If a .csv file doesn't match SomaScan or Olink marker columns, the function falls through to the .xlsx check and eventually raises ValueError. However, this path is unclear from the code structure. Consider adding an explicit else or comment to clarify this intentional fall-through behavior.

💡 Suggested clarification
     if suffix == ".csv":
         df_head = pd.read_csv(path, nrows=1)
         cols = set(df_head.columns)
         has_seqid_cols = any(c.startswith("SeqId.") for c in cols)
         if has_seqid_cols or _SOMASCAN_MARKER_COLS.intersection(cols):
             return "somascan_csv"
         if _OLINK_MARKER_COLS.intersection(cols):
             return "olink_csv"
+        # Fall through to raise ValueError if no markers matched
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/pyprideap/readers/registry.py` around lines 37 - 44, The CSV branch
currently inspects df_head and checks _SOMASCAN_MARKER_COLS and
_OLINK_MARKER_COLS but then silently falls through if no match; make this
explicit by either adding an else to return a sentinel (e.g., None) or, to
preserve current behavior, add a clear comment stating the fall-through is
intentional so later .xlsx handling/ValueError will apply; update the block
around suffix, pd.read_csv, df_head, has_seqid_cols, _SOMASCAN_MARKER_COLS and
_OLINK_MARKER_COLS to include that explicit else or comment.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/pyprideap/__init__.py`:
- Around line 3-11: Wrap the importlib.metadata.version call in a try/except to
avoid raising PackageNotFoundError when the package metadata is missing: import
PackageNotFoundError from importlib.metadata and replace the direct assignment
to __version__ with a guarded assignment that sets __version__ =
version("pyprideap") inside the try block and sets a safe fallback (e.g.,
"0.0.0" or None) in the except PackageNotFoundError block; update the top-level
symbols (__version__, version, PackageNotFoundError) so importing pyprideap from
a source tree doesn't fail.

In `@src/pyprideap/pride.py`:
- Around line 37-40: The loop that sets urls[name] uses loc["value"] without
checking the key, which can raise KeyError; update the block that iterates over
locations (the for loc in locations loop) to safely access the value by either
checking "value" in loc or using loc.get("value") and only assign urls[name]
when a non-None value is present (and keep the existing break behavior after
assignment).

In `@src/pyprideap/readers/olink_parquet.py`:
- Around line 23-32: The expression matrix created by pivot_table is left with
columns in pivot order, causing mismatch with features; after building features
= df[feature_cols].drop_duplicates(subset=["OlinkID"]).reset_index(drop=True),
reindex the expression's columns to match features["OlinkID"] order (e.g. use
expression.reindex(columns=features["OlinkID"].values) to align columns with
the features DataFrame) and then reset_index(drop=True) as done for the sample
axis so that expression columns and features["OlinkID"] have the same order and
indexing.
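A minimal sketch of that realignment, using a toy frame whose first-seen OlinkID order differs from lexicographic order (IDs and values are hypothetical):

```python
import pandas as pd

# Toy long-format table; OlinkIDs first appear in the order OID9, OID1.
df = pd.DataFrame({
    "SampleID": ["S1", "S1", "S2", "S2"],
    "OlinkID": ["OID9", "OID1", "OID9", "OID1"],
    "NPX": [1.0, 2.0, 3.0, 4.0],
})

features = df[["OlinkID"]].drop_duplicates(subset=["OlinkID"]).reset_index(drop=True)

# pivot_table sorts columns lexicographically (OID1, OID9), which no longer
# matches the first-seen order captured in features.
expression = df.pivot_table(
    index="SampleID", columns="OlinkID", values="NPX", aggfunc="first"
)

# Realign columns to the feature order, then reset the sample axis as is
# already done for rows.
expression = expression.reindex(columns=features["OlinkID"].values).reset_index(drop=True)
```

After the `reindex`, positional access like `expression.iloc[:, i]` and `features.iloc[i]` refer to the same analyte.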

In `@src/pyprideap/readers/olink_xlsx.py`:
- Around line 23-27: The pivot_table call sorts OlinkID columns and can misalign
with the first-seen feature order in feature_cols/features; to fix, after
building expression (pivot_table and reindexing rows) explicitly reindex its
columns to match the feature order from features["OlinkID"] (or the
feature_cols-derived OlinkID order) so that expression.columns align with
features rows (use expression.reindex(columns=features["OlinkID"].values) or
equivalent) before returning/resetting index.

In `@src/pyprideap/readers/somascan_csv.py`:
- Around line 17-24: The features DataFrame only contains "SeqId" which breaks
SomaScanValidator expecting {"SeqId","UniProt","Target","Dilution"} and the
expression columns keep the "SeqId." prefix while seq_ids are stripped; update
the logic around seq_cols/seq_ids/features to (1) populate features with the
required columns by extracting UniProt/Target/Dilution from the original df if
present or else adding these columns with NaN/None defaults, and (2) make
expression column names and features.SeqId consistent by either stripping the
"SeqId." prefix from expression column names or keeping the prefix in
features.SeqId so that seq_cols, seq_ids, expression, and features all refer to
the same identifiers (see symbols: seq_cols, seq_ids, expression, features,
SomaScanValidator).
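One way to reconcile the identifiers and pad the feature table could look like the following. The toy frame, the NaN defaults, and the `removeprefix` approach are assumptions for illustration, not the current reader's code:

```python
import numpy as np
import pandas as pd

# Toy wide-format SomaScan-style frame: one metadata column plus
# "SeqId."-prefixed measurement columns (hypothetical values).
df = pd.DataFrame({
    "SampleId": ["S1", "S2"],
    "SeqId.1234-5": [10.0, 11.0],
    "SeqId.6789-1": [20.0, 21.0],
})

seq_cols = [c for c in df.columns if c.startswith("SeqId.")]
seq_ids = [c.removeprefix("SeqId.") for c in seq_cols]

# Strip the prefix on the expression side so columns and features["SeqId"]
# use the same identifiers.
expression = df[seq_cols].copy()
expression.columns = seq_ids

# Carry every column the validator expects, defaulting to NaN when the
# source file carries no annotation for it.
features = pd.DataFrame({"SeqId": seq_ids})
for col in ("UniProt", "Target", "Dilution"):
    features[col] = df[col] if col in df.columns else np.nan
```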

In `@tests/create_fixtures.py`:
- Around line 30-46: The test fixture function create_olink_xlsx is missing the
PlateQC column required by the olink xlsx reader's _SAMPLE_COLS; update the
DataFrame constructed in create_olink_xlsx to include a "PlateQC" column (e.g.,
values like "PASS" or appropriate per-row QC statuses) so the produced
tests/data/olink_sample.xlsx contains PlateQC for each row and matches the
reader expectation in src/pyprideap/readers/olink_xlsx.py.

---

Nitpick comments:
In @.github/workflows/ci.yml:
- Around line 10-19: Add pip caching to the "lint" job by caching pip's
wheel/cache directory so subsequent runs skip re-downloading ruff; modify the
steps around uses: actions/setup-python@v5 and the run: pip install ruff command
to include an actions/cache step (e.g., cache key based on runner,
python-version and a static ruff-version or requirements hash) that caches
~/.cache/pip (or the platform-appropriate pip cache path) before installing, and
restore that cache so the existing run: pip install ruff becomes fast and CI
time is reduced.
- Around line 23-32: Update the GitHub Actions job strategy to disable fast-fail
so all matrix entries run: under the existing `strategy` -> `matrix`
configuration referencing `python-version`, add `fail-fast: false` (i.e., ensure
`strategy` contains `matrix:` and `fail-fast: false`) so tests for
"python-version" 3.10–3.13 all complete even if one fails.

In `@pyproject.toml`:
- Around line 5-18: Add PyPI metadata under the existing [project] table:
populate authors (e.g., authors = [{name = "Your Name", email =
"you@example.com"}]), add a classifiers array (include Development Status and
supported Python versions like "Programming Language :: Python ::
3.10"/3.11/3.12), and add a [project.urls] section with useful links (Homepage,
Source, Documentation) so the package metadata for pyprideap is complete for
PyPI publication; update the version/description if needed but keep the existing
dependencies intact.

In `@src/pyprideap/pride.py`:
- Around line 11-15: The PrideClient currently creates a requests.Session in
__init__ and never closes it; add explicit session cleanup by implementing
context manager support (define __enter__ to return self and __exit__ to call
self.close()) plus a close() method that calls self._session.close(); ensure
existing callers can also call close() directly and consider adding a __del__
that calls close() as a fallback. Reference the PrideClient class, its __init__,
the self._session attribute, and add methods __enter__, __exit__, and close to
manage lifecycle.
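A sketch of that lifecycle, using a stand-in session object so the snippet runs without the requests dependency (the real class would wrap requests.Session):

```python
class _FakeSession:
    """Stand-in for requests.Session so the sketch has no external deps."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class PrideClient:
    """Lifecycle methods only; HTTP methods are omitted in this sketch."""

    def __init__(self, session_factory=_FakeSession):
        self._session = session_factory()

    def close(self):
        self._session.close()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()


# Context-manager usage closes the session automatically on exit.
with PrideClient() as client:
    session = client._session
```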

In `@src/pyprideap/readers/olink_csv.py`:
- Around line 18-32: After reading the CSV into df, validate that the required
columns "SampleID", "OlinkID", and "NPX" are present before any downstream
operations (before computing sample_cols/feature_cols or calling
df.pivot_table). If any are missing, raise a clear ValueError listing the
missing columns so callers get an actionable message; place this check
immediately after df = pd.read_csv(path) and before usage of sample_cols,
feature_cols, samples, features, or expression.
- Around line 26-31: The pivot uses df.pivot_table(..., aggfunc="first") which
silently drops duplicate (SampleID, OlinkID) rows; before calling df.pivot_table
(the expression variable), detect duplicates with
df.duplicated(subset=["SampleID","OlinkID"], keep=False), and if any are found
emit a warning/log (include count and a small sample of offending rows) or
optionally raise; then either deduplicate explicitly (e.g., groupby + aggregate)
or proceed to pivot—update the code around df.pivot_table to perform this check
and logging so duplicates are visible instead of silently ignored.

In `@src/pyprideap/readers/registry.py`:
- Around line 37-44: The CSV branch currently inspects df_head and checks
_SOMASCAN_MARKER_COLS and _OLINK_MARKER_COLS but then silently falls through if
no match; make this explicit by either adding an else to return a sentinel
(e.g., None) or, to preserve current behavior, add a clear comment stating the
fall-through is intentional so later .xlsx handling/ValueError will apply;
update the block around suffix, pd.read_csv, df_head, has_seqid_cols,
_SOMASCAN_MARKER_COLS and _OLINK_MARKER_COLS to include that explicit else or
comment.

In `@src/pyprideap/validators/olink_target.py`:
- Around line 7-20: The class-level attribute _rule_prefix is only used inside
OlinkTargetValidator.validate; simplify by removing the _rule_prefix attribute
and use a local variable or inline string in validate instead: update
OlinkTargetValidator to delete the _rule_prefix declaration and change the
replace call in validate to r.rule.replace("olink.", "olink_target.", 1) (or
assign prefix = "olink_target" at the top of validate and use f"{prefix}.")
while keeping the existing mapping logic that constructs ValidationResult from
super().validate(dataset).
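The `count=1` argument is what keeps any later occurrence of the substring intact; a quick illustration with hypothetical rule names:

```python
# Hypothetical inherited rule names produced by a base Olink validator.
rules = ["olink.samples.missing", "olink.npx.olink.range"]

# count=1 rewrites only the leading prefix, leaving any nested "olink."
# substring untouched.
rewritten = [r.replace("olink.", "olink_target.", 1) for r in rules]
```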

In `@tests/test_integration.py`:
- Around line 6-12: The test currently only asserts that results is a list;
enhance it to verify expected validation outcomes by checking that results
contains no errors for this valid fixture: after calling
pyprideap.validate(dataset) assert either that results is empty or more robustly
assert that no entry in results has severity 'error' (e.g., assert not
any(r.get('severity') == 'error' for r in results)). Also consider adding an
assertion on results structure (e.g., each item has keys like 'message' or
'severity') to ensure the validation output format from pyprideap.validate
remains stable.

In `@tests/test_pride.py`:
- Around line 36-43: The test test_get_project_not_found should mock the same
exception type Requests would raise: replace the generic Exception thrown from
mock_response.raise_for_status.side_effect with requests.exceptions.HTTPError so
the behavior matches real requests behavior; update the test to import requests
(or reference requests.exceptions.HTTPError) and set
mock_response.raise_for_status.side_effect = requests.exceptions.HTTPError("404
Not Found"), keeping the patch on pyprideap.pride.requests.Session.get and the
existing pytest.raises assertion targeting the HTTPError message.

In `@tests/test_registry.py`:
- Around line 35-52: Add a new test method test_read_somascan_csv to the
TestAutoRead class that calls read(somascan_csv_path), asserts the returned
object is the expected dataset type (e.g., AffinityDataset if applicable) and
that ds.platform == Platform.SOMASCAN; reference the existing read() call
pattern and the somascan_csv_path fixture used in TestDetectFormat to mirror
other tests (e.g., test_read_somascan_adat) for consistency.

In `@tests/test_validators.py`:
- Around line 147-176: Add a unit test that ensures OlinkTargetValidator
rewrites inherited rule names from the olink. prefix to olink_target.; create a
new test method (e.g., test_rule_prefix_rewriting) inside
TestOlinkTargetValidator that builds a minimal dataset via
_make_olink_target_dataset, runs OlinkTargetValidator().validate(ds), filters
error-level results using Level, and asserts that every error.rule starts with
"olink_target." and that none start with "olink."; this verifies the prefix
rewriting behavior for inherited rules.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 348a20d9-ebc1-4e1a-a5c1-1d90108116cd

📥 Commits

Reviewing files that changed from the base of the PR and between 46b0ee9 and 910ac2c.

⛔ Files ignored due to path filters (4)
  • tests/data/olink_sample.npx.csv is excluded by !**/*.csv
  • tests/data/olink_sample.parquet is excluded by !**/*.parquet
  • tests/data/olink_sample.xlsx is excluded by !**/*.xlsx
  • tests/data/somascan_sample.csv is excluded by !**/*.csv
📒 Files selected for processing (31)
  • .github/workflows/ci.yml
  • .gitignore
  • README.md
  • pyproject.toml
  • src/pyprideap/__init__.py
  • src/pyprideap/core.py
  • src/pyprideap/pride.py
  • src/pyprideap/readers/__init__.py
  • src/pyprideap/readers/olink_csv.py
  • src/pyprideap/readers/olink_parquet.py
  • src/pyprideap/readers/olink_xlsx.py
  • src/pyprideap/readers/registry.py
  • src/pyprideap/readers/somascan_adat.py
  • src/pyprideap/readers/somascan_csv.py
  • src/pyprideap/stats.py
  • src/pyprideap/validators/__init__.py
  • src/pyprideap/validators/base.py
  • src/pyprideap/validators/olink_explore.py
  • src/pyprideap/validators/olink_target.py
  • src/pyprideap/validators/somascan.py
  • tests/__init__.py
  • tests/conftest.py
  • tests/create_fixtures.py
  • tests/data/somascan_sample.adat
  • tests/test_core.py
  • tests/test_integration.py
  • tests/test_pride.py
  • tests/test_readers.py
  • tests/test_registry.py
  • tests/test_stats.py
  • tests/test_validators.py

Comment on lines +37 to +40
for loc in locations:
    if loc.get("name") == "FTP Protocol":
        urls[name] = loc["value"]
        break

⚠️ Potential issue | 🟡 Minor

Guard against missing value key to prevent KeyError.

Line 39 accesses loc["value"] without verifying the key exists. If the API response lacks a value field for an FTP location, this will raise a KeyError.

🛡️ Proposed defensive fix
             for loc in locations:
                 if loc.get("name") == "FTP Protocol":
-                    urls[name] = loc["value"]
+                    if "value" in loc:
+                        urls[name] = loc["value"]
                     break
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/pyprideap/pride.py` around lines 37 - 40, The loop that sets urls[name]
uses loc["value"] without checking the key, which can raise KeyError; update the
block that iterates over locations (the for loc in locations loop) to safely
access the value by either checking "value" in loc or using loc.get("value") and
only assign urls[name] when a non-None value is present (and keep the existing
break behavior after assignment).

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (1)
src/pyprideap/__init__.py (1)

3-13: ⚠️ Potential issue | 🟠 Major

Guard __version__ for source-tree imports.

version("pyprideap") raises PackageNotFoundError when the package is imported without installed metadata, so import pyprideap can fail in local-dev and test setups.

Suggested fix
-from importlib.metadata import version
+from importlib.metadata import PackageNotFoundError, version
@@
-__version__ = version("pyprideap")
+try:
+    __version__ = version("pyprideap")
+except PackageNotFoundError:
+    __version__ = "0+unknown"
#!/bin/bash
python - <<'PY'
from pathlib import Path

text = Path("src/pyprideap/__init__.py").read_text()
print("has PackageNotFoundError guard:", "PackageNotFoundError" in text)
print('has direct version() assignment:', '__version__ = version("pyprideap")' in text)
PY

python - <<'PY'
from importlib.metadata import PackageNotFoundError, version

try:
    version("definitely-not-installed-package-123456")
except PackageNotFoundError as exc:
    print(type(exc).__name__)
PY
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/pyprideap/__init__.py` around lines 3 - 13, The module-level assignment
__version__ = version("pyprideap") can raise PackageNotFoundError when running
from the source tree; wrap the call to importlib.metadata.version in a
try/except that catches importlib.metadata.PackageNotFoundError (or the imported
PackageNotFoundError symbol) and set a safe fallback (e.g. __version__ = "0.0.0"
or None) in the except branch so importing pyprideap in local-dev/tests
won't fail; update the top-level import/assignment involving version and
__version__ accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/pyprideap/filtering.py`:
- Around line 43-45: The code applies a boolean Series (keep_mask) from
dataset.samples directly to dataset.expression which risks label-aligned
selection if indexes diverge; change those selections to positional indexing
using .iloc with the boolean array (e.g., replace dataset.expression[keep_mask]
with dataset.expression.iloc[keep_mask.to_numpy()] and do the same for any other
occurrence such as the selection at lines referencing keep_mask/samples) and
then call .reset_index(drop=True) as before.

In `@src/pyprideap/lod.py`:
- Around line 56-69: The code currently falls back to any control types when no
negative controls exist; change this to require negative controls only by
setting control_mask = is_negative (remove the fallback to is_control) and then
raise a ValueError if not is_negative.any() or if control_mask.sum() <
_MIN_CONTROLS_FOR_LOD; update the check around
compute_lod_stats()/get_valid_proteins to rely solely on control_mask (negative
samples) and keep the existing error message using _MIN_CONTROLS_FOR_LOD and
control_mask.sum() to report insufficient negatives.

In `@src/pyprideap/validators/olink_explore.py`:
- Around line 65-74: The _check_qc_consistency method currently skips sample
rows when there is no corresponding expression row, allowing misaligned datasets
to pass; update _check_qc_consistency (in class handling AffinityDataset) to
detect a mismatch between ds.samples and ds.expression lengths up front and fail
fast by returning a ValidationResult (or raising a ValidationError) indicating a
fatal row-count mismatch instead of silently continuing; specifically, add a
check comparing len(ds.samples) and len(ds.expression) at the start of
_check_qc_consistency and emit/return a clear error ValidationResult referencing
the dataset and the mismatch so downstream code stops processing misaligned
tables.

---

Duplicate comments:
In `@src/pyprideap/__init__.py`:
- Around line 3-13: The module-level assignment __version__ =
version("pyprideap") can raise PackageNotFoundError when running from the source
tree; wrap the call to importlib.metadata.version in a try/except that catches
importlib.metadata.PackageNotFoundError (or the imported PackageNotFoundError
symbol) and set a safe fallback (e.g. __version__ = "0.0.0" or None) in the
except branch so importing pyprideap in local-dev/tests won't fail; update the
top-level import/assignment involving version and __version__ accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8f31f39c-4945-440a-94b8-52ffaadf5309

📥 Commits

Reviewing files that changed from the base of the PR and between 910ac2c and bc32e82.

📒 Files selected for processing (6)
  • src/pyprideap/__init__.py
  • src/pyprideap/filtering.py
  • src/pyprideap/lod.py
  • src/pyprideap/validators/olink_explore.py
  • tests/test_filtering.py
  • tests/test_lod.py

Comment on lines +43 to +45
keep_mask = ~is_control
samples = dataset.samples[keep_mask].reset_index(drop=True)
expression = dataset.expression[keep_mask].reset_index(drop=True)

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
python - <<'PY'
import pandas as pd

samples = pd.DataFrame({"SampleType": ["Sample", "Control"]}, index=[100, 101])
expression = pd.DataFrame({"OID0": [1.0, 2.0]}, index=[0, 1])
mask = samples["SampleType"].eq("Sample")

print("samples.index   =", list(samples.index))
print("expression.index=", list(expression.index))

try:
    expression[mask]
except Exception as exc:
    print(type(exc).__name__, str(exc))

print("positional fix:")
print(expression.loc[mask.to_numpy()])
PY

Repository: PRIDE-Archive/pyprideap

Length of output: 185


🏁 Script executed:

cat -n src/pyprideap/filtering.py | head -80



🏁 Script executed:

cat -n src/pyprideap/core.py | grep -A 20 "class AffinityDataset"



🏁 Script executed:

rg "def samples|def expression" src/pyprideap/core.py -A 5



🏁 Script executed:

rg "AffinityDataset(" src/pyprideap/ -B 2 -A 5 | head -100



🏁 Script executed:

find src/pyprideap -name "*.py" -exec grep -l "samples.*expression" {} \;



🏁 Script executed:

grep -n "\.index\|reset_index\|set_index" src/pyprideap/filtering.py



🏁 Script executed:

rg "read_csv|read_parquet|pd\.DataFrame" src/pyprideap/ -B 1 -A 1 | head -80



🏁 Script executed:

find src/pyprideap -name "*.py" -exec grep -l "def __init__\|class.*:" {} \; | xargs ls -la



🏁 Script executed:

cat -n src/pyprideap/core.py | head -100



🏁 Script executed:

cat -n src/pyprideap/readers/somascan_csv.py



🏁 Script executed:

cat -n src/pyprideap/readers/somascan_adat.py



🏁 Script executed:

cat -n src/pyprideap/readers/olink_csv.py



🏁 Script executed:

cat -n src/pyprideap/readers/olink_parquet.py



Use positional masking when subsetting expression.

Lines 45 and 74 apply a boolean Series from dataset.samples directly to dataset.expression. While this currently works because readers ensure matching RangeIndex(0, n) for both DataFrames, the code relies on an undocumented invariant. If the indexes ever diverge, pandas' label-aligned indexing will select wrong rows or raise errors.

Suggested fix
-    expression = dataset.expression[keep_mask].reset_index(drop=True)
+    expression = dataset.expression.iloc[keep_mask.to_numpy()].reset_index(drop=True)

Also applies to: 73-74

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/pyprideap/filtering.py` around lines 43 - 45, The code applies a boolean
Series (keep_mask) from dataset.samples directly to dataset.expression which
risks label-aligned selection if indexes diverge; change those selections to
positional indexing using .iloc with the boolean array (e.g., replace
dataset.expression[keep_mask] with dataset.expression.iloc[keep_mask.to_numpy()]
and do the same for any other occurrence such as the selection at lines
referencing keep_mask/samples) and then call .reset_index(drop=True) as before.

Comment on lines +56 to +69
# Further narrow to negative controls only
is_negative = (
    dataset.samples["SampleType"]
    .astype(str)
    .str.lower()
    .str.strip()
    .isin({"negative", "negative control", "negative_control", "neg"})
)

control_mask = is_negative if is_negative.any() else is_control
if control_mask.sum() < _MIN_CONTROLS_FOR_LOD:
    raise ValueError(
        f"Need at least {_MIN_CONTROLS_FOR_LOD} control samples for LOD calculation, found {control_mask.sum()}"
    )
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Require negative controls instead of falling back to any control.

The current fallback computes LOD from positive/QC/reference controls when no negative controls exist. That contradicts the docstring and can materially distort thresholds returned by compute_lod_stats() and get_valid_proteins().

Suggested fix
-    control_mask = is_negative if is_negative.any() else is_control
-    if control_mask.sum() < _MIN_CONTROLS_FOR_LOD:
+    if not is_negative.any():
+        raise ValueError("Negative control samples required for LOD calculation")
+
+    control_mask = is_negative
+    if control_mask.sum() < _MIN_CONTROLS_FOR_LOD:
         raise ValueError(
             f"Need at least {_MIN_CONTROLS_FOR_LOD} control samples for LOD calculation, found {control_mask.sum()}"
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/pyprideap/lod.py` around lines 56 - 69, The code currently falls back to
any control types when no negative controls exist; change this to require
negative controls only by setting control_mask = is_negative (remove the
fallback to is_control) and then raise a ValueError if not is_negative.any() or
if control_mask.sum() < _MIN_CONTROLS_FOR_LOD; update the check around
compute_lod_stats()/get_valid_proteins to rely solely on control_mask (negative
samples) and keep the existing error message using _MIN_CONTROLS_FOR_LOD and
control_mask.sum() to report insufficient negatives.

Comment on lines +65 to +74
def _check_qc_consistency(self, ds: AffinityDataset) -> list[ValidationResult]:
    if "SampleQC" not in ds.samples.columns:
        return []
    results = []
    for idx in range(len(ds.samples)):
        row = ds.samples.iloc[idx]
        qc = row.get("SampleQC")
        if idx >= len(ds.expression):
            continue
        expr_row = ds.expression.iloc[idx]

⚠️ Potential issue | 🟠 Major

Fail fast on sample/expression row-count mismatches.

Line 72 silently skips samples that have no corresponding expression row. That turns a corrupted dataset into a partially validated one and leaves downstream filtering/stats code operating on misaligned tables.

Suggested fix
     def _check_qc_consistency(self, ds: AffinityDataset) -> list[ValidationResult]:
         if "SampleQC" not in ds.samples.columns:
             return []
+        if len(ds.samples) != len(ds.expression):
+            return [
+                ValidationResult(
+                    level=Level.ERROR,
+                    rule="olink.expression.sample_count_mismatch",
+                    message="Sample table and expression matrix must have the same number of rows",
+                    details={
+                        "n_samples": len(ds.samples),
+                        "n_expression_rows": len(ds.expression),
+                    },
+                )
+            ]
         results = []
         for idx in range(len(ds.samples)):
             row = ds.samples.iloc[idx]
             qc = row.get("SampleQC")
-            if idx >= len(ds.expression):
-                continue
             expr_row = ds.expression.iloc[idx]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/pyprideap/validators/olink_explore.py` around lines 65 - 74, The
_check_qc_consistency method currently skips sample rows when there is no
corresponding expression row, allowing misaligned datasets to pass; update
_check_qc_consistency (in class handling AffinityDataset) to detect a mismatch
between ds.samples and ds.expression lengths up front and fail fast by returning
a ValidationResult (or raising a ValidationError) indicating a fatal row-count
mismatch instead of silently continuing; specifically, add a check comparing
len(ds.samples) and len(ds.expression) at the start of _check_qc_consistency and
emit/return a clear error ValidationResult referencing the dataset and the
mismatch so downstream code stops processing misaligned tables.
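The fail-fast pattern suggested above can be sketched in isolation. This is a minimal, runnable illustration using hypothetical stand-ins (`MiniDataset`, `check_row_alignment`) rather than the library's actual `AffinityDataset`/`ValidationResult` types:

```python
from dataclasses import dataclass

import pandas as pd

# Hypothetical stand-in for AffinityDataset, just to show the shape of the check.
@dataclass
class MiniDataset:
    samples: pd.DataFrame
    expression: pd.DataFrame

def check_row_alignment(ds: MiniDataset) -> list[str]:
    """Return one fatal error instead of silently skipping unmatched rows."""
    if len(ds.samples) != len(ds.expression):
        return [
            f"row-count mismatch: {len(ds.samples)} samples vs "
            f"{len(ds.expression)} expression rows"
        ]
    return []

ds = MiniDataset(
    samples=pd.DataFrame({"SampleID": ["s1", "s2", "s3"]}),
    expression=pd.DataFrame({"OID1": [1.0, 2.0]}),  # one row short
)
errors = check_row_alignment(ds)
print(errors)
```

Reporting a single fatal mismatch up front keeps per-row validation from ever indexing past the shorter table.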

ypriverol and others added 3 commits March 11, 2026 15:28
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…nalysis

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…s, compute_cv_distribution, compute_detection_rate

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

♻️ Duplicate comments (3)
src/pyprideap/readers/somascan_csv.py (1)

24-30: ⚠️ Potential issue | 🟠 Major

Still unresolved: use the same SeqId values in features and expression.columns.

features["SeqId"] strips the SeqId. prefix, but the expression matrix keeps the prefixed names. Any downstream lookup keyed by SeqId will therefore fail to match, even though both structures describe the same assays.

Suggested fix
-    expression = df[seq_cols].astype(float).reset_index(drop=True)
-
     seq_ids = [c.replace("SeqId.", "") for c in seq_cols]
+    expression = df[seq_cols].astype(float).reset_index(drop=True)
+    expression.columns = seq_ids
     features = pd.DataFrame(
         {
             "SeqId": seq_ids,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/pyprideap/readers/somascan_csv.py` around lines 24 - 30, The features
DataFrame is created with SeqId values that have the "SeqId." prefix stripped
(seq_ids) while the expression DataFrame columns remain as the original prefixed
names (expression.columns from seq_cols), causing mismatches; fix by making the
two representations identical — either keep the prefix when building features
(set features["SeqId"] to seq_cols or to the same strings used in
expression.columns) or normalize expression.columns to seq_ids so that
expression.columns and features["SeqId"] match (update where seq_ids, seq_cols,
features, and expression are handled).
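The mismatch is easy to reproduce with toy data. A minimal sketch (using made-up SeqId values) showing why the stripped IDs never match the prefixed columns, and how normalizing the expression columns fixes it:

```python
import pandas as pd

# Toy wide-format input: expression keeps the "SeqId." prefix,
# while features strips it — the two ID spaces no longer intersect.
seq_cols = ["SeqId.10000-28", "SeqId.10001-7"]
expression = pd.DataFrame([[1.2, 3.4]], columns=seq_cols)
features = pd.DataFrame({"SeqId": [c.replace("SeqId.", "") for c in seq_cols]})

# Every lookup keyed on the stripped SeqId misses:
missing = [s for s in features["SeqId"] if s not in expression.columns]
print(missing)  # both IDs miss

# Normalizing expression.columns to the same stripped IDs restores the link:
expression.columns = features["SeqId"].tolist()
assert all(s in expression.columns for s in features["SeqId"])
```

Either representation (prefixed or stripped) works, as long as both structures use the same one.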
src/pyprideap/readers/olink_parquet.py (1)

27-36: ⚠️ Potential issue | 🟠 Major

Still unresolved: reindex the feature axis before returning the dataset.

This has the same silent misalignment risk as the XLSX reader: features keeps first-seen OlinkID order, while the pivoted matrix can use a different column order. Because AffinityDataset does not validate alignment, callers can get the wrong metadata for each expression column.

Suggested fix
     feature_cols = [c for c in df.columns if c in _FEATURE_COLS]
     features = df[feature_cols].drop_duplicates(subset=["OlinkID"]).reset_index(drop=True)
 
     expression = df.pivot_table(
         index="SampleID",
         columns="OlinkID",
         values="NPX",
         aggfunc="first",
     )
-    expression = expression.reindex(samples["SampleID"].values).reset_index(drop=True)
+    expression = expression.reindex(
+        index=samples["SampleID"].values,
+        columns=features["OlinkID"].values,
+    ).reset_index(drop=True)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/pyprideap/readers/olink_parquet.py` around lines 27 - 36, features and
the pivoted expression can be misaligned because features is built from
first-seen OlinkID order while expression.columns may be different; after
creating expression (the pivot_table) reorder/reindex the features DataFrame to
match expression.columns (use expression.columns / OlinkID values) so that the
features rows align with expression columns before returning or passing into
AffinityDataset (e.g., set features =
features.set_index("OlinkID").reindex(expression.columns).reset_index()).
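The underlying hazard — `pivot_table()` sorting its column axis while `drop_duplicates()` preserves first-seen order — can be demonstrated with a small long-format frame. A sketch with toy OlinkIDs, applying the column-reindex fix proposed above:

```python
import pandas as pd

# Long-format toy input where the first-seen OlinkID order is not alphabetical.
df = pd.DataFrame({
    "SampleID": ["s1", "s1", "s2", "s2"],
    "OlinkID": ["OID2", "OID1", "OID2", "OID1"],
    "NPX": [1.0, 2.0, 3.0, 4.0],
})

features = df[["OlinkID"]].drop_duplicates().reset_index(drop=True)
expression = df.pivot_table(index="SampleID", columns="OlinkID",
                            values="NPX", aggfunc="first")

# pivot_table sorts its columns, so positions no longer match `features`:
print(features["OlinkID"].tolist())  # first-seen order: OID2, OID1
print(list(expression.columns))      # sorted order: OID1, OID2

# Reindexing the columns to the features order restores positional pairing:
expression = expression.reindex(columns=features["OlinkID"].to_numpy())
```

After the reindex, `features.iloc[i]` again describes `expression.iloc[:, i]` for every `i`.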
src/pyprideap/readers/olink_xlsx.py (1)

27-31: ⚠️ Potential issue | 🟠 Major

Still unresolved: keep the feature axis aligned with features.

features preserves first-seen OlinkID order, but pivot_table() can return columns in a different order. On unsorted XLSX input, features.iloc[i] no longer describes expression.iloc[:, i], and AffinityDataset will pass that mismatch through unchecked.

Suggested fix
     feature_cols = [c for c in df.columns if c in _FEATURE_COLS]
     features = df[feature_cols].drop_duplicates(subset=["OlinkID"]).reset_index(drop=True)
 
     expression = df.pivot_table(index="SampleID", columns="OlinkID", values="NPX", aggfunc="first")
-    expression = expression.reindex(samples["SampleID"].values).reset_index(drop=True)
+    expression = expression.reindex(
+        index=samples["SampleID"].values,
+        columns=features["OlinkID"].values,
+    ).reset_index(drop=True)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/pyprideap/readers/olink_xlsx.py` around lines 27 - 31, The feature axis
of expression can be out of order versus features because pivot_table may sort
columns differently; after creating features (feature_cols and features =
df[feature_cols]...), reindex the columns of expression to the OlinkID order in
features (use features["OlinkID"] as the column index ordering) so that
expression.iloc[:, i] corresponds to features.iloc[i]; perform this reindexing
on expression.columns (before any reset_index that drops column alignment) and
handle missing columns by allowing NaNs or filling as appropriate.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pyproject.toml`:
- Around line 55-56: Add a pytest pythonpath entry under the existing
[tool.pytest.ini_options] so pytest can locate the package in a src/ layout;
specifically, update the [tool.pytest.ini_options] block (the same section that
contains testpaths) to include a pythonpath setting that points to "src" so
tests won't raise ModuleNotFoundError when running pytest directly.

In `@src/pyprideap/readers/olink_csv.py`:
- Around line 27-36: The expression DataFrame columns can be alphabetically
sorted by pivot_table(), causing misalignment with features (built via
feature_cols and features = df[feature_cols].drop_duplicates(subset=["OlinkID"])
), so after creating expression (the pivot_table and reindex by SampleID)
reindex or reorder expression.columns to match the features["OlinkID"] order;
update the olink_csv.py (and analogously olink_xlsx.py and olink_parquet.py) to
set expression = expression.reindex(columns=features["OlinkID"].values) or
otherwise reorder expression.columns to features' OlinkID sequence so positional
pairing is preserved.

In `@src/pyprideap/readers/registry.py`:
- Around line 40-42: detect_format currently classifies a file as "somascan_csv"
when _SOMASCAN_MARKER_COLS is a subset of cols, but read_somascan_csv expects
wide-format files with SeqId.* columns and will fail for long-format inputs;
change the detection logic in detect_format() to only return "somascan_csv" when
has_seqid_cols (i.e., any column startswith "SeqId.") is true, removing the
_SOMASCAN_MARKER_COLS.issubset(cols) branch so only wide-format files are routed
to read_somascan_csv(); if long-format SomaScan CSV support is needed later, add
a separate reader instead of reusing read_somascan_csv().


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9dc01a85-4667-4b9b-a88d-7ae79f74a129

📥 Commits

Reviewing files that changed from the base of the PR and between bc32e82 and e39bda9.

📒 Files selected for processing (12)
  • .github/workflows/ci.yml
  • pyproject.toml
  • src/pyprideap/__init__.py
  • src/pyprideap/pride.py
  • src/pyprideap/readers/olink_csv.py
  • src/pyprideap/readers/olink_parquet.py
  • src/pyprideap/readers/olink_xlsx.py
  • src/pyprideap/readers/registry.py
  • src/pyprideap/readers/somascan_csv.py
  • tests/test_integration.py
  • tests/test_registry.py
  • tests/test_validators.py
🚧 Files skipped from review as they are similar to previous changes (5)
  • tests/test_validators.py
  • .github/workflows/ci.yml
  • tests/test_integration.py
  • tests/test_registry.py
  • src/pyprideap/pride.py

Comment on lines +55 to +56
[tool.pytest.ini_options]
testpaths = ["tests"]
⚠️ Potential issue | 🟡 Minor

Add pythonpath for src-layout compatibility.

With a src/ layout, pytest won't find the pyprideap package without either an editable install or explicit path configuration. This causes ModuleNotFoundError when running pytest directly.

🔧 Proposed fix
 [tool.pytest.ini_options]
 testpaths = ["tests"]
+pythonpath = ["src"]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pyproject.toml` around lines 55 - 56, Add a pytest pythonpath entry under the
existing [tool.pytest.ini_options] so pytest can locate the package in a src/
layout; specifically, update the [tool.pytest.ini_options] block (the same
section that contains testpaths) to include a pythonpath setting that points to
"src" so tests won't raise ModuleNotFoundError when running pytest directly.

Comment on lines +40 to +42
        has_seqid_cols = any(c.startswith("SeqId.") for c in cols)
        if has_seqid_cols or _SOMASCAN_MARKER_COLS.issubset(cols):
            return "somascan_csv"
⚠️ Potential issue | 🟠 Major

Make SomaScan CSV detection match the reader’s actual input contract.

detect_format() accepts any CSV with SeqId and SomaId, but read_somascan_csv() only supports wide files with SeqId.* expression columns. That means read() can classify a file as somascan_csv and then fail immediately with "No SeqId.* columns found".

Suggested fix
-        if has_seqid_cols or _SOMASCAN_MARKER_COLS.issubset(cols):
+        if has_seqid_cols:
             return "somascan_csv"

If long-format SomaScan CSV support is intended later, it should get its own reader instead of being routed to the current wide-format implementation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/pyprideap/readers/registry.py` around lines 40 - 42, detect_format
currently classifies a file as "somascan_csv" when _SOMASCAN_MARKER_COLS is a
subset of cols, but read_somascan_csv expects wide-format files with SeqId.*
columns and will fail for long-format inputs; change the detection logic in
detect_format() to only return "somascan_csv" when has_seqid_cols (i.e., any
column startswith "SeqId.") is true, removing the
_SOMASCAN_MARKER_COLS.issubset(cols) branch so only wide-format files are routed
to read_somascan_csv(); if long-format SomaScan CSV support is needed later, add
a separate reader instead of reusing read_somascan_csv().

ypriverol and others added 6 commits March 11, 2026 15:37
…gator

Merged PCA, correlation, missing values, CV distribution, and detection
rate into compute.py alongside distribution, QC summary, and LOD analysis.
Added compute_all() that returns all applicable plot data for a dataset.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… public API

- qc_report() assembles all computed plot data into a standalone HTML file
  with Plotly CDN, table of contents, and platform summary header
- compute_qc() (alias for compute_all) and qc_report() added to __all__
- Supports both Olink and SomaScan platform reports

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tion,

missing frequency histogram, QC×LOD stacked bar, PCA with sample labels

- Distribution: overlaid per-sample histograms instead of single combined
- Missing Frequency: new plot — per-assay missing freq with 30% threshold
- QC Summary: stacked bar crossing QC status with above/below LOD
- PCA: use SampleQC for color when all same SampleType, show labels on points
- Add MissingFrequencyData and QcLodSummaryData dataclasses

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- PCA: set explicit height=500px on all figures to prevent div collapse
  when using to_html(include_plotlyjs=False)
- Missing Frequency: use MissingFreq column from Olink features metadata
  (per-assay metric from Olink) instead of computing NaN rate from
  pivoted expression matrix which is always 0%
- olink_csv reader: add MissingFreq to extracted feature columns

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@qodo-code-review

qodo-code-review bot commented Mar 12, 2026

CI Feedback 🧐

(Feedback updated until commit 5cb8d56)

A test triggered by this PR failed. Here is an AI-generated analysis of the failure:

Action: typecheck

Failed stage: Run mypy src/pyprideap/ --ignore-missing-imports [❌]

Failed test name: ""

Failure summary:

The GitHub Action failed during the mypy src/pyprideap/ --ignore-missing-imports step because mypy
reported 7 type-checking errors (exit code 1).
Errors were reported in:

- src/pyprideap/lod.py:
  - Line 127: returning Any from a function declared to return Series[Any] ([no-any-return])
  - Lines 197 and 412: incompatible argument type passed to numpy.tile (ndarray[...] | ExtensionArray not matching expected types) ([arg-type])
  - Line 202: returning Any from a function declared to return DataFrame | Series[Any] ([no-any-return])
- src/pyprideap/qc/compute.py:
  - Line 149: list(...) called with an object that does not match any list constructor overload ([call-overload])
- src/pyprideap/qc/report.py:
  - Line 252: attempting to call a function of unknown type ([operator])
  - Line 256: accessing .title on an object that does not define that attribute ([attr-defined])

Relevant error logs:
1:  ##[group]Runner Image Provisioner
2:  Hosted Compute Agent
...

273:  Stored in directory: /tmp/pip-ephem-wheel-cache-3jlgn90r/wheels/c1/f8/d8/2821f504d00caaf76261bd56e020104d1119cf37a51a94cf2d
274:  Successfully built pyprideap
275:  Installing collected packages: urllib3, typing_extensions, threadpoolctl, six, ruff, pygments, pyarrow, pluggy, pathspec, packaging, numpy, narwhals, mypy_extensions, librt, joblib, iniconfig, idna, et-xmlfile, charset_normalizer, certifi, types-requests, scipy, requests, python-dateutil, pytest, plotly, pandas-stubs, openpyxl, mypy, scikit-learn, pandas, pyprideap
276:  Successfully installed certifi-2026.2.25 charset_normalizer-3.4.5 et-xmlfile-2.0.0 idna-3.11 iniconfig-2.3.0 joblib-1.5.3 librt-0.8.1 mypy-1.19.1 mypy_extensions-1.1.0 narwhals-2.18.0 numpy-2.4.3 openpyxl-3.1.5 packaging-26.0 pandas-3.0.1 pandas-stubs-3.0.0.260204 pathspec-1.0.4 plotly-6.6.0 pluggy-1.6.0 pyarrow-23.0.1 pygments-2.19.2 pyprideap-0.1.0 pytest-9.0.2 python-dateutil-2.9.0.post0 requests-2.32.5 ruff-0.15.5 scikit-learn-1.8.0 scipy-1.17.1 six-1.17.0 threadpoolctl-3.6.0 types-requests-2.32.4.20260107 typing_extensions-4.15.0 urllib3-2.6.3
277:  ##[group]Run mypy src/pyprideap/ --ignore-missing-imports
278:  �[36;1mmypy src/pyprideap/ --ignore-missing-imports�[0m
279:  shell: /usr/bin/bash -e {0}
280:  env:
281:  pythonLocation: /opt/hostedtoolcache/Python/3.12.12/x64
282:  PKG_CONFIG_PATH: /opt/hostedtoolcache/Python/3.12.12/x64/lib/pkgconfig
283:  Python_ROOT_DIR: /opt/hostedtoolcache/Python/3.12.12/x64
284:  Python2_ROOT_DIR: /opt/hostedtoolcache/Python/3.12.12/x64
285:  Python3_ROOT_DIR: /opt/hostedtoolcache/Python/3.12.12/x64
286:  LD_LIBRARY_PATH: /opt/hostedtoolcache/Python/3.12.12/x64/lib
287:  ##[endgroup]
288:  src/pyprideap/lod.py:127: error: Returning Any from function declared to return "Series[Any]"  [no-any-return]
289:  src/pyprideap/lod.py:197: error: Argument 1 to "tile" has incompatible type "ndarray[tuple[int], dtype[Any]] | ExtensionArray"; expected "_SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]]"  [arg-type]
290:  src/pyprideap/lod.py:202: error: Returning Any from function declared to return "DataFrame | Series[Any]"  [no-any-return]
291:  src/pyprideap/lod.py:412: error: Argument 1 to "tile" has incompatible type "ndarray[tuple[int], dtype[Any]] | ExtensionArray"; expected "_SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]]"  [arg-type]
292:  src/pyprideap/qc/compute.py:149: error: No overload variant of "list" matches argument type "object"  [call-overload]
293:  src/pyprideap/qc/compute.py:149: note: Possible overload variants:
294:  src/pyprideap/qc/compute.py:149: note:     def [_T] __init__(self) -> list[_T]
295:  src/pyprideap/qc/compute.py:149: note:     def [_T] __init__(self, Iterable[_T], /) -> list[_T]
296:  src/pyprideap/qc/report.py:252: error: Cannot call function of unknown type  [operator]
297:  src/pyprideap/qc/report.py:256: error: "object" has no attribute "title"  [attr-defined]
298:  Found 7 errors in 3 files (checked 22 source files)
299:  ##[error]Process completed with exit code 1.
300:  Post job cleanup.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 10

♻️ Duplicate comments (1)
src/pyprideap/filtering.py (1)

43-45: ⚠️ Potential issue | 🟠 Major

Use positional row filtering for expression and lod_matrix.

Lines 45, 53, 82, and 90 apply a boolean Series built from dataset.samples directly to other DataFrames. Pandas aligns that mask by index label, so any future index drift between these frames will mis-filter rows or raise. Convert the mask once with keep_mask.to_numpy() and use .iloc[...] for the row-aligned matrices.

🐛 Suggested fix
     keep_mask = ~is_control
-    samples = dataset.samples[keep_mask].reset_index(drop=True)
-    expression = dataset.expression[keep_mask].reset_index(drop=True)
+    row_mask = keep_mask.to_numpy()
+    samples = dataset.samples.loc[keep_mask].reset_index(drop=True)
+    expression = dataset.expression.iloc[row_mask].reset_index(drop=True)
@@
         lod_df = metadata["lod_matrix"]
         if isinstance(lod_df, pd.DataFrame):
-            metadata["lod_matrix"] = lod_df[keep_mask].reset_index(drop=True)
+            metadata["lod_matrix"] = lod_df.iloc[row_mask].reset_index(drop=True)
@@
-    samples = dataset.samples[keep_mask].reset_index(drop=True)
-    expression = dataset.expression[keep_mask].reset_index(drop=True)
+    row_mask = keep_mask.to_numpy()
+    samples = dataset.samples.loc[keep_mask].reset_index(drop=True)
+    expression = dataset.expression.iloc[row_mask].reset_index(drop=True)
@@
         lod_df = metadata["lod_matrix"]
         if isinstance(lod_df, pd.DataFrame):
-            metadata["lod_matrix"] = lod_df[keep_mask].reset_index(drop=True)
+            metadata["lod_matrix"] = lod_df.iloc[row_mask].reset_index(drop=True)

Also applies to: 51-53, 81-82, 88-90

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/pyprideap/filtering.py` around lines 43 - 45, The boolean mask built from
dataset.samples (keep_mask = ~is_control) is being applied directly to other
DataFrames which causes label-based alignment bugs; convert the mask to a
positional boolean array (e.g., keep_pos = keep_mask.to_numpy()) and use
.iloc[...] to select rows for dataset.samples, dataset.expression and
dataset.lod_matrix (and any other uses) — replace expressions like samples =
dataset.samples[keep_mask].reset_index(...) and expression =
dataset.expression[keep_mask].reset_index(...) with samples =
dataset.samples.iloc[keep_pos].reset_index(drop=True), expression =
dataset.expression.iloc[keep_pos].reset_index(drop=True), and similarly use
.iloc[keep_pos] when slicing lod_matrix.
🧹 Nitpick comments (1)
src/pyprideap/qc/render.py (1)

45-60: Normalize the histogram traces if this chart is supposed to compare distributions.

The docstring says density/KDE-like output, but Lines 54-60 use histnorm="", which plots raw counts. Samples with more non-missing measurements will therefore look "larger" even when the underlying shape is identical. If the intent is distribution comparison, switch to histnorm="probability density"/"probability" or update the copy to match the current behavior.

💡 Suggested fix
-                histnorm="",
+                histnorm="probability density",
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/pyprideap/qc/render.py` around lines 45 - 60, The histogram traces in
render_distribution currently use histnorm="" which shows raw counts and biases
samples with more measurements; update the go.Histogram call in
render_distribution to use an appropriate normalization such as
histnorm="probability density" (or "probability") so the per-sample overlaid
curves compare distributions rather than counts, and ensure the docstring still
states density/KDE-like output (or adjust the docstring if you intentionally
keep raw counts).
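Why raw-count histograms mislead when per-sample measurement counts differ can be shown without Plotly. A NumPy sketch with synthetic data: two samples drawn from the same distribution, one ten times larger, compared first by raw counts and then with density normalization (which is what Plotly's histnorm="probability density" applies):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.uniform(0.0, 1.0, 100)    # small sample
b = rng.uniform(0.0, 1.0, 1000)   # same distribution, 10x the points

bins = np.linspace(0.0, 1.0, 11)
counts_a, _ = np.histogram(a, bins=bins)
counts_b, _ = np.histogram(b, bins=bins)
# Raw counts: sample b towers over sample a despite an identical shape.
print(counts_b.sum() / counts_a.sum())  # 10.0

# Density normalization puts both on the same scale: each integrates to 1.
dens_a, _ = np.histogram(a, bins=bins, density=True)
dens_b, _ = np.histogram(b, bins=bins, density=True)
width = bins[1] - bins[0]
print(dens_a.sum() * width, dens_b.sum() * width)  # 1.0 1.0
```

The same reasoning applies per overlaid trace in the report: with histnorm="", a sample with more non-missing measurements always looks "larger".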
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/pyprideap/__init__.py`:
- Around line 3-13: The import block is too long and triggers Ruff I001/E501;
reflow the pyprideap.lod import into a parenthesized, multi-line import and let
Ruff/isort reorder groups so line lengths and ordering meet style rules—e.g.
replace the single long line that imports LodMethod, LodStats,
compute_lod_from_controls, compute_nclod, compute_lod_stats,
get_bundled_fixed_lod_path, get_reported_lod, get_valid_proteins, load_fixed_lod
with a parenthesized import list (one name per line) and then run ruff/isort to
fix ordering and spacing.

In `@src/pyprideap/lod.py`:
- Around line 486-488: The comprehension creating panel_cols is too long and
exceeds the 120-char limit; split it across multiple lines to satisfy the linter
by breaking after the opening bracket and aligning the conditional onto its own
line. Update the expression that builds panel_cols (the list comprehension
referencing dataset.expression.columns, panel_map, panel_name, and
has_lod_per_col) so each logical part is on a separate line (e.g., start the
list on one line, put the for clause on the next line, and the if condition on
another) to keep line length <=120.
- Line 247: The hardcoded _CONFIGS_DIR path will break after pip installation
and causes get_bundled_fixed_lod_path() to return None; replace filesystem-based
resolution with importlib.resources (or importlib.resources.files/open_text) to
load the bundled CSVs from the package resources and update pyproject.toml to
include those CSVs as package data (or configure hatchling's include section) so
they are distributed; modify get_bundled_fixed_lod_path() (and any callers) to
use importlib.resources APIs to locate/read the CSVs rather than
Path(__file__).resolve().parent... and remove reliance on _CONFIGS_DIR.

In `@src/pyprideap/qc/compute.py`:
- Around line 98-99: The helper _sample_id_col currently returns "SampleId" even
when neither "SampleID" nor "SampleId" exists; change _sample_id_col to return
None (or raise a ValueError with a clear message) when no valid column is found,
and update callers such as compute_correlation to check the return value before
indexing dataset.samples (raise a clear error or handle the missing ID column
case) so a KeyError is avoided and errors are informative.
- Around line 148-149: The error arises because dataset.metadata is typed
dict[str, object] so dataset.metadata["MissingFreq"] is object and list(...)
fails; fix by casting that value to the expected iterable type before
converting/assigning to freq. In compute.py, import typing.cast and change the
assignment in the branch with "MissingFreq" to use cast(List[str] or
Iterable[str], dataset.metadata["MissingFreq"]) (or cast to the concrete type
you expect) so freq has the proper type; reference the variables/members
dataset.metadata, "MissingFreq", and freq when making this change.
- Around line 218-231: The loop over numeric.columns uses the index i to grab
assay IDs and panels from dataset.features which can misalign; instead, map by
column name (col) to the feature row: look up dataset.features rows where
dataset.features[id_col] == col and use that row's id and panel values (fall
back to col or empty string if no match). Update the assignments that populate
assay_ids and panels (references: numeric.columns loop, dataset.features,
id_col, panel_col, assay_ids, panels) to perform a name-based lookup (or an
indexed mapping built once outside the loop) rather than using i-based
positional indexing.

In `@src/pyprideap/qc/report.py`:
- Around line 26-30: The help text for the "missing_frequency" entry contains
escaped percent signs ("30%%") that render as "30%%" in the HTML; update the
string literal under the "missing_frequency" key in src/pyprideap/qc/report.py
to use single percent signs ("30%") instead of "30%%" so the HTML shows "30%".
- Line 335: The HTML file is written with output.write_text(html) using
platform-default encoding while the content declares UTF-8 and contains
non-ASCII characters; change the write to explicitly use UTF-8 by calling
output.write_text(html, encoding="utf-8") and update any tests that read this
file with Path.read_text() to call read_text(encoding="utf-8") so bytes and
declared charset always match (look for the output.write_text(html) call in
report generation and corresponding read_text() uses in tests).
- Around line 220-230: Narrow the type of plot_data entries before calling the
renderer by checking the runtime type: after retrieving data =
plot_data.get(key) and ensuring data is not None, use isinstance(data, _dtype)
(where _dtype and renderer are the two elements from the _RENDERERS tuple you
unpack) or cast to the dataclass so that renderer(data) and access like
data.title are type-safe; also fix the help string literals by replacing the
double-percent occurrences ("30%% threshold", "more than 30%% of") with single
percent signs ("30% threshold", "more than 30% of"); finally, write the HTML
with explicit UTF-8 encoding by changing output.write_text(html) to
output.write_text(html, encoding="utf-8").

In `@src/pyprideap/readers/olink_xlsx.py`:
- Line 10: The XLSX reader currently drops the MissingFreq column because
_FEATURE_COLS in src/pyprideap/readers/olink_xlsx.py lacks "MissingFreq"; update
the _FEATURE_COLS set to include "MissingFreq" (matching the entry in
src/pyprideap/readers/olink_csv.py) so feature metadata extraction in the Olink
XLSX reader (referencing _FEATURE_COLS) preserves that field and yields
consistent metadata across CSV and XLSX loaders.

---

Duplicate comments:
In `@src/pyprideap/filtering.py`:
- Around line 43-45: The boolean mask built from dataset.samples (keep_mask =
~is_control) is being applied directly to other DataFrames which causes
label-based alignment bugs; convert the mask to a positional boolean array
(e.g., keep_pos = keep_mask.to_numpy()) and use .iloc[...] to select rows for
dataset.samples, dataset.expression and dataset.lod_matrix (and any other uses)
— replace expressions like samples = dataset.samples[keep_mask].reset_index(...)
and expression = dataset.expression[keep_mask].reset_index(...) with samples =
dataset.samples.iloc[keep_pos].reset_index(drop=True), expression =
dataset.expression.iloc[keep_pos].reset_index(drop=True), and similarly use
.iloc[keep_pos] when slicing lod_matrix.

---

Nitpick comments:
In `@src/pyprideap/qc/render.py`:
- Around line 45-60: The histogram traces in render_distribution currently use
histnorm="" which shows raw counts and biases the comparison toward samples with
more measurements; update the go.Histogram call in render_distribution to use an
appropriate normalization such as histnorm="probability density" (or
"probability") so the overlaid per-sample curves compare distributions rather
than counts, and ensure the docstring still describes density/KDE-like output
(or adjust the docstring if you intentionally keep raw counts).
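The effect of normalization can be seen without Plotly; this NumPy sketch shows raw counts growing with sample size while density-normalized histograms stay directly comparable (the sample sizes and bin count are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
small = rng.normal(size=100)
large = rng.normal(size=10_000)

# Raw counts scale with the number of measurements per sample
counts_small, edges = np.histogram(small, bins=20)
counts_large, _ = np.histogram(large, bins=edges)

# Density normalization makes each histogram integrate to 1,
# mirroring Plotly's histnorm="probability density"
dens_small, _ = np.histogram(small, bins=edges, density=True)
dens_large, _ = np.histogram(large, bins=edges, density=True)
widths = np.diff(edges)
```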

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 790cb379-de63-4432-b5f5-73c0b0ac749e

📥 Commits

Reviewing files that changed from the base of the PR and between e39bda9 and 5cb8d56.

⛔ Files ignored due to path filters (3)
  • configs/Explore-3072-Fixed-LOD-2024-12-19.csv is excluded by !**/*.csv
  • configs/Explore-HT-Fixed-LOD.csv is excluded by !**/*.csv
  • configs/Reveal-Fixed-LOD.csv is excluded by !**/*.csv
📒 Files selected for processing (15)
  • .gitignore
  • pyproject.toml
  • src/pyprideap/__init__.py
  • src/pyprideap/filtering.py
  • src/pyprideap/lod.py
  • src/pyprideap/qc/__init__.py
  • src/pyprideap/qc/compute.py
  • src/pyprideap/qc/render.py
  • src/pyprideap/qc/report.py
  • src/pyprideap/readers/olink_csv.py
  • src/pyprideap/readers/olink_xlsx.py
  • tests/test_lod.py
  • tests/test_qc_compute.py
  • tests/test_qc_render.py
  • tests/test_qc_report.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • pyproject.toml
  • tests/test_lod.py
  • .gitignore

Comment on lines +3 to +13
from importlib.metadata import PackageNotFoundError, version

from pyprideap.core import AffinityDataset, Level, Platform, ValidationResult
from pyprideap.filtering import filter_controls, filter_qc
from pyprideap.lod import LodMethod, LodStats, compute_lod_from_controls, compute_nclod, compute_lod_stats, get_bundled_fixed_lod_path, get_reported_lod, get_valid_proteins, load_fixed_lod
from pyprideap.pride import PrideClient
from pyprideap.qc.compute import compute_all as compute_qc
from pyprideap.qc.report import qc_report
from pyprideap.readers.registry import read
from pyprideap.stats import DatasetStats, compute_stats
from pyprideap.validators import validate

⚠️ Potential issue | 🟡 Minor

Reflow the import block so Ruff passes.

CI is already failing on I001/E501 for this block. Breaking the long pyprideap.lod import into a parenthesized block and letting Ruff/isort reorder the group should clear it.

💡 Suggested fix
-from pyprideap.lod import LodMethod, LodStats, compute_lod_from_controls, compute_nclod, compute_lod_stats, get_bundled_fixed_lod_path, get_reported_lod, get_valid_proteins, load_fixed_lod
+from pyprideap.lod import (
+    LodMethod,
+    LodStats,
+    compute_lod_from_controls,
+    compute_lod_stats,
+    compute_nclod,
+    get_bundled_fixed_lod_path,
+    get_reported_lod,
+    get_valid_proteins,
+    load_fixed_lod,
+)
🧰 Tools
🪛 GitHub Actions: CI

[error] 3-13: I001 Import block is un-sorted or un-formatted.


[error] 7-7: E501 Line too long (188 > 120)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/pyprideap/__init__.py` around lines 3 - 13, The import block is too long
and triggers Ruff I001/E501; reflow the pyprideap.lod import into a
parenthesized, multi-line import and let Ruff/isort reorder groups so line
lengths and ordering meet style rules—e.g. replace the single long line that
imports LodMethod, LodStats, compute_lod_from_controls, compute_nclod,
compute_lod_stats, get_bundled_fixed_lod_path, get_reported_lod,
get_valid_proteins, load_fixed_lod with a parenthesized import list (one name
per line) and then run ruff/isort to fix ordering and spacing.

return pd.Series(lod_series.values, index=dataset.expression.columns[: len(lod_series)])


_CONFIGS_DIR = Path(__file__).resolve().parent.parent.parent / "configs"

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if configs directory is included in package data or exists at expected location
fd -t d "configs" --max-depth 4
# Check pyproject.toml for package data configuration
cat pyproject.toml | grep -A 10 "package-data\|include\|data-files" || echo "No package data config found"

Repository: PRIDE-Archive/pyprideap

Length of output: 104


🏁 Script executed:

# Check for setup.py and MANIFEST.in
fd -t f "setup.py|setup.cfg|MANIFEST.in" --max-depth 2

# Get the full pyproject.toml to understand distribution setup
cat pyproject.toml

Repository: PRIDE-Archive/pyprideap

Length of output: 1669


🏁 Script executed:

# Find usage of _CONFIGS_DIR
rg "_CONFIGS_DIR" -n

# Check what files are in configs directory
fd -t f . configs/

# See how get_bundled_fixed_lod_path handles missing files
sed -n '250,270p' src/pyprideap/lod.py

Repository: PRIDE-Archive/pyprideap

Length of output: 1023


🏁 Script executed:

# Check for tests related to bundled LOD
rg "get_bundled_fixed_lod_path|_BUNDLED_FIXED_LOD" tests/ --type py

# Check README for any mention of configs or bundled data
grep -i "config\|bundle\|fixed.lod" README.md

Repository: PRIDE-Archive/pyprideap

Length of output: 49


Path resolution will fail after pip installation; bundled LOD data will not be accessible.

The _CONFIGS_DIR path uses Path(__file__).resolve().parent.parent.parent / "configs", which works in development but breaks in distributed installations. The configs/ directory is not included in the package distribution (pyproject.toml has no package-data configuration and hatchling only includes Python packages by default). After pip install, this path resolution will point to the wrong location, causing get_bundled_fixed_lod_path() to silently return None for all platforms.

Use importlib.resources to access the bundled CSV files instead, ensuring they are included in the package distribution via pyproject.toml configuration.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/pyprideap/lod.py` at line 247, The hardcoded _CONFIGS_DIR path will break
after pip installation and causes get_bundled_fixed_lod_path() to return None;
replace filesystem-based resolution with importlib.resources (or
importlib.resources.files/open_text) to load the bundled CSVs from the package
resources and update pyproject.toml to include those CSVs as package data (or
configure hatchling's include section) so they are distributed; modify
get_bundled_fixed_lod_path() (and any callers) to use importlib.resources APIs
to locate/read the CSVs rather than Path(__file__).resolve().parent... and
remove reliance on _CONFIGS_DIR.
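A minimal sketch of the importlib.resources approach suggested above; the helper name and the pyprideap layout in the usage note are hypothetical, and the real fix also requires pyproject.toml to ship the CSVs as package data:

```python
from importlib.resources import files


def resolve_bundled(package: str, filename: str):
    """Locate a bundled data file inside an installed package, or None.

    Unlike Path(__file__)-relative lookups, this resolves correctly for
    wheels, editable installs, and zipped distributions alike.
    """
    resource = files(package).joinpath(filename)
    return resource if resource.is_file() else None
```

For pyprideap this would look like `files("pyprideap").joinpath("configs", "Reveal-Fixed-LOD.csv")`, assuming the configs directory is moved under the package and declared in hatchling's include configuration.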

Comment on lines +486 to +488
panel_cols = [
c for c in dataset.expression.columns if panel_map.get(c) == panel_name and has_lod_per_col.get(c, False)
]

⚠️ Potential issue | 🟡 Minor

Fix line length to resolve pipeline failure.

Line 487 exceeds 120 characters (121 > 120), causing the linter to fail.

🔧 Proposed fix
             panel_cols = [
-                c for c in dataset.expression.columns if panel_map.get(c) == panel_name and has_lod_per_col.get(c, False)
+                c for c in dataset.expression.columns
+                if panel_map.get(c) == panel_name and has_lod_per_col.get(c, False)
             ]
🧰 Tools
🪛 GitHub Actions: CI

[error] 487-487: E501 Line too long (121 > 120)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/pyprideap/lod.py` around lines 486 - 488, The comprehension creating
panel_cols is too long and exceeds the 120-char limit; split it across multiple
lines to satisfy the linter by breaking after the opening bracket and aligning
the conditional onto its own line. Update the expression that builds panel_cols
(the list comprehension referencing dataset.expression.columns, panel_map,
panel_name, and has_lod_per_col) so each logical part is on a separate line
(e.g., start the list on one line, put the for clause on the next line, and the
if condition on another) to keep line length <=120.

Comment on lines +98 to +99
def _sample_id_col(dataset: AffinityDataset) -> str:
return "SampleID" if "SampleID" in dataset.samples.columns else "SampleId"

⚠️ Potential issue | 🟡 Minor

_sample_id_col returns "SampleId" even when neither column exists.

When dataset.samples contains neither "SampleID" nor "SampleId" (common for SomaScan ADAT files per the reader implementation), this function returns "SampleId" unconditionally. Callers like compute_correlation (line 288) use this to index into dataset.samples, which would raise a KeyError.

Consider returning None or raising a clearer error when no valid column is found.

🛡️ Proposed fix
 def _sample_id_col(dataset: AffinityDataset) -> str | None:
-    return "SampleID" if "SampleID" in dataset.samples.columns else "SampleId"
+    if "SampleID" in dataset.samples.columns:
+        return "SampleID"
+    if "SampleId" in dataset.samples.columns:
+        return "SampleId"
+    return None

Then update callers to check for None before indexing.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/pyprideap/qc/compute.py` around lines 98 - 99, The helper _sample_id_col
currently returns "SampleId" even when neither "SampleID" nor "SampleId" exists;
change _sample_id_col to return None (or raise a ValueError with a clear
message) when no valid column is found, and update callers such as
compute_correlation to check the return value before indexing dataset.samples
(raise a clear error or handle the missing ID column case) so a KeyError is
avoided and errors are informative.

Comment on lines +148 to +149
elif "MissingFreq" in dataset.metadata:
freq = list(dataset.metadata["MissingFreq"])

⚠️ Potential issue | 🟡 Minor

Fix type error causing pipeline failure.

The pipeline fails with: No overload variant of "list" matches argument type "object". Since dataset.metadata is typed as dict[str, object], the value returned by dataset.metadata["MissingFreq"] has type object, which list() cannot accept without a cast.

🔧 Proposed fix
     elif "MissingFreq" in dataset.metadata:
-        freq = list(dataset.metadata["MissingFreq"])
+        mf = dataset.metadata["MissingFreq"]
+        freq = list(mf) if hasattr(mf, "__iter__") else []  # type: ignore[arg-type]

Or use an explicit cast:

+from typing import cast, Iterable
 ...
     elif "MissingFreq" in dataset.metadata:
-        freq = list(dataset.metadata["MissingFreq"])
+        freq = list(cast(Iterable[float], dataset.metadata["MissingFreq"]))
🧰 Tools
🪛 GitHub Actions: CI

[error] 149-149: No overload variant of "list" matches argument type "object" [call-overload]

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/pyprideap/qc/compute.py` around lines 148 - 149, The error arises because
dataset.metadata is typed dict[str, object], so dataset.metadata["MissingFreq"]
has static type object and list(...) rejects it; fix by casting the value to the
expected iterable type before converting. In compute.py, import cast and
Iterable from typing and change the assignment in the "MissingFreq" branch to
freq = list(cast(Iterable[float], dataset.metadata["MissingFreq"])) (or cast to
whatever concrete type you expect) so freq is properly typed; reference the
variables/members dataset.metadata, "MissingFreq", and freq when making this
change.

Comment on lines +26 to +30
"missing_frequency": (
"Displays how many assays fall into each missing-frequency bin. The x-axis is the fraction of samples "
"where a given assay has no measured value; the y-axis (log scale) is the count of assays. The red "
"dashed line marks the 30%% threshold — assays above this line are missing in more than 30%% of "
"samples and may need to be excluded from downstream analysis."

⚠️ Potential issue | 🟡 Minor

Fix the doubled percent signs in the help copy.

These strings are emitted verbatim into the HTML, so the report currently shows 30%% instead of 30%.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/pyprideap/qc/report.py` around lines 26 - 30, The help text for the
"missing_frequency" entry contains escaped percent signs ("30%%") that render as
"30%%" in the HTML; update the string literal under the "missing_frequency" key
in src/pyprideap/qc/report.py to use single percent signs ("30%") instead of
"30%%" so the HTML shows "30%".

Comment on lines +220 to +230
_RENDERERS = {
"distribution": (DistributionData, R.render_distribution),
"missing_frequency": (MissingFrequencyData, R.render_missing_frequency),
"qc_summary": (QcLodSummaryData, R.render_qc_summary),
"lod_analysis": (LodAnalysisData, R.render_lod_analysis),
"pca": (PcaData, R.render_pca),
"correlation": (CorrelationData, R.render_correlation),
"missing_values": (MissingValuesData, R.render_missing_values),
"cv_distribution": (CvDistributionData, R.render_cv_distribution),
"detection_rate": (DetectionRateData, R.render_detection_rate),
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n src/pyprideap/qc/report.py | sed -n '220,270p'

Repository: PRIDE-Archive/pyprideap

Length of output: 2467


🏁 Script executed:

cat -n src/pyprideap/qc/report.py | sed -n '1,250p' | tail -n 50

Repository: PRIDE-Archive/pyprideap

Length of output: 2300


🏁 Script executed:

cat -n src/pyprideap/qc/compute.py | sed -n '340,365p'

Repository: PRIDE-Archive/pyprideap

Length of output: 1246


🏁 Script executed:

rg "30%%" src/pyprideap/qc/report.py

Repository: PRIDE-Archive/pyprideap

Length of output: 172


🏁 Script executed:

rg "write_text" src/pyprideap/qc/report.py

Repository: PRIDE-Archive/pyprideap

Length of output: 94


🏁 Script executed:

cat -n src/pyprideap/qc/report.py | grep -A 5 -B 5 "write_text"

Repository: PRIDE-Archive/pyprideap

Length of output: 315


🏁 Script executed:

cat -n src/pyprideap/qc/report.py | grep -B 10 "30%%" | head -n 20

Repository: PRIDE-Archive/pyprideap

Length of output: 931


🏁 Script executed:

cat -n src/pyprideap/qc/report.py | sed -n '19,35p'

Repository: PRIDE-Archive/pyprideap

Length of output: 1394


🏁 Script executed:

cat -n src/pyprideap/qc/report.py | sed -n '300,336p'

Repository: PRIDE-Archive/pyprideap

Length of output: 1902


🏁 Script executed:

python3 << 'EOF'
# Test what %% in a regular string literal produces
test_str = "This is 30%% of samples"
print("String value:", repr(test_str))
print("Rendered:", test_str)

# Compare with what should happen in format context
format_str = "This is 30%% of {}"
print("Format string rendered:", format_str.format("samples"))
EOF

Repository: PRIDE-Archive/pyprideap

Length of output: 188


🏁 Script executed:

rg "encoding" src/pyprideap/qc/report.py

Repository: PRIDE-Archive/pyprideap

Length of output: 49


🏁 Script executed:

python3 << 'EOF'
# Simulate what mypy sees
from typing import Dict, Any

# This is what compute_all returns
plot_data: Dict[str, object] = {"key": "value"}

# At line 249
data = plot_data.get("key")  # type: object | None

# After the None check at line 250-251
if data is not None:
    # data is still typed as object, not narrowed to the specific dataclass type
    # because dict.get() with dict[str, object] returns object | None
    # Accessing data.title would fail mypy type check
    print(type(data))  # Would need isinstance check to narrow
EOF

Repository: PRIDE-Archive/pyprideap

Length of output: 80


Narrow plot_data before invoking the renderer registry, fix the %% escaping in help text, and specify UTF-8 encoding for HTML output.

compute_all() returns dict[str, object] (compute.py:347), so at line 249 data = plot_data.get(key) is typed as object | None. The if data is None: continue check at lines 250–251 narrows only to object, not to the specific dataclass type. Line 252 calls renderer(data) and line 256 accesses data.title—both fail mypy type checking because object has no title attribute and renderer expects a concrete dataclass type. Add an isinstance(data, _dtype) guard or cast after the None check to properly narrow the type.

The help text at lines 29–30 contains literal "30%% threshold" and "more than 30%% of". In a regular string literal (not f-string or format context), %% is not escaped and will appear as 30%% in the rendered HTML instead of 30%.

Line 335 calls output.write_text(html) without specifying an encoding. The HTML contains non-ASCII characters (em dash \u2014 at line 319), and the default platform encoding may vary. Use output.write_text(html, encoding="utf-8") for consistency and portability.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/pyprideap/qc/report.py` around lines 220 - 230, Narrow the type of
plot_data entries before calling the renderer by checking the runtime type:
after retrieving data = plot_data.get(key) and ensuring data is not None, use
isinstance(data, _dtype) (where _dtype and renderer are the two elements from
the _RENDERERS tuple you unpack) or cast to the dataclass so that renderer(data)
and access like data.title are type-safe; also fix the help string literals by
replacing the double-percent occurrences ("30%% threshold", "more than 30%% of")
with single percent signs ("30% threshold", "more than 30% of"); finally, write
the HTML with explicit UTF-8 encoding by changing output.write_text(html) to
output.write_text(html, encoding="utf-8").

"</body>\n</html>"
)

output.write_text(html)

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

wc -l src/pyprideap/qc/report.py

Repository: PRIDE-Archive/pyprideap

Length of output: 97


🏁 Script executed:

cat -n src/pyprideap/qc/report.py | sed -n '330,340p'

Repository: PRIDE-Archive/pyprideap

Length of output: 315


🏁 Script executed:

# Search for where the HTML is generated and what special characters it contains
rg "—|≤|−" src/pyprideap/qc/report.py -B 3 -A 3

Repository: PRIDE-Archive/pyprideap

Length of output: 1438


🏁 Script executed:

# Check for test files related to this module
find . -name "*test*report*" -o -name "*report*test*" | head -20

Repository: PRIDE-Archive/pyprideap

Length of output: 92


🏁 Script executed:

# Check the test file to see how it handles encoding
cat -n tests/test_qc_report.py

Repository: PRIDE-Archive/pyprideap

Length of output: 3279


🏁 Script executed:

# Check Python's write_text() behavior - look at what encoding is used by default
python3 <<'EOF'
from pathlib import Path
import inspect
# Check write_text signature
sig = inspect.signature(Path.write_text)
print(f"write_text signature: {sig}")
# Check default encoding
print(f"\nDefault encoding for write_text: {inspect.signature(Path.write_text).parameters}")
EOF

Repository: PRIDE-Archive/pyprideap

Length of output: 374


🏁 Script executed:

# Search for any other places where output might be read back
rg "read_text|open.*encoding" src/pyprideap/qc/report.py

Repository: PRIDE-Archive/pyprideap

Length of output: 49


🏁 Script executed:

# Check the context of where output is defined - where the Path is created
cat -n src/pyprideap/qc/report.py | sed -n '300,340p'

Repository: PRIDE-Archive/pyprideap

Length of output: 1902


🏁 Script executed:

# Check the qc_report function signature to understand output parameter type
cat -n src/pyprideap/qc/report.py | sed -n '260,280p'

Repository: PRIDE-Archive/pyprideap

Length of output: 1052


🏁 Script executed:

# Check if there are any other write_text calls in the file that might already use encoding
rg "write_text|write_bytes" src/pyprideap/qc/report.py

Repository: PRIDE-Archive/pyprideap

Length of output: 94


🏁 Script executed:

# Get the full function signature
cat -n src/pyprideap/qc/report.py | sed -n '200,220p'

Repository: PRIDE-Archive/pyprideap

Length of output: 829


🏁 Script executed:

# Check if the output is a Path object
rg "def qc_report" src/pyprideap/qc/report.py -A 5

Repository: PRIDE-Archive/pyprideap

Length of output: 385


Specify UTF-8 encoding when writing HTML.

The generated HTML declares <meta charset="utf-8"> but contains non-ASCII characters (—, ≤, −) and is written with write_text() using the platform-default encoding. On non-UTF-8 locales, this causes a mismatch: the bytes won't be UTF-8 even though the charset is declared as UTF-8. Use output.write_text(html, encoding="utf-8") to ensure the file is always written in UTF-8. Additionally, the tests that read this file back should also specify encoding="utf-8" in their read_text() calls to maintain consistency.

💡 Suggested fix
-    output.write_text(html)
+    output.write_text(html, encoding="utf-8")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/pyprideap/qc/report.py` at line 335, The HTML file is written with
output.write_text(html) using platform-default encoding while the content
declares UTF-8 and contains non-ASCII characters; change the write to explicitly
use UTF-8 by calling output.write_text(html, encoding="utf-8") and update any
tests that read this file with Path.read_text() to call
read_text(encoding="utf-8") so bytes and declared charset always match (look for
the output.write_text(html) call in report generation and corresponding
read_text() uses in tests).

from pyprideap.core import AffinityDataset, Platform

_SAMPLE_COLS = {"SampleID", "PlateID", "WellID", "SampleType", "SampleQC", "PlateQC"}
_FEATURE_COLS = {"OlinkID", "UniProt", "Assay", "Panel", "LOD"}

⚠️ Potential issue | 🟠 Major

Preserve MissingFreq in XLSX feature metadata.

src/pyprideap/readers/olink_csv.py:9-11 includes MissingFreq in _FEATURE_COLS, but this reader drops it entirely. The same Olink dataset will therefore expose different feature metadata depending on whether it is loaded from CSV or XLSX, which makes downstream QC/statistics format-dependent.

Proposed fix
-_FEATURE_COLS = {"OlinkID", "UniProt", "Assay", "Panel", "LOD"}
+_FEATURE_COLS = {"OlinkID", "UniProt", "Assay", "Panel", "LOD", "MissingFreq"}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/pyprideap/readers/olink_xlsx.py` at line 10, The XLSX reader currently
drops the MissingFreq column because _FEATURE_COLS in
src/pyprideap/readers/olink_xlsx.py lacks "MissingFreq"; update the
_FEATURE_COLS set to include "MissingFreq" (matching the entry in
src/pyprideap/readers/olink_csv.py) so feature metadata extraction in the Olink
XLSX reader (referencing _FEATURE_COLS) preserves that field and yields
consistent metadata across CSV and XLSX loaders.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

♻️ Duplicate comments (4)
src/pyprideap/readers/olink_xlsx.py (1)

10-10: ⚠️ Potential issue | 🟠 Major

Add MissingFreq to the XLSX feature schema.

src/pyprideap/readers/olink_csv.py preserves MissingFreq, but this reader drops it entirely. The same Olink export will therefore expose different features metadata depending on whether it is loaded from CSV or XLSX.

Suggested fix
-_FEATURE_COLS = {"OlinkID", "UniProt", "Assay", "Panel", "LOD"}
+_FEATURE_COLS = {"OlinkID", "UniProt", "Assay", "Panel", "LOD", "MissingFreq"}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/pyprideap/readers/olink_xlsx.py` at line 10, The XLSX reader's feature
schema currently omits the MissingFreq column causing inconsistent features
metadata vs the CSV reader; update the _FEATURE_COLS set in
src/pyprideap/readers/olink_xlsx.py to include "MissingFreq" (same symbol
_FEATURE_COLS) so the XLSX parser preserves that column, and ensure any
downstream usage in the reader logic that iterates over _FEATURE_COLS (e.g.,
feature extraction or validation code in the olink_xlsx reader) continues to
handle MissingFreq the same way as the CSV reader.
src/pyprideap/lod.py (3)

96-103: ⚠️ Potential issue | 🟠 Major

Require negative controls for NCLOD.

This still falls back to positive/QC/reference controls when no negatives exist, which changes the NCLOD threshold instead of failing fast.

Suggested fix
     is_negative = st.isin(_NEGATIVE_CONTROL_TYPES)
     if is_negative.any():
         return is_negative
-
-    # Fallback: use any control-like sample
-    is_control = st.isin(_CONTROL_SAMPLE_TYPES)
-    if is_control.any():
-        return is_control
 
     raise ValueError("No negative control samples found in SampleType column")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/pyprideap/lod.py` around lines 96 - 103, The current logic uses
_CONTROL_SAMPLE_TYPES fallback when no negatives are present which alters NCLOD;
change the behavior so NCLOD requires negative controls only: in the function
that computes is_negative (look for variables is_negative and
_NEGATIVE_CONTROL_TYPES in src/pyprideap/lod.py) remove or disable the fallback
branch that checks _CONTROL_SAMPLE_TYPES/is_control and instead raise a clear
error (or return a failing status) when is_negative.any() is false so the caller
fails fast if no negative controls are provided.

488-490: ⚠️ Potential issue | 🟡 Minor

Wrap panel_cols to unblock the linter.

CI is still failing on Line 489 with E501.

Suggested fix
             panel_cols = [
-                c for c in dataset.expression.columns if panel_map.get(c) == panel_name and has_lod_per_col.get(c, False)
+                c
+                for c in dataset.expression.columns
+                if panel_map.get(c) == panel_name
+                and has_lod_per_col.get(c, False)
             ]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/pyprideap/lod.py` around lines 488 - 490, Line is too long (E501): break
the comprehension for panel_cols into multiple lines to satisfy the linter. Edit
the expression that builds panel_cols (the list comprehension using
dataset.expression.columns, panel_map, panel_name, and has_lod_per_col) so that
the opening bracket, the iterator expression, and the conditional are split
across lines (or assign intermediate variables) to keep line length under limit
and preserve the same logic.

249-269: ⚠️ Potential issue | 🟠 Major

Bundled FixedLOD resolution will break outside a source checkout.

Path(__file__).resolve().parent.parent.parent / "configs" depends on the repo layout. After installation, get_bundled_fixed_lod_path() can no longer reliably find the bundled CSVs.

#!/bin/bash
fd -t d '^configs$'
printf '\n--- pyproject packaging hints ---\n'
rg -n "package-data|include|artifacts|only-include" pyproject.toml
printf '\n--- current bundled LOD path logic ---\n'
sed -n '249,317p' src/pyprideap/lod.py
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/pyprideap/lod.py` around lines 249 - 269, The current _CONFIGS_DIR
constant and get_bundled_fixed_lod_path() rely on a repo-relative path that
breaks after installation; instead package the CSVs as package data and resolve
them at runtime via package resources. Update packaging (pyproject.toml) to
include the configs CSVs, remove/stop relying on _CONFIGS_DIR, and change
get_bundled_fixed_lod_path(platform) to use importlib.resources (or
importlib_resources for py<3.9) to locate the file inside the installed
pyprideap package (e.g.
importlib.resources.files("pyprideap").joinpath("configs", filename)) and return
None if the resource is missing; keep the _BUNDLED_FIXED_LOD mapping and
filename lookup logic the same, only replace the filesystem resolution with
package-resource resolution.
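The package-resource approach the prompt describes can be sketched as follows. This is a minimal illustration, not the project's actual code: the `_BUNDLED_FIXED_LOD` filenames are placeholders, and the real module may organize its lookup differently.

```python
# Hedged sketch: resolve a bundled CSV via importlib.resources rather than a
# repo-relative path, so it survives installation (including zip installs).
from importlib import resources
from pathlib import Path
from typing import Optional

# Hypothetical platform -> filename mapping (placeholder names)
_BUNDLED_FIXED_LOD = {
    "olink": "olink_fixed_lod.csv",
    "somascan": "somascan_fixed_lod.csv",
}


def get_bundled_fixed_lod_path(platform: str) -> Optional[Path]:
    filename = _BUNDLED_FIXED_LOD.get(platform)
    if filename is None:
        return None
    # files() resolves inside the installed package, wherever pip put it
    resource = resources.files("pyprideap").joinpath("configs", filename)
    # is_file() works for both filesystem- and zip-backed packages
    return Path(str(resource)) if resource.is_file() else None
```

For this to work, the CSVs must also be declared as package data (e.g. under `[tool.hatch.build]` or `[tool.setuptools.package-data]` in `pyproject.toml`, depending on the build backend).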
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/pyprideap/lod.py`:
- Around line 125-127: In _nclod_base(), avoid using np.maximum on a pandas
Series (numeric_expr) because np.maximum returns Any; instead compute medians =
numeric_expr.median() and stds = numeric_expr.std() as before but replace the
np.maximum call with stds.multiply(3).clip(lower=_MIN_STD_FLOOR) (or equivalent
Series.clip usage) and return medians + that clipped Series so the function
retains a pd.Series return type and satisfies mypy; update references to
medians, stds, and _MIN_STD_FLOOR accordingly.
- Around line 198-204: The DataFrame construction and addition degrade type
precision because base.values and adj.values are typed as ndarray |
ExtensionArray; update the lod_matrix creation to use base.to_numpy() when
tiling and convert adj to a pandas.Series (or ensure adj is a Series) before
adding so mypy can infer a concrete DataFrame return type: use
np.tile(base.to_numpy(), (len(dataset.samples), 1)) for the data and ensure the
addition uses a Series (e.g., pd.Series(adj) or adj if already a Series) with
axis=0 when calling lod_matrix.add; keep the same column=index and
index=dataset.expression.index and return lod_matrix.

In `@src/pyprideap/qc/render.py`:
- Around line 110-123: Remove the dead variable cumulative in the loop inside
the render function (where cumulative = 0.0 is set and cumulative += pct is
applied); delete both the initialization and the increment since cumulative is
never read, leaving the loop that builds the go.Bar traces using pct,
data.categories, data.counts, and colors intact.
- Around line 31-42: The _QC_LOD_COLORS mapping assigns alarming red ("#e74c3c")
to "WARN & NPX ≤ LOD" while various "FAIL" keys use gray, which flips visual
severity; update the _QC_LOD_COLORS dict so FAIL-related keys (e.g., "FAIL",
"FAIL & NPX > LOD", "FAIL & NPX ≤ LOD") use a more alarming color (e.g., red)
and WARN-related keys (e.g., "WARN", "WARN & NPX > LOD", "WARN & NPX ≤ LOD") use
a less severe color (e.g., orange), or document the intended semantics if the
current mapping is deliberate; locate and change the color hex values in the
_QC_LOD_COLORS constant in render.py accordingly.

In `@src/pyprideap/readers/olink_csv.py`:
- Around line 47-56: The LOD pivot can drop assay columns with all-NA values
causing metadata["lod_matrix"] to not match expression.columns; after creating
lod_matrix with df.pivot_table (and before resetting index), reindex its columns
to the same assay order used for expression/features (the same reference used at
line where features are aligned) and fill missing columns with NaN so
lod_matrix.columns == expression.columns; apply the identical fix in
olink_xlsx.py where lod_matrix is built, ensuring you reference the same
sample_order and assay ordering variable (expression.columns or features) when
reindexing.
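The reindex the prompt asks for can be shown on toy data. The variable names below are illustrative stand-ins for the reader's `expression` and `lod_matrix`, not the actual objects in `olink_csv.py`.

```python
# Minimal sketch: after a pivot drops an all-NA assay column, reindex the
# LOD matrix to the expression matrix's columns so the two stay aligned.
import numpy as np
import pandas as pd

expression = pd.DataFrame(
    [[1.0, 2.0, 3.0]], index=["s1"], columns=["a1", "a2", "a3"]
)
# Pretend the pivot_table dropped 'a3' because all its LOD values were NA
lod_matrix = pd.DataFrame([[0.1, 0.2]], index=["s1"], columns=["a1", "a2"])

# Reindex to the expression column order; missing assays become NaN
lod_matrix = lod_matrix.reindex(columns=expression.columns)

assert list(lod_matrix.columns) == list(expression.columns)
assert np.isnan(lod_matrix.loc["s1", "a3"])
```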

---

Duplicate comments:
In `@src/pyprideap/lod.py`:
- Around line 96-103: The current logic uses _CONTROL_SAMPLE_TYPES fallback when
no negatives are present which alters NCLOD; change the behavior so NCLOD
requires negative controls only: in the function that computes is_negative (look
for variables is_negative and _NEGATIVE_CONTROL_TYPES in src/pyprideap/lod.py)
remove or disable the fallback branch that checks
_CONTROL_SAMPLE_TYPES/is_control and instead raise a clear error (or return a
failing status) when is_negative.any() is false so the caller fails fast if no
negative controls are provided.
- Around line 488-490: Line is too long (E501): break the comprehension for
panel_cols into multiple lines to satisfy the linter. Edit the expression that
builds panel_cols (the list comprehension using dataset.expression.columns,
panel_map, panel_name, and has_lod_per_col) so that the opening bracket, the
iterator expression, and the conditional are split across lines (or assign
intermediate variables) to keep line length under limit and preserve the same
logic.
- Around line 249-269: The current _CONFIGS_DIR constant and
get_bundled_fixed_lod_path() rely on a repo-relative path that breaks after
installation; instead package the CSVs as package data and resolve them at
runtime via package resources. Update packaging (pyproject.toml) to include the
configs CSVs, remove/stop relying on _CONFIGS_DIR, and change
get_bundled_fixed_lod_path(platform) to use importlib.resources (or
importlib_resources for py<3.9) to locate the file inside the installed
pyprideap package (e.g.
importlib.resources.files("pyprideap").joinpath("configs", filename)) and return
None if the resource is missing; keep the _BUNDLED_FIXED_LOD mapping and
filename lookup logic the same, only replace the filesystem resolution with
package-resource resolution.

In `@src/pyprideap/readers/olink_xlsx.py`:
- Line 10: The XLSX reader's feature schema currently omits the MissingFreq
column causing inconsistent features metadata vs the CSV reader; update the
_FEATURE_COLS set in src/pyprideap/readers/olink_xlsx.py to include
"MissingFreq" (same symbol _FEATURE_COLS) so the XLSX parser preserves that
column, and ensure any downstream usage in the reader logic that iterates over
_FEATURE_COLS (e.g., feature extraction or validation code in the olink_xlsx
reader) continues to handle MissingFreq the same way as the CSV reader.
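The fail-fast behavior requested in the NCLOD duplicate comment above can be sketched like this. The names `_NEGATIVE_CONTROL_TYPES` and the `SampleType` column mirror the review text and are assumptions about the actual `lod.py` module.

```python
# Sketch: NCLOD requires negative controls; fail fast instead of silently
# falling back to generic control samples (which would alter the NCLOD).
import pandas as pd

_NEGATIVE_CONTROL_TYPES = {"NEGATIVE_CONTROL"}  # assumed label set


def negative_control_mask(samples: pd.DataFrame) -> pd.Series:
    is_negative = samples["SampleType"].isin(_NEGATIVE_CONTROL_TYPES)
    if not is_negative.any():
        # No fallback branch: without negatives, NCLOD is undefined
        raise ValueError("NCLOD requires negative control samples; none found")
    return is_negative
```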

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: eb03d159-5d1d-4635-bd95-7a0843b5735b

📥 Commits

Reviewing files that changed from the base of the PR and between 5cb8d56 and 7198f8d.

📒 Files selected for processing (7)
  • src/pyprideap/lod.py
  • src/pyprideap/qc/compute.py
  • src/pyprideap/qc/render.py
  • src/pyprideap/qc/report.py
  • src/pyprideap/readers/olink_csv.py
  • src/pyprideap/readers/olink_parquet.py
  • src/pyprideap/readers/olink_xlsx.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • src/pyprideap/readers/olink_parquet.py
  • src/pyprideap/qc/report.py
  • src/pyprideap/qc/compute.py

Comment on lines +125 to +127
medians = numeric_expr.median()
stds = numeric_expr.std()
return medians + np.maximum(_MIN_STD_FLOOR, 3 * stds)

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

cat -n src/pyprideap/lod.py | head -150

Repository: PRIDE-Archive/pyprideap

Length of output: 6256


🏁 Script executed:

# Check for mypy configuration and type checking setup
fd -t f "pyproject.toml|setup.cfg|mypy.ini|py.typed" .

Repository: PRIDE-Archive/pyprideap

Length of output: 83


🏁 Script executed:

cat pyproject.toml

Repository: PRIDE-Archive/pyprideap

Length of output: 1669


🏁 Script executed:

# Check for GitHub Actions or CI configuration
fd -t f "\.github.*|\.travis|\.gitlab|circle|jenkins" .

Repository: PRIDE-Archive/pyprideap

Length of output: 49


🏁 Script executed:

ls -la .github/ 2>/dev/null || echo "No .github directory"

Repository: PRIDE-Archive/pyprideap

Length of output: 233


🏁 Script executed:

ls -la .github/workflows/

Repository: PRIDE-Archive/pyprideap

Length of output: 233


🏁 Script executed:

cat .github/workflows/ci.yml

Repository: PRIDE-Archive/pyprideap

Length of output: 1143


🏁 Script executed:

# Let's check if there are any type checking hints or test cases for this
rg "np.maximum|\.clip" src/

Repository: PRIDE-Archive/pyprideap

Length of output: 145


🏁 Script executed:

# Let's create a simple test to understand type behavior
python3 << 'EOF'
import numpy as np
import pandas as pd
from typing import get_type_hints

# Simulate the scenario
numeric_expr = pd.DataFrame({
    'col1': [1.0, 2.0, 3.0],
    'col2': [4.0, 5.0, 6.0]
})

medians = numeric_expr.median()
stds = numeric_expr.std()

print("medians type:", type(medians))
print("stds type:", type(stds))
print("3 * stds type:", type(3 * stds))

# Test np.maximum
result1 = np.maximum(0.2, 3 * stds)
print("np.maximum result type:", type(result1))
print("np.maximum result:", result1)

# Test clip
result2 = (3 * stds).clip(lower=0.2)
print("clip result type:", type(result2))
print("clip result:", result2)

# Test final return
final1 = medians + np.maximum(0.2, 3 * stds)
print("medians + np.maximum type:", type(final1))

final2 = medians + (3 * stds).clip(lower=0.2)
print("medians + clip type:", type(final2))

# Check if they're equal
print("Are results equal?", np.allclose(final1, final2))
EOF

Repository: PRIDE-Archive/pyprideap

Length of output: 184


🏁 Script executed:

# Let me check the pandas type stubs for Series operations to confirm
# Look for any type annotations in the codebase
rg "@.*return|->.*Series" src/pyprideap/lod.py -A 2

Repository: PRIDE-Archive/pyprideap

Length of output: 1022


🏁 Script executed:

# Let's check if there are any tests or type annotations related to this function
rg "_nclod_base|nclod" tests/ --type py

Repository: PRIDE-Archive/pyprideap

Length of output: 178


🏁 Script executed:

cat tests/test_lod.py

Repository: PRIDE-Archive/pyprideap

Length of output: 12453


Use Series.clip() to preserve type safety in _nclod_base()

The current code returns medians + np.maximum(_MIN_STD_FLOOR, 3 * stds), which widens the return type to Any because np.maximum() doesn't have proper type stubs for pandas Series. This triggers a mypy failure with warn_return_any = true. The Series.clip() method preserves the declared pd.Series return type and is mathematically equivalent:

Suggested fix
-    return medians + np.maximum(_MIN_STD_FLOOR, 3 * stds)
+    return medians + (3 * stds).clip(lower=_MIN_STD_FLOOR)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
 medians = numeric_expr.median()
 stds = numeric_expr.std()
-return medians + np.maximum(_MIN_STD_FLOOR, 3 * stds)
+return medians + (3 * stds).clip(lower=_MIN_STD_FLOOR)
🧰 Tools
🪛 GitHub Actions: CI

[error] 127-127: mypy: Returning Any from function declared to return "Series[Any]". (no-any-return)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/pyprideap/lod.py` around lines 125 - 127, In _nclod_base(), avoid using
np.maximum on a pandas Series (numeric_expr) because np.maximum returns Any;
instead compute medians = numeric_expr.median() and stds = numeric_expr.std() as
before but replace the np.maximum call with
stds.multiply(3).clip(lower=_MIN_STD_FLOOR) (or equivalent Series.clip usage)
and return medians + that clipped Series so the function retains a pd.Series
return type and satisfies mypy; update references to medians, stds, and
_MIN_STD_FLOOR accordingly.

Comment on lines +198 to +204
lod_matrix = pd.DataFrame(
np.tile(base.values, (len(dataset.samples), 1)),
columns=base.index,
index=dataset.expression.index,
)
lod_matrix = lod_matrix.add(adj.values, axis=0)
return lod_matrix

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's explore the repository structure
git ls-files | head -20

Repository: PRIDE-Archive/pyprideap

Length of output: 581


🏁 Script executed:

# Find the file in question
fd "lod.py" --type f

Repository: PRIDE-Archive/pyprideap

Length of output: 105


🏁 Script executed:

# Read the relevant section of the file
cat -n src/pyprideap/lod.py | sed -n '190,210p'

Repository: PRIDE-Archive/pyprideap

Length of output: 856


🏁 Script executed:

# Get the full function context to understand the types involved
cat -n src/pyprideap/lod.py | sed -n '180,210p'

Repository: PRIDE-Archive/pyprideap

Length of output: 1273


🏁 Script executed:

# Check the function signature to understand parameter types
cat -n src/pyprideap/lod.py | grep -B 20 "lod_matrix = pd.DataFrame" | head -25

Repository: PRIDE-Archive/pyprideap

Length of output: 924


🏁 Script executed:

# Get the full function definition that contains line 198-204
cat -n src/pyprideap/lod.py | sed -n '170,180p'

Repository: PRIDE-Archive/pyprideap

Length of output: 580


🏁 Script executed:

# Search for the function signature that contains this code
rg -B 30 "np.tile\(base.values" src/pyprideap/lod.py | head -40

Repository: PRIDE-Archive/pyprideap

Length of output: 1092


🏁 Script executed:

# Find _nclod_base function
rg -A 10 "def _nclod_base" src/pyprideap/lod.py

Repository: PRIDE-Archive/pyprideap

Length of output: 497


🏁 Script executed:

# Find _intensity_adjustment function
rg -A 10 "def _intensity_adjustment" src/pyprideap/lod.py

Repository: PRIDE-Archive/pyprideap

Length of output: 554


🏁 Script executed:

# Check the imports and pandas type hints
rg "import pandas|from pandas" src/pyprideap/lod.py

Repository: PRIDE-Archive/pyprideap

Length of output: 86


🏁 Script executed:

# Check for mypy configuration
fd -e toml -e ini | xargs grep -l "mypy" 2>/dev/null

Repository: PRIDE-Archive/pyprideap

Length of output: 81


🏁 Script executed:

# Check pyproject.toml for type checking config
cat pyproject.toml | grep -A 10 mypy

Repository: PRIDE-Archive/pyprideap

Length of output: 344


🏁 Script executed:

# Get the complete definition of _nclod_base to see the exact return type
cat -n src/pyprideap/lod.py | sed -n '110,150p'

Repository: PRIDE-Archive/pyprideap

Length of output: 1825


🏁 Script executed:

# Check what numpy.tile expects for its first argument and what it returns
python3 << 'EOF'
import inspect
import numpy as np

# Check np.tile signature
sig = inspect.signature(np.tile)
print("np.tile signature:", sig)

# Check return annotation
print("\nnp.tile return annotation:", sig.return_annotation)
EOF

Repository: PRIDE-Archive/pyprideap

Length of output: 184


🏁 Script executed:

# Look for any mypy test or type checking in the CI/tests
cat .github/workflows/ci.yml

Repository: PRIDE-Archive/pyprideap

Length of output: 1143


🏁 Script executed:

# Check if there are any type stubs or annotations for Series.values
rg "Series.values" src/pyprideap/lod.py

Repository: PRIDE-Archive/pyprideap

Length of output: 49


🏁 Script executed:

# Let's look at the full context around the function to understand the types better
cat -n src/pyprideap/lod.py | sed -n '1,30p'

Repository: PRIDE-Archive/pyprideap

Length of output: 1585


🏁 Script executed:

# Check the pandas imports and version specifications
rg "pandas" pyproject.toml

Repository: PRIDE-Archive/pyprideap

Length of output: 110


🏁 Script executed:

# Let's check if there are any tests for this function
cat -n tests/test_lod.py | head -80

Repository: PRIDE-Archive/pyprideap

Length of output: 2956


🏁 Script executed:

# Look at the actual return type by checking what pd.Series.values returns according to pandas stubs
# and also check numpy.tile's behavior with Series.values
python3 << 'EOF'
# Since we can't import numpy/pandas directly in sandbox, let's check the types from pandas documentation
# pandas.Series.values returns ndarray | ExtensionArray in strict typing
# numpy.tile(arr, reps) expects array_like, which accepts both ndarray and ExtensionArray
# But the return type of np.tile is always ndarray

# Let's check if there's a type issue with pd.DataFrame.add when receiving values from another Series

print("Analysis of type chain:")
print("1. base: pd.Series (return from _nclod_base)")
print("2. base.values: ndarray | ExtensionArray (pandas type stubs definition)")
print("3. np.tile(base.values, ...): always returns ndarray")
print("4. pd.DataFrame(..., data=ndarray): creates DataFrame[Any] or DataFrame[Any] based on dtype")
print("5. lod_matrix.add(adj.values, axis=0): when adj.values is ndarray | ExtensionArray, result could be Any")
print("")
print("The issue is that when intermediate types are union types, mypy may lose track of the final type.")
EOF

Repository: PRIDE-Archive/pyprideap

Length of output: 555


🏁 Script executed:

# Check for type annotations or comments in the file
cat -n src/pyprideap/lod.py | sed -n '1,50p'

Repository: PRIDE-Archive/pyprideap

Length of output: 2150


🏁 Script executed:

# Look for the exact types that pandas-stubs uses for Series
rg "def values" src/pyprideap/ || echo "No explicit values definition in source"

Repository: PRIDE-Archive/pyprideap

Length of output: 106


🏁 Script executed:

# Let's examine the entire _nclod_base function to verify its return type
cat -n src/pyprideap/lod.py | sed -n '108,128p'

Repository: PRIDE-Archive/pyprideap

Length of output: 1013


🏁 Script executed:

# Let's verify if there are actual mypy errors by checking if mypy is run in CI
# and looking at any mypy output or error logs
grep -r "mypy" .github/workflows/

Repository: PRIDE-Archive/pyprideap

Length of output: 149


🏁 Script executed:

# Let's also check the full _intensity_adjustment function to see its complete return type
cat -n src/pyprideap/lod.py | sed -n '130,170p'

Repository: PRIDE-Archive/pyprideap

Length of output: 1946


🏁 Script executed:

# Check if there's a setup.py or any build logs that might show mypy errors
fd "setup.py|setup.cfg|*.log" --type f

Repository: PRIDE-Archive/pyprideap

Length of output: 389


🏁 Script executed:

# Let's verify what the expression index length should be vs samples length
cat -n tests/test_lod.py | sed -n '70,150p'

Repository: PRIDE-Archive/pyprideap

Length of output: 3466


🏁 Script executed:

# Check the AffinityDataset definition to understand expression vs samples
rg "class AffinityDataset" src/pyprideap/ -A 15

Repository: PRIDE-Archive/pyprideap

Length of output: 388


🏁 Script executed:

# Let's check the actual comparison between samples and expression indices
cat -n tests/test_lod.py | sed -n '23,80p'

Repository: PRIDE-Archive/pyprideap

Length of output: 2359


🏁 Script executed:

# Let's verify that dataset.samples and dataset.expression have the same length
python3 << 'EOF'
# From test setup, we can see:
# - samples DataFrame is created with n_samples rows
# - expression DataFrame is created with n_samples rows
# - They should have the same number of rows (samples)

print("Dataset structure verification:")
print("- samples: rows = sample count, columns = sample metadata")
print("- expression: rows = samples (same as samples), columns = assays")
print("- base (from _nclod_base): Series indexed by assay names, one value per assay")
print("- adj (from _intensity_adjustment): Series indexed by sample index, one value per sample")
print("")
print("Code at line 199:")
print("  np.tile(base.values, (len(dataset.samples), 1))")
print("  - Tiles base (assays) across all samples")
print("  - Result shape: (num_samples, num_assays)")
print("")
print("Code at line 200-201:")
print("  columns=base.index  -> one column per assay")
print("  index=dataset.expression.index  -> one row per sample")
print("")
print("The suggested fix changes to:")
print("  np.tile(base_values, (len(dataset.expression.index), 1))")
print("  - Same result since len(dataset.expression.index) == len(dataset.samples)")
print("  - But more semantically explicit")
EOF

Repository: PRIDE-Archive/pyprideap

Length of output: 870


🏁 Script executed:

# Let's check pandas-stubs to understand the actual type issue better
# Search for any type: ignore comments that might exist
rg "type: ignore" src/pyprideap/lod.py

Repository: PRIDE-Archive/pyprideap

Length of output: 49


🏁 Script executed:

# Let's also look at what the actual code does with Series.add
cat -n src/pyprideap/lod.py | sed -n '195,210p'

Repository: PRIDE-Archive/pyprideap

Length of output: 712


🏁 Script executed:

# Check if there are any existing type hints or casts in the file
rg "cast\|TYPE_CHECKING" src/pyprideap/lod.py

Repository: PRIDE-Archive/pyprideap

Length of output: 49


🏁 Script executed:

# Let's look at whether the return type truly degrades to Any by examining the full function signature
cat -n src/pyprideap/lod.py | sed -n '170,175p'

Repository: PRIDE-Archive/pyprideap

Length of output: 295


🏁 Script executed:

# Check if there's a way to see actual mypy output or check for pandas-stubs behavior
# Let's verify the pandas-stubs typing for DataFrame.add with array-like arguments
python3 << 'EOF'
# According to pandas-stubs and pandas typing, when you call:
# DataFrame.add(other: ArrayLike, axis=0)
# where ArrayLike = ndarray | ExtensionArray | Index | Series
# The return type depends on the union narrowing

# The chain:
# 1. adj: pd.Series | None (checked against None, so pd.Series in this context)
# 2. adj.values: ndarray | ExtensionArray (per pandas-stubs)
# 3. lod_matrix.add(adj.values, axis=0): DataFrame

# When the argument is a union type (ndarray | ExtensionArray),
# mypy may not be able to narrow the return type and could conservatively
# assume Any or a less precise type.

# The suggested fix:
# - base_values: ndarray (explicit conversion)
# - adjustment: pd.Series (explicit wrapping)
# Both have clear, non-union types that mypy can track through the chain

print("Type narrowing analysis:")
print("Original code has union types (ndarray | ExtensionArray)")
print("at multiple points in the chain, causing potential type loss.")
print("")
print("Suggested fix converts these to concrete types:")
print("- to_numpy() -> ndarray")
print("- Explicit pd.Series wrapping -> Series")
print("")
print("This should preserve type information through the chain.")
EOF

Repository: PRIDE-Archive/pyprideap

Length of output: 381


🏁 Script executed:

# Let's verify the behavior difference between the two approaches
# by checking if there are any tests that might fail with the suggested changes
grep -n "compute_nclod\|plate_adjusted" tests/test_lod.py | head -20

Repository: PRIDE-Archive/pyprideap

Length of output: 49


🏁 Script executed:

# Check for plate-adjusted LOD tests
rg -A 20 "plate_adjusted.*True" tests/test_lod.py

Repository: PRIDE-Archive/pyprideap

Length of output: 49


🏁 Script executed:

# Let's examine the entire flow more carefully and check if there are any issues with the suggested fix
cat -n src/pyprideap/lod.py | sed -n '107,130p'

Repository: PRIDE-Archive/pyprideap

Length of output: 1063


🏁 Script executed:

# Check what happens when Series are passed to np.tile and DataFrame constructor
python3 << 'EOF'
# Trace through the suggested fix vs original:

print("=== ORIGINAL CODE ===")
print("base: pd.Series -> base.values: ndarray | ExtensionArray")
print("np.tile(base.values, ...) -> ndarray")
print("pd.DataFrame(ndarray, ...) -> DataFrame")
print("adj: pd.Series | None -> adj.values: ndarray | ExtensionArray")
print("DataFrame.add(ndarray | ExtensionArray, axis=0) -> potentially loses type info with union")
print("return lod_matrix -> return type: DataFrame[Any] or imprecise type")
print("")
print("=== SUGGESTED FIX ===")
print("base_values = base.to_numpy(dtype=float, na_value=np.nan) -> ndarray[float]")
print("np.tile(base_values, ...) -> ndarray[float]")
print("pd.DataFrame(ndarray[float], ...) -> DataFrame[float]")
print("adjustment = pd.Series(adj.to_numpy(dtype=float), index=...) -> Series[float]")
print("DataFrame.add(Series[float], axis=0) -> DataFrame[float]")
print("return lod_matrix -> return type: DataFrame[float]")
print("")
print("Type preservation: ✓ Fixes union type issue")
print("Semantic correctness: ✓ Converts to float, handles NA values")
print("Index alignment: ✓ Explicitly sets index to dataset.expression.index")
EOF

Repository: PRIDE-Archive/pyprideap

Length of output: 999


🏁 Script executed:

# Let's verify there are no tests that would break with these changes
grep -B 5 -A 10 "plate_adjusted" tests/test_lod.py | grep -A 10 "def test"

Repository: PRIDE-Archive/pyprideap

Length of output: 49


🏁 Script executed:

# Check if there are tests specifically for plate-adjusted NCLOD
fd "test.*lod" --type f -x cat {} \; | grep -B 5 -A 10 "compute_nclod.*plate"

Repository: PRIDE-Archive/pyprideap

Length of output: 49


🏁 Script executed:

# Let's look for any tests that verify the return type or structure
rg "def test.*nclod" tests/test_lod.py -A 20 | head -80

Repository: PRIDE-Archive/pyprideap

Length of output: 897


🏁 Script executed:

# Let's verify the exact behavior of to_numpy and ensure the fix is sound
python3 << 'EOF'
# Check pandas Series.to_numpy signature and behavior
print("Verification of suggested fix:")
print("")
print("1. base.to_numpy(dtype=float, na_value=np.nan)")
print("   - Returns ndarray with float dtype")
print("   - Handles NA values properly")
print("   - base is pd.Series from _nclod_base, numeric data")
print("")
print("2. pd.Series(adj.to_numpy(dtype=float), index=dataset.expression.index)")
print("   - Creates Series from ndarray with explicit index")
print("   - Ensures index alignment")
print("   - adj is pd.Series from _intensity_adjustment, indexed by samples")
print("")
print("3. np.tile(base_values, (len(dataset.expression.index), 1))")
print("   - base_values is ndarray[float] (no union type)")
print("   - Clear type throughout")
print("")
print("4. lod_matrix.add(adjustment, axis=0)")
print("   - Both operands have clear types")
print("   - DataFrame.add(Series, axis=0) preserves DataFrame type")
print("")
print("Result: Type chain is fully typed, no Any inference needed")
EOF

Repository: PRIDE-Archive/pyprideap

Length of output: 827


🏁 Script executed:

# Let's look for any compute_nclod tests that test plate_adjusted behavior
cat -n tests/test_lod.py | grep -B 5 -A 15 "def test" | grep -B 5 -A 15 "compute_nclod\|compute_lod_from_controls\|plate"

Repository: PRIDE-Archive/pyprideap

Length of output: 2700


🏁 Script executed:

# Look for integration tests that actually call compute_nclod with plate_adjusted
rg "compute_nclod\|compute_lod_from_controls" tests/test_lod.py -B 2 -A 5

Repository: PRIDE-Archive/pyprideap

Length of output: 49


🏁 Script executed:

# Check the complete test file to see if plate_adjusted scenarios are tested
wc -l tests/test_lod.py && cat -n tests/test_lod.py | tail -100

Repository: PRIDE-Archive/pyprideap

Length of output: 4676


🏁 Script executed:

# Let's search for any tests that specifically test the plate_adjusted=True path
rg "compute_nclod" tests/test_lod.py -B 2 -A 8

Repository: PRIDE-Archive/pyprideap

Length of output: 49


🏁 Script executed:

# Check if there's a test that sets up PlateID to test the plate_adjusted path
rg "PlateID\|plate_adjusted" tests/test_lod.py -B 3 -A 3

Repository: PRIDE-Archive/pyprideap

Length of output: 49


🏁 Script executed:

# Let's verify if the suggested fix maintains the correct behavior by checking the indices
python3 << 'EOF'
# Verify the index alignment in the suggested fix

print("Index alignment verification:")
print("")
print("Original code:")
print("  index=dataset.expression.index")
print("")
print("Suggested fix:")
print("  adjustment = pd.Series(adj.to_numpy(dtype=float), index=dataset.expression.index)")
print("")
print("Key observation:")
print("- _intensity_adjustment returns Series indexed by dataset.samples.index")
print("- In the fix, adjustment is re-indexed to dataset.expression.index")
print("- This assumes dataset.samples.index == dataset.expression.index")
print("- From AffinityDataset definition, both have same row count")
print("")
print("Potential issue:")
print("- If indices don't match in order, re-indexing could cause NaN values")
print("- The original code uses adj.values which doesn't check index alignment")
print("- The suggested fix explicitly re-indexes, which is safer but might change behavior")
EOF

Repository: PRIDE-Archive/pyprideap

Length of output: 771


🏁 Script executed:

# Let's check if dataset.samples.index and dataset.expression.index are always aligned
grep -B 5 -A 10 "class AffinityDataset" src/pyprideap/core.py

Repository: PRIDE-Archive/pyprideap

Length of output: 331


🏁 Script executed:

# Let's look at how the datasets are created in tests to see if indices are aligned
cat -n tests/test_lod.py | sed -n '39,80p'

Repository: PRIDE-Archive/pyprideap

Length of output: 1789


🏁 Script executed:

# Let's verify one more important thing - whether the indices actually match or if there's a behavioral change risk
python3 << 'EOF'
# Check the implications of the behavioral change

print("Behavioral analysis of the suggested fix:")
print("")
print("Original code:")
print("  lod_matrix.add(adj.values, axis=0)")
print("  - adj.values is ndarray, treated as positional broadcast")
print("  - Assumes values are in the same order as lod_matrix rows")
print("")
print("Suggested fix:")
print("  adjustment = pd.Series(adj.to_numpy(dtype=float), index=dataset.expression.index)")
print("  lod_matrix.add(adjustment, axis=0)")
print("  - adjustment is Series with explicit index")
print("  - Pandas will align by index when adding Series")
print("")
print("Risk assessment:")
print("- If indices match by value and order: Behavior is identical")
print("- If indices don't match: Suggested fix could cause NaN due to index misalignment")
print("")
print("From code inspection:")
print("- Both samples and expression use default RangeIndex in tests")
print("- _intensity_adjustment returns Series(index=dataset.samples.index)")
print("- Default index is RangeIndex(0, n_samples)")
print("- Both should be identical in practice")
print("")
print("Conclusion: Suggested fix is safer (explicit index) and equivalent in typical cases")
EOF

Repository: PRIDE-Archive/pyprideap

Length of output: 994


🏁 Script executed:

# Check if there are any mypy configuration details that would affect type narrowing
cat pyproject.toml | grep -A 5 "mypy"

Repository: PRIDE-Archive/pyprideap

Length of output: 287


🏁 Script executed:

# Look at the complete function to understand all type implications
cat -n src/pyprideap/lod.py | sed -n '170,205p'

Repository: PRIDE-Archive/pyprideap

Length of output: 1492


Improve type narrowing for DataFrame return type.

base.values and adj.values are typed as ndarray | ExtensionArray by pandas-stubs, which prevents mypy from tracking a precise return type through the DataFrame construction and addition operations. Converting to explicit concrete types (ndarray via to_numpy() and Series via explicit construction) ensures the return type remains DataFrame rather than degrading to Any under warn_return_any mode.

Suggested fix
+    base_values = base.to_numpy(dtype=float, na_value=np.nan)
+    adjustment = pd.Series(adj.to_numpy(dtype=float), index=dataset.expression.index)
     lod_matrix = pd.DataFrame(
-        np.tile(base.values, (len(dataset.samples), 1)),
+        np.tile(base_values, (len(dataset.expression.index), 1)),
         columns=base.index,
         index=dataset.expression.index,
     )
-    lod_matrix = lod_matrix.add(adj.values, axis=0)
-    return lod_matrix
+    return lod_matrix.add(adjustment, axis=0)
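The tile-then-add pattern the fix preserves can be reproduced on toy data. The shapes and labels below are assumed for illustration: one base LOD per assay, tiled across samples, plus a per-sample adjustment added row-wise.

```python
# Toy reproduction of the LOD matrix build with concrete types throughout
import numpy as np
import pandas as pd

base = pd.Series([1.0, 2.0], index=["assayA", "assayB"])    # per-assay base LOD
adj = pd.Series([0.1, 0.2, 0.3], index=["s1", "s2", "s3"])  # per-sample offset

lod_matrix = pd.DataFrame(
    np.tile(base.to_numpy(), (len(adj), 1)),  # to_numpy() yields a plain ndarray
    columns=base.index,
    index=adj.index,
)
# Passing a Series with axis=0 aligns by row index, not position
lod_matrix = lod_matrix.add(adj, axis=0)
```

Passing the adjustment as a `Series` (rather than `adj.values`) also means pandas aligns by index instead of relying on positional order, which is the safer behavior when sample order and expression row order could ever diverge.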
🧰 Tools
🪛 GitHub Actions: CI

[error] 199-199: mypy: Argument 1 to "tile" has incompatible type "ndarray[tuple[int], dtype[Any]] | ExtensionArray"; expected "_SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]]" (arg-type)


[error] 204-204: mypy: Returning Any from function declared to return "DataFrame | Series[Any]". (no-any-return)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/pyprideap/lod.py` around lines 198 - 204, The DataFrame construction and
addition degrade type precision because base.values and adj.values are typed as
ndarray | ExtensionArray; update the lod_matrix creation to use base.to_numpy()
when tiling and convert adj to a pandas.Series (or ensure adj is a Series)
before adding so mypy can infer a concrete DataFrame return type: use
np.tile(base.to_numpy(), (len(dataset.samples), 1)) for the data and ensure the
addition uses a Series (e.g., pd.Series(adj) or adj if already a Series) with
axis=0 when calling lod_matrix.add; keep the same column=index and
index=dataset.expression.index and return lod_matrix.
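The type-narrowing point can be reproduced outside the library: `Series.to_numpy()` is annotated as returning a concrete `ndarray`, so `np.tile` and the subsequent `DataFrame.add` keep a precise `DataFrame` type. The sketch below uses stand-in data (`OID1`/`S1` names are hypothetical, not the reader's actual identifiers) rather than the real `dataset.expression`:

```python
import numpy as np
import pandas as pd

# Stand-in inputs: per-assay LOD baseline and per-sample adjustment.
base = pd.Series([0.1, 0.2, 0.3], index=["OID1", "OID2", "OID3"])
adj = pd.Series([0.0, 0.5], index=["S1", "S2"])
expression_index = pd.Index(["S1", "S2"])

# to_numpy() yields a concrete ndarray, so mypy no longer sees
# "ndarray | ExtensionArray" flowing into np.tile.
base_values = base.to_numpy(dtype=float)
adjustment = pd.Series(adj.to_numpy(dtype=float), index=expression_index)

# Tile the per-assay baseline across samples, then add the per-sample
# adjustment row-wise; the result stays a DataFrame.
lod_matrix = pd.DataFrame(
    np.tile(base_values, (len(expression_index), 1)),
    columns=base.index,
    index=expression_index,
)
result = lod_matrix.add(adjustment, axis=0)
print(result)
```

Because `adjustment` is an explicit `Series` aligned on the sample index, `add(..., axis=0)` is unambiguous and mypy infers `DataFrame` throughout.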

Comment on lines +31 to +42
_QC_LOD_COLORS = {
"PASS & NPX > LOD": "#2ecc71",
"PASS & NPX ≤ LOD": "#3498db",
"WARN & NPX > LOD": "#f39c12",
"WARN & NPX ≤ LOD": "#e74c3c",
"FAIL & NPX > LOD": "#95a5a6",
"FAIL & NPX ≤ LOD": "#7f8c8d",
"PASS": "#2ecc71",
"WARN": "#f39c12",
"FAIL": "#e74c3c",
"NA": "#95a5a6",
}


⚠️ Potential issue | 🟡 Minor

Verify color semantics for QC categories.

The color assignment may cause confusion: "WARN & NPX ≤ LOD" uses red (#e74c3c), the same color as the fallback "FAIL", while "FAIL & NPX > LOD" and "FAIL & NPX ≤ LOD" use gray tones. Visually, this makes WARN conditions look more severe than FAIL conditions.

If intentional (e.g., "below LOD" is the primary concern), consider documenting the rationale. Otherwise, consider adjusting colors so FAIL-related categories use more alarming colors than WARN-related ones.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/pyprideap/qc/render.py` around lines 31 - 42, The _QC_LOD_COLORS mapping
assigns alarming red ("#e74c3c") to "WARN & NPX ≤ LOD" while various "FAIL" keys
use gray, which flips visual severity; update the _QC_LOD_COLORS dict so
FAIL-related keys (e.g., "FAIL", "FAIL & NPX > LOD", "FAIL & NPX ≤ LOD") use a
more alarming color (e.g., red) and WARN-related keys (e.g., "WARN", "WARN & NPX
> LOD", "WARN & NPX ≤ LOD") use a less severe color (e.g., orange), or document
the intended semantics if the current mapping is deliberate; locate and change
the color hex values in the _QC_LOD_COLORS constant in render.py accordingly.

Comment on lines +110 to +123
cumulative = 0.0
for cat, cnt, color in zip(data.categories, data.counts, colors):
pct = cnt / total * 100 if total > 0 else 0
fig.add_trace(
go.Bar(
x=["Samples"],
y=[pct],
name=f"{cat} {cnt} ({pct:.1f}%)",
marker_color=color,
text=f"{pct:.1f}%",
textposition="inside",
)
)
cumulative += pct

⚠️ Potential issue | 🟡 Minor

Remove unused cumulative variable.

The variable cumulative is incremented on line 123 but never read. This appears to be dead code, possibly a leftover from development.

🧹 Proposed fix
     fig = go.Figure()
-    cumulative = 0.0
     for cat, cnt, color in zip(data.categories, data.counts, colors):
         pct = cnt / total * 100 if total > 0 else 0
         fig.add_trace(
             go.Bar(
                 x=["Samples"],
                 y=[pct],
                 name=f"{cat} {cnt} ({pct:.1f}%)",
                 marker_color=color,
                 text=f"{pct:.1f}%",
                 textposition="inside",
             )
         )
-        cumulative += pct
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/pyprideap/qc/render.py` around lines 110 - 123, Remove the dead variable
cumulative in the loop inside the render function (where cumulative = 0.0 is set
and cumulative += pct is applied); delete both the initialization and the
increment since cumulative is never read, leaving the loop that builds the
go.Bar traces using pct, data.categories, data.counts, and colors intact.

Comment on lines +47 to +56
# Build per-sample × per-assay LOD matrix if LOD column exists
if "LOD" in df.columns:
lod_matrix = df.pivot_table(
index="SampleID",
columns="OlinkID",
values="LOD",
aggfunc="first",
)
lod_matrix = lod_matrix.reindex(sample_order).reset_index(drop=True)
metadata["lod_matrix"] = lod_matrix

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's examine the olink_csv.py file in full context around the flagged lines
cat -n src/pyprideap/readers/olink_csv.py | head -100

Repository: PRIDE-Archive/pyprideap

Length of output: 2789


🏁 Script executed:

# Check the file size to understand its structure
wc -l src/pyprideap/readers/olink_csv.py

Repository: PRIDE-Archive/pyprideap

Length of output: 104


🏁 Script executed:

# Look at olink_xlsx.py to verify the same pattern exists
cat -n src/pyprideap/readers/olink_xlsx.py | head -100

Repository: PRIDE-Archive/pyprideap

Length of output: 2484


🏁 Script executed:

# Check lod.py to understand how lod_matrix is consumed
cat -n src/pyprideap/lod.py

Repository: PRIDE-Archive/pyprideap

Length of output: 23379


🏁 Script executed:

python3 << 'EOF'
import pandas as pd
import numpy as np

# Simulate data: one assay (OlinkID_A) has NPX values but all-null LOD
df = pd.DataFrame({
    'SampleID': ['S1', 'S1', 'S2', 'S2'],
    'OlinkID': ['OlinkID_A', 'OlinkID_B', 'OlinkID_A', 'OlinkID_B'],
    'NPX': [1.0, 2.0, 3.0, 4.0],
    'LOD': [0.1, 0.2, 0.3, np.nan],  # OlinkID_B has one null LOD
})

print("Original DataFrame:")
print(df)
print("\n" + "="*60)

# Build expression (NPX pivot)
expression = df.pivot_table(
    index='SampleID',
    columns='OlinkID',
    values='NPX',
    aggfunc='first',
)
print("\nExpression (NPX pivot):")
print(expression)
print(f"Columns: {list(expression.columns)}")

print("\n" + "="*60)

# Build lod_matrix (LOD pivot) with some null values
lod_matrix = df.pivot_table(
    index='SampleID',
    columns='OlinkID',
    values='LOD',
    aggfunc='first',
)
print("\nLOD Matrix (LOD pivot):")
print(lod_matrix)
print(f"Columns: {list(lod_matrix.columns)}")

print("\n" + "="*60)
print("\nColumn alignment issue:")
print(f"expression has {len(expression.columns)} columns: {list(expression.columns)}")
print(f"lod_matrix has {len(lod_matrix.columns)} columns: {list(lod_matrix.columns)}")
print(f"Alignment problem: {set(expression.columns) != set(lod_matrix.columns)}")

# Test with completely null LOD for one OlinkID
print("\n" + "="*60)
print("\nTest 2: OlinkID with ALL-NULL LOD")
df2 = pd.DataFrame({
    'SampleID': ['S1', 'S1', 'S2', 'S2'],
    'OlinkID': ['OlinkID_A', 'OlinkID_B', 'OlinkID_A', 'OlinkID_B'],
    'NPX': [1.0, 2.0, 3.0, 4.0],
    'LOD': [np.nan, np.nan, np.nan, np.nan],  # ALL null LOD
})

expression2 = df2.pivot_table(
    index='SampleID',
    columns='OlinkID',
    values='NPX',
    aggfunc='first',
)
lod_matrix2 = df2.pivot_table(
    index='SampleID',
    columns='OlinkID',
    values='LOD',
    aggfunc='first',
)

print(f"expression2 columns: {list(expression2.columns)}")
print(f"lod_matrix2 columns: {list(lod_matrix2.columns)}")
print(f"Alignment problem: {set(expression2.columns) != set(lod_matrix2.columns)}")
EOF

Repository: PRIDE-Archive/pyprideap

Length of output: 185


🏁 Script executed:

# Also verify the suggested fix would work
python3 << 'EOF'
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'SampleID': ['S1', 'S1', 'S2', 'S2'],
    'OlinkID': ['OlinkID_A', 'OlinkID_B', 'OlinkID_A', 'OlinkID_B'],
    'NPX': [1.0, 2.0, 3.0, 4.0],
    'LOD': [np.nan, np.nan, np.nan, np.nan],
})

sample_order = df['SampleID'].unique()
expression = df.pivot_table(
    index='SampleID', columns='OlinkID', values='NPX', aggfunc='first',
).reindex(sample_order).reset_index(drop=True)

lod_matrix = df.pivot_table(
    index='SampleID', columns='OlinkID', values='LOD', aggfunc='first',
)

print("Before fix:")
print(f"expression columns: {list(expression.columns)}")
print(f"lod_matrix columns: {list(lod_matrix.columns)}")

# Apply suggested fix
lod_matrix_fixed = lod_matrix.reindex(
    index=sample_order,
    columns=expression.columns,
).reset_index(drop=True)

print("\nAfter fix:")
print(f"expression columns: {list(expression.columns)}")
print(f"lod_matrix_fixed columns: {list(lod_matrix_fixed.columns)}")
print(f"Aligned: {list(expression.columns) == list(lod_matrix_fixed.columns)}")
EOF

Repository: PRIDE-Archive/pyprideap

Length of output: 185


🏁 Script executed:

# Check if there are any tests in the codebase that verify lod_matrix alignment
find . -type f -name "*.py" -path "*/test*" | head -20

Repository: PRIDE-Archive/pyprideap

Length of output: 427


🏁 Script executed:

# Search for tests related to olink readers
find . -type f -name "*.py" | xargs grep -l "read_olink\|lod_matrix" | grep -i test

Repository: PRIDE-Archive/pyprideap

Length of output: 135


🏁 Script executed:

# Examine test_readers.py to see what's tested
cat -n tests/test_readers.py | head -200

Repository: PRIDE-Archive/pyprideap

Length of output: 6906


🏁 Script executed:

# Examine test_lod.py
cat -n tests/test_lod.py | head -150

Repository: PRIDE-Archive/pyprideap

Length of output: 5992


🏁 Script executed:

# Check conftest.py for fixtures
cat -n tests/conftest.py

Repository: PRIDE-Archive/pyprideap

Length of output: 792


🏁 Script executed:

# Look for fixture data that might test lod_matrix
find tests -type f \( -name "*.csv" -o -name "*.xlsx" \) | head -20

Repository: PRIDE-Archive/pyprideap

Length of output: 158


🏁 Script executed:

# Search specifically for any tests that verify lod_matrix has same columns as expression
rg "lod_matrix.*columns\|expression\.columns" tests/

Repository: PRIDE-Archive/pyprideap

Length of output: 49


🏁 Script executed:

# Check the olink sample data to see if it has LOD column and structure
head -20 tests/data/olink_sample.npx.csv

Repository: PRIDE-Archive/pyprideap

Length of output: 1012


🏁 Script executed:

# Examine the structure more carefully
head -10 tests/data/olink_sample.npx.csv | cat -A

Repository: PRIDE-Archive/pyprideap

Length of output: 780


🏁 Script executed:

# Count rows and columns
wc -l tests/data/olink_sample.npx.csv

Repository: PRIDE-Archive/pyprideap

Length of output: 101


🏁 Script executed:

# Get the CSV structure to understand the test data
cut -d',' -f1-10 tests/data/olink_sample.npx.csv | head -15

Repository: PRIDE-Archive/pyprideap

Length of output: 875


🏁 Script executed:

# Check if there are any other usages of lod_matrix in the codebase
rg "lod_matrix" --type py -A 3 -B 3

Repository: PRIDE-Archive/pyprideap

Length of output: 9550


Ensure lod_matrix columns match expression.columns at creation.

pivot_table() can drop assay columns where LOD values are entirely null, causing metadata["lod_matrix"] to be narrower than expression. While lod.py includes defensive alignment in _above_lod_matrix(), the mismatch should be prevented at creation time. This pattern is inconsistent with how features are explicitly aligned (line 43). The same issue exists in src/pyprideap/readers/olink_xlsx.py.

Suggested fix
     if "LOD" in df.columns:
         lod_matrix = df.pivot_table(
             index="SampleID",
             columns="OlinkID",
             values="LOD",
             aggfunc="first",
         )
-        lod_matrix = lod_matrix.reindex(sample_order).reset_index(drop=True)
+        lod_matrix = lod_matrix.reindex(
+            index=sample_order,
+            columns=expression.columns,
+        ).reset_index(drop=True)
         metadata["lod_matrix"] = lod_matrix
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/pyprideap/readers/olink_csv.py` around lines 47 - 56, The LOD pivot can
drop assay columns with all-NA values causing metadata["lod_matrix"] to not
match expression.columns; after creating lod_matrix with df.pivot_table (and
before resetting index), reindex its columns to the same assay order used for
expression/features (the same reference used at line where features are aligned)
and fill missing columns with NaN so lod_matrix.columns == expression.columns;
apply the identical fix in olink_xlsx.py where lod_matrix is built, ensuring you
reference the same sample_order and assay ordering variable (expression.columns
or features) when reindexing.
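The column-drop behaviour is easy to reproduce standalone: `pivot_table` defaults to `dropna=True`, so an assay whose LOD values are entirely NaN disappears from the pivot. The sketch below (with made-up `OID_A`/`OID_B` assay IDs, not the library's reader code) shows the mismatch and the two-axis `reindex` fix:

```python
import numpy as np
import pandas as pd

# One assay (OID_B) has NPX values but no LOD at all.
df = pd.DataFrame({
    "SampleID": ["S1", "S1", "S2", "S2"],
    "OlinkID": ["OID_A", "OID_B", "OID_A", "OID_B"],
    "NPX": [1.0, 2.0, 3.0, 4.0],
    "LOD": [0.1, np.nan, 0.3, np.nan],
})
sample_order = df["SampleID"].unique()

expression = df.pivot_table(index="SampleID", columns="OlinkID",
                            values="NPX", aggfunc="first")
lod = df.pivot_table(index="SampleID", columns="OlinkID",
                     values="LOD", aggfunc="first")
# pivot_table's default dropna=True drops the all-NaN OID_B column,
# so lod is narrower than expression.
print(list(lod.columns))

# The fix: reindex on both axes against expression, filling the
# missing assay column with NaN.
lod_fixed = lod.reindex(index=sample_order,
                        columns=expression.columns).reset_index(drop=True)
print(list(lod_fixed.columns))
```

After the reindex, `lod_fixed.columns` matches `expression.columns` exactly, which is the invariant the downstream `_above_lod_matrix()` alignment would otherwise have to repair.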
