BF: Seek file data to 0 before every retry in request(), not just for retryable status codes#1824
BF: Seek file data to 0 before every retry in request(), not just for retryable status codes#1824yarikoptic merged 2 commits intomasterfrom
Conversation
… retryable status codes Previously, data.seek(0) was only called inside the retry_statuses/retry_if branch, which handles specific HTTP response codes (429, 500-504, and zarr-specific conditions). When a ConnectionError occurred mid-upload (e.g., connection dropped during a multi-minute large zarr chunk transfer), tenacity caught the exception and retried, but the file pointer was left at whatever position the read was interrupted. On retry, requests computed Content-Length as (file_size - current_position) and sent only the tail of the file. S3 received partial data whose MD5 didn't match the Content-MD5 header (computed from the full file), resulting in a BadDigest 400 error. This error was also not retried (not in RETRY_STATUSES, not matched by retry_if), so the upload failed permanently. The fix uses tenacity's before_sleep callback to seek the file data back to position 0 before every retry, regardless of whether the retry was triggered by ConnectionError, HTTPError, or a retryable status code. Evidence from two independent logs (issue #1821): - Linux: 188 level-0 zarr chunks failed (large, ~6 min upload each), 67 level-2 chunks succeeded (small, fast upload) -- all with "succeeded after 1 retry" + BadDigest - Windows local disk: 187 level-0 failures after 2-8 retries, 68 level-1 successes -- ruling out NFS/filesystem as the cause - Both logs show "Resetting dropped connection: dandiarchive.s3.amazonaws.com" confirming ConnectionErrors preceded the BadDigest errors Closes #1821 Co-Authored-By: Claude Code 2.1.81 / Claude Opus 4.6 <noreply@anthropic.com>
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #1824 +/- ##
=======================================
Coverage 75.12% 75.12%
=======================================
Files 84 84
Lines 11930 11931 +1
=======================================
+ Hits 8962 8963 +1
Misses 2968 2968
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
User confirmed the resolution of the digest mismatch What a silly source of such an error Great to confirm no files were harmed in the process of |
|
Thanks @CodyCBakerPhD for the review. I will wait for CI to pass and then merge -- there will be follow ups though. |
@dstansby please try
claude's slopsplaining:
Previously, data.seek(0) was only called inside the retry_statuses/retry_if branch, which handles specific HTTP response codes (429, 500-504, and zarr-specific conditions). When a ConnectionError occurred mid-upload (e.g., connection dropped during a multi-minute large zarr chunk transfer), tenacity caught the exception and retried, but the file pointer was left at whatever position the read was interrupted.
On retry, requests computed Content-Length as (file_size - current_position) and sent only the tail of the file. S3 received partial data whose MD5 didn't match the Content-MD5 header (computed from the full file), resulting in a BadDigest 400 error. This error was also not retried (not in RETRY_STATUSES, not matched by retry_if), so the upload failed permanently.
The fix uses tenacity's before_sleep callback to seek the file data back to position 0 before every retry, regardless of whether the retry was triggered by ConnectionError, HTTPError, or a retryable status code.
Evidence from two independent logs (issue #1821):