[CS-148] Set s3 client config setting to validate checksum only when required #35638

Open
patrickwwbutler wants to merge 6 commits into MaterializeInc:main from patrickwwbutler:patrick/disable-checksum-s3

@patrickwwbutler
Contributor

When using COPY FROM on Parquet files in GCS, we issue range requests for portions of each file. GCS does not support checksums for partial objects, so these requests fail with checksum errors. This setting makes the S3 client validate checksums only when required, which avoids those errors and makes COPY FROM GCS work with Parquet files.

materialize=> COPY INTO parquet_table FROM 'gs://mz-scratch-public/copyfromtest' (FORMAT PARQUET, AWS CONNECTION = gcs_connection, PATTERN = '*types_large*');
COPY 53000
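
To illustrate why a "validate only when required" policy lets ranged reads through, here is a minimal, stdlib-only sketch. It is a toy model, not the change in this PR: `ChecksumValidation`, `ranged_get`, `read_range`, and `toy_checksum` are hypothetical names, and the simulated store is assumed to report a whole-object checksum alongside a partial body, which is one plausible way a partial-object checksum mismatch could arise.

```rust
// Toy model only: none of these names come from the AWS SDK or Materialize.

/// Hypothetical stand-in for a client-side checksum validation setting.
#[derive(Clone, Copy, Debug, PartialEq)]
enum ChecksumValidation {
    /// Validate every response.
    WhenSupported,
    /// Validate only when the operation itself requires a checksum.
    WhenRequired,
}

/// Stand-in checksum (a plain byte sum), not a real CRC or hash.
fn toy_checksum(bytes: &[u8]) -> u32 {
    bytes.iter().fold(0u32, |acc, b| acc.wrapping_add(*b as u32))
}

/// Simulated store (assumption): returns the requested byte range but
/// reports the checksum of the whole object.
fn ranged_get(object: &[u8], start: usize, end: usize) -> (Vec<u8>, u32) {
    (object[start..end].to_vec(), toy_checksum(object))
}

/// Client-side read that applies the validation policy to a ranged request.
fn read_range(
    object: &[u8],
    start: usize,
    end: usize,
    mode: ChecksumValidation,
    operation_requires_checksum: bool,
) -> Result<Vec<u8>, String> {
    let (body, reported) = ranged_get(object, start, end);
    let validate = match mode {
        ChecksumValidation::WhenSupported => true,
        ChecksumValidation::WhenRequired => operation_requires_checksum,
    };
    if validate && toy_checksum(&body) != reported {
        return Err(format!("checksum mismatch for bytes {start}..{end}"));
    }
    Ok(body)
}

fn main() {
    let object = b"PAR1 column chunk data footer PAR1";
    // Partial read with strict validation: the partial body cannot match
    // the whole-object checksum, so the read is rejected.
    let strict = read_range(object, 0, 4, ChecksumValidation::WhenSupported, false);
    // Same read with validation only when required: the bytes are returned.
    let relaxed = read_range(object, 0, 4, ChecksumValidation::WhenRequired, false);
    println!("strict: {strict:?}");
    println!("relaxed: {relaxed:?}");
}
```

The trade-off in this model is that a relaxed read gives up integrity checking for that response, which is why the policy is framed as "only when required" rather than "never".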

@patrickwwbutler patrickwwbutler requested review from a team and sidsaw-mz March 25, 2026 20:55
@patrickwwbutler patrickwwbutler requested a review from a team as a code owner March 25, 2026 20:55
@github-actions
Contributor

Thanks for opening this PR! Here are a few tips to help make the review process smooth for everyone.

PR title guidelines

  • Use imperative mood: "Fix X" not "Fixed X" or "Fixes X"
  • Be specific: "Fix panic in catalog sync when controller restarts" not "Fix bug" or "Update catalog code"
  • Prefix with area if helpful: compute: , storage: , adapter: , sql:

Pre-merge checklist

  • The PR title is descriptive and will make sense in the git log.
  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).

Contributor

@martykulma martykulma left a comment

Thanks @patrickwwbutler, looks great! I happened to look up path-style access and found AWS is in the process of deprecating it, so we probably need to avoid it in case it causes problems.
