[BUG][Dataloader] preserve column casing in DataFusion SQL dialect to fix camelCase column lookups by ShreyeshArangath · Pull Request #536 · linkedin/openhouse

ShreyeshArangath · 2026-04-08T20:19:37Z

Summary

The DataFusion dialect's NORMALIZATION_STRATEGY was set to LOWERCASE, causing sqlglot to lowercase all identifiers during SQL optimization. This broke queries against tables with camelCase columns (e.g. viewerId, feedPosition). The lowercased names (viewerid, feedposition) no longer matched the table schema, because both DataFusion execution and PyIceberg scans are case-sensitive.

Change the strategy to CASE_SENSITIVE, which matches DataFusion's actual behavior and preserves original identifier casing throughout the pipeline.
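To make the failure mode concrete, here is a minimal, self-contained model of the two strategies. It does not call sqlglot: NormalizationStrategy, normalize_identifier, and resolve_columns are hypothetical stand-ins whose names only mirror the settings described above. The lookup assumes the exact-match (case-sensitive) column resolution that the summary attributes to DataFusion and PyIceberg.

```python
from enum import Enum, auto


class NormalizationStrategy(Enum):
    """Hypothetical stand-in whose member names mirror the two strategies
    discussed in this PR; this is not sqlglot's implementation."""

    LOWERCASE = auto()
    CASE_SENSITIVE = auto()


def normalize_identifier(name: str, strategy: NormalizationStrategy) -> str:
    """Normalize an identifier according to the dialect's strategy."""
    if strategy is NormalizationStrategy.LOWERCASE:
        return name.lower()
    # CASE_SENSITIVE: keep the identifier exactly as the user wrote it.
    return name


def resolve_columns(requested, schema_columns, strategy):
    """Normalize each requested column, then look it up with an exact
    (case-sensitive) match against the schema, modeling the resolution
    the summary describes for DataFusion execution and PyIceberg scans."""
    available = set(schema_columns)
    resolved, missing = [], []
    for column in requested:
        normalized = normalize_identifier(column, strategy)
        if normalized in available:
            resolved.append(normalized)
        else:
            missing.append(normalized)
    return resolved, missing


if __name__ == "__main__":
    schema = ["viewerId", "feedPosition"]
    for strategy in NormalizationStrategy:
        print(strategy.name, resolve_columns(schema, schema, strategy))
    # LOWERCASE ([], ['viewerid', 'feedposition'])
    # CASE_SENSITIVE (['viewerId', 'feedPosition'], [])
```

In this model, LOWERCASE rewrites both camelCase names before the lookup, so neither matches the schema; CASE_SENSITIVE passes them through unchanged and both resolve.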

Changes

For all the boxes checked, please include additional details of the changes made in this pull request.

Testing Done

Manually Tested on local docker setup. Please include commands run, and their output.
Added new tests for the changes made.
Updated existing tests to reflect the changes made.
No tests added or updated. Please explain why. If unsure, please feel free to ask for help.
Some other form of testing like staging or soak time in production. Please explain.

For all the boxes checked, include a detailed description of the testing done for the changes made in this pull request.

Additional Information

Breaking Changes
Deprecations
Large PR broken into smaller PRs, and PR plan linked in the description.

For all the boxes checked, include additional details of the changes made in this pull request.

…e column lookups
The DataFusion dialect's NORMALIZATION_STRATEGY was set to LOWERCASE, causing sqlglot to lowercase all identifiers during SQL optimization. This broke tables with camelCase columns (e.g. viewerId, feedPosition) because both DataFusion execution and PyIceberg scans are case-sensitive. Change the strategy to CASE_SENSITIVE, which matches DataFusion's actual behavior and preserves original identifier casing throughout the pipeline.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

… tests
Makes the test data truly ambiguous — all three columns lowercase to "userid", so a lowercasing dialect would collapse them into one column.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…-case tests
Renames generic userId/USERID/UserID to purchaseAmount/PURCHASEAMOUNT/PurchaseAmount for better readability while preserving the case-collision property that makes the tests meaningful.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Replaces colliding casing variants (purchaseAmount/PURCHASEAMOUNT/PurchaseAmount) with distinct descriptive columns (purchaseAmount, itemCount, discountRate) that better represent a real schema.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
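The test-data commits trade off two properties: colliding names make a lowercasing regression impossible to miss, while distinct names look like a real schema. A minimal sketch of both, assuming a simple string-level model of normalization rather than the actual OpenHouse tests (normalize_all, check_colliding_names, and check_distinct_names are hypothetical names):

```python
def normalize_all(columns, lowercase):
    """Hypothetical helper: lowercase=True models a LOWERCASE dialect,
    lowercase=False models a CASE_SENSITIVE one."""
    return [c.lower() if lowercase else c for c in columns]


# Intermediate test data: three casing variants of the same name.
COLLIDING = ["userId", "USERID", "UserID"]
# Final test data: distinct, realistic camelCase columns.
DISTINCT = ["purchaseAmount", "itemCount", "discountRate"]


def check_colliding_names():
    # Lowercasing folds all three variants into the single name "userid"...
    assert set(normalize_all(COLLIDING, lowercase=True)) == {"userid"}
    # ...whereas a case-sensitive dialect keeps all three apart.
    assert len(set(normalize_all(COLLIDING, lowercase=False))) == 3


def check_distinct_names():
    schema = set(DISTINCT)
    # Nothing collapses, but every lowercased name misses the schema.
    assert not any(c in schema for c in normalize_all(DISTINCT, lowercase=True))
    # Case-sensitive normalization leaves every name resolvable.
    assert all(c in schema for c in normalize_all(DISTINCT, lowercase=False))


if __name__ == "__main__":
    check_colliding_names()
    check_distinct_names()
    print("ok")
```

In this model the distinct columns adopted in the final commit still expose the bug: no names collapse, but a lowercasing dialect would produce purchaseamount, itemcount, and discountrate, none of which match the case-sensitive schema.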

ShreyeshArangath marked this pull request as ready for review April 8, 2026 20:23

ShreyeshArangath and others added 3 commits April 8, 2026 20:25

test: use colliding column names (userId/USERID/UserID) in mixed-case…

283f8cb


robreeves approved these changes Apr 8, 2026


ShreyeshArangath merged commit f9fccaa into linkedin:main Apr 8, 2026
2 checks passed


ShreyeshArangath merged 4 commits into linkedin:main from ShreyeshArangath:bug/lowercase
