Add Options for DataFrameWriter #167

Open
pequalsnp wants to merge 4 commits into apache:master from pequalsnp:data-fram-write-options

@pequalsnp pequalsnp commented Sep 5, 2025

What changes were proposed in this pull request?

  • This pipes options set on DataFrameWriter through to the WriteOperation proto message.
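
A minimal sketch of what "piping options through" could look like, assuming a builder-style writer. The types `dataFrameWriter` and `writeOperation` below are hypothetical stand-ins written for illustration; they are not the library's actual types or the generated proto code, and the real field names may differ.

```go
package main

import "fmt"

// writeOperation is a hypothetical stand-in for the proto message that
// carries a write request; only the fields needed for this sketch are modeled.
type writeOperation struct {
	Format  string
	Path    string
	Options map[string]string
}

// dataFrameWriter is a simplified, hypothetical builder (not the library's type).
type dataFrameWriter struct {
	format  string
	options map[string]string
}

func newDataFrameWriter() *dataFrameWriter {
	return &dataFrameWriter{options: map[string]string{}}
}

// Format records the output format and returns the writer for chaining.
func (w *dataFrameWriter) Format(f string) *dataFrameWriter {
	w.format = f
	return w
}

// Option records one key/value pair; a later call with the same key overwrites it.
func (w *dataFrameWriter) Option(key, value string) *dataFrameWriter {
	w.options[key] = value
	return w
}

// build copies the accumulated options into the outgoing message so that
// later changes to the writer do not affect a message already built.
func (w *dataFrameWriter) build(path string) writeOperation {
	opts := make(map[string]string, len(w.options))
	for k, v := range w.options {
		opts[k] = v
	}
	return writeOperation{Format: w.format, Path: path, Options: opts}
}

func main() {
	op := newDataFrameWriter().
		Format("csv").
		Option("header", "true").
		build("/tmp/out")
	fmt.Println(op.Format, op.Options["header"])
}
```

The point of the sketch is only that the options map travels with the write request instead of being dropped on the client side.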

Why are the changes needed?

  • Write options are already supported by the API; this change just exposes the ability to set them from Go.
  • Write options are needed for many output formats; in my specific case, OpenSearch.

Does this PR introduce any user-facing change?

Yes!

It adds the ability to set options on DataFrameWriter.

How was this patch tested?

Added a call to Option in the unit tests.
Added an integration test for writer options.

@pequalsnp pequalsnp changed the title [WIP] Add Options for DataFrameWriter Add Options for DataFrameWriter Sep 5, 2025
@pequalsnp pequalsnp marked this pull request as ready for review September 5, 2025 12:49

@caldempsey caldempsey left a comment

Can you add an integration test to demonstrate usage and use a Dataframe option which would affect the displayed results? Maybe something like NULLVALUE -> NA.

See https://github.com/apache/spark-connect-go/tree/master/internal/tests/integration

@pequalsnp
Author

➜  spark-connect-go git:(data-fram-write-options) ✗ SPARK_HOME=~/spark-4.0.1-bin-hadoop3 make integration
>> TEST, "integration"
ok      github.com/apache/spark-connect-go/internal/tests/integration   6.281s

@pequalsnp
Author

> Can you add an integration test to demonstrate usage and use a Dataframe option which would affect the displayed results? Maybe something like NULLVALUE -> NA.
>
> See https://github.com/apache/spark-connect-go/tree/master/internal/tests/integration

I used the header option on the CSV reader/writer: I created a DataFrame, wrote it with a header, read it back with a header, and verified there were only 2 lines.
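
The idea behind that round-trip check can be sketched without a Spark cluster. The example below is an illustration only: it uses Go's standard `encoding/csv` package in place of the Spark CSV reader/writer, the helpers `writeCSV` and `readCSV` are hypothetical, and the two-row input is an assumption about the test data. It shows why the header option affects the observed count: if the header line is written but not consumed on read, it shows up as an extra record.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
)

// writeCSV encodes rows as CSV, preceded by a header line when header is true.
func writeCSV(columns []string, rows [][]string, header bool) ([]byte, error) {
	var buf bytes.Buffer
	w := csv.NewWriter(&buf)
	if header {
		if err := w.Write(columns); err != nil {
			return nil, err
		}
	}
	// WriteAll flushes the writer and reports any write error.
	if err := w.WriteAll(rows); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// readCSV parses data; when header is true the first record is treated as
// column names and excluded from the returned rows.
func readCSV(data []byte, header bool) ([]string, [][]string, error) {
	records, err := csv.NewReader(bytes.NewReader(data)).ReadAll()
	if err != nil {
		return nil, nil, err
	}
	if header && len(records) > 0 {
		return records[0], records[1:], nil
	}
	return nil, records, nil
}

func main() {
	columns := []string{"id", "name"}
	rows := [][]string{{"1", "a"}, {"2", "b"}} // assumed two-row input

	data, err := writeCSV(columns, rows, true)
	if err != nil {
		panic(err)
	}

	_, withHeader, err := readCSV(data, true)
	if err != nil {
		panic(err)
	}
	_, withoutHeader, err := readCSV(data, false)
	if err != nil {
		panic(err)
	}

	// The header line counts as data only when it is not consumed on read.
	fmt.Println(len(withHeader), len(withoutHeader))
}
```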

@pequalsnp pequalsnp requested a review from caldempsey December 3, 2025 19:36
@pequalsnp pequalsnp force-pushed the data-fram-write-options branch from 007e7c4 to c2fc9a0 Compare December 3, 2025 19:38