feat: introduce Result Service using Lakekeeper as REST catalog for Iceberg #4242

Open
mengw15 wants to merge 51 commits into apache:main from mengw15:Restful-Catalog4

mengw15 commented Feb 27, 2026

What changes were proposed in this PR?

This PR introduces Lakekeeper as a REST catalog service for Iceberg, replacing direct JDBC catalog connections. Key changes include:

  • Lakekeeper bootstrap script (bin/bootstrap-lakekeeper.sh): automates Lakekeeper setup with MinIO as the S3-compatible storage backend.
  • Iceberg catalog migration (Scala & Python): the Scala and Python code now connects through the Lakekeeper REST catalog instead of a direct JDBC connection.
  • Single-node deployment: updated bin/single-node/docker-compose.yml to include Lakekeeper and MinIO services.
  • Kubernetes deployment: added Lakekeeper init job, external-names for service discovery, and exposed Lakekeeper to the computing-unit pool.
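To illustrate what connecting through a REST catalog looks like on the Python side, here is a minimal sketch. It assumes PyIceberg is the client library; the PR description does not say which one is used. The function names, the catalog name "texera", and every URL or credential passed in are hypothetical placeholders rather than values taken from this PR.

```python
from typing import Dict


def rest_catalog_properties(
    catalog_uri: str,
    warehouse: str,
    s3_endpoint: str,
    s3_access_key: str,
    s3_secret_key: str,
) -> Dict[str, str]:
    """Build PyIceberg properties for a REST catalog with S3-compatible storage.

    All argument values are supplied by the caller; nothing here is read
    from the Texera configuration (that wiring is not shown in the PR text).
    """
    return {
        "type": "rest",                       # use the REST catalog instead of JDBC
        "uri": catalog_uri,                   # REST catalog endpoint (assumed value)
        "warehouse": warehouse,               # warehouse name (assumed value)
        "s3.endpoint": s3_endpoint,           # MinIO endpoint (assumed value)
        "s3.access-key-id": s3_access_key,
        "s3.secret-access-key": s3_secret_key,
    }


def connect(properties: Dict[str, str]):
    """Open the catalog. Requires PyIceberg and a reachable catalog server."""
    # Imported lazily so the helper above works without PyIceberg installed.
    from pyiceberg.catalog import load_catalog

    return load_catalog("texera", **properties)  # "texera" is a hypothetical name
```

With a running Lakekeeper and MinIO, `connect(rest_catalog_properties(...))` would return a catalog object; the exact URI, warehouse name, and credentials depend on the local setup.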

Post-merge setup for developers

After this PR is merged, each developer needs to perform the following one-time setup:

  1. Create the Lakekeeper database

psql -f sql/texera_lakekeeper.sql

  2. Download the Lakekeeper binary

  • Go to the Lakekeeper releases page and download the binary for your platform.
  • Place it anywhere on your machine.

  3. Configure bin/bootstrap-lakekeeper.sh

Edit the User Configuration section at the top of the script:

  • LAKEKEEPER_BINARY_PATH: path to the downloaded Lakekeeper binary
  • LAKEKEEPER__PG_DATABASE_URL_READ / LAKEKEEPER__PG_DATABASE_URL_WRITE: PostgreSQL connection URL in the format postgres://username:password@hostname:5432/texera_lakekeeper

  4. Run the bootstrap script

./bin/bootstrap-lakekeeper.sh

This starts Lakekeeper, creates the default project, sets up the MinIO bucket, and creates the warehouse.
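For the connection URL in the configuration step, a small helper can assemble the value and percent-encode the credentials, which matters when a password contains characters such as @ or /. This is only a sketch: the helper name is hypothetical, the encoding step is an addition not mentioned in the PR, and it assumes the read and write URLs point at the same database.

```python
from typing import Dict
from urllib.parse import quote


def lakekeeper_pg_env(
    username: str,
    password: str,
    hostname: str,
    port: int = 5432,
    database: str = "texera_lakekeeper",
) -> Dict[str, str]:
    """Return the two Lakekeeper database URL settings as a dict.

    Follows the format given in the setup steps:
    postgres://username:password@hostname:5432/texera_lakekeeper
    """
    # Percent-encode credentials so reserved characters do not break the URL.
    user = quote(username, safe="")
    secret = quote(password, safe="")
    url = f"postgres://{user}:{secret}@{hostname}:{port}/{database}"
    # Assumption: one database serves both reads and writes.
    return {
        "LAKEKEEPER__PG_DATABASE_URL_READ": url,
        "LAKEKEEPER__PG_DATABASE_URL_WRITE": url,
    }
```

The resulting values can be pasted into the User Configuration section of bin/bootstrap-lakekeeper.sh.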

Any related issues, documentation, discussions?

Closes #4126

How was this PR tested?

Tested manually on a single-node Docker Compose deployment and on a Kubernetes cluster.

Was this PR authored or co-authored using generative AI tooling?

Co-authored with AI.

mengw15 marked this pull request as draft February 27, 2026 06:50
github-actions bot added the engine, dependencies, ddl-change, python, build, service, and common labels Feb 27, 2026
github-actions bot added the ci label Mar 2, 2026
mengw15 marked this pull request as ready for review March 4, 2026 23:52
mengw15 changed the title from "feat: introduce Lakekeeper as REST catalog for Iceberg storage" to "feat: introduce Result Service using Lakekeeper as REST catalog for Iceberg storage" Mar 5, 2026
mengw15 changed the title from "feat: introduce Result Service using Lakekeeper as REST catalog for Iceberg storage" to "feat: introduce Result Service using Lakekeeper as REST catalog for Iceberg" Mar 5, 2026

mengw15 commented Mar 5, 2026

@bobbai00

Labels

build, ci (changes related to CI), common, ddl-change (Changes to the TexeraDB DDL), dependencies (Pull requests that update a dependency file), engine, python, service

Development

Successfully merging this pull request may close these issues.

Migrate to Result Service and MinIO for Execution Results

2 participants