Release/0.7.0 by jackiryan · Pull Request #152 · podaac/bignbit

jackiryan · 2026-02-19T21:38:44Z

Description

This release is primarily intended to add support for the GHRSST L4 MUR product from PODAAC.

Refactored the pipeline to pass more concise messages between lambdas. The "Browse Image Transfer" (BIT) workflow is now more consolidated: the following lambdas from versions <=0.6.0 have been merged into a single lambda:

- process_harmony_results (reading result URLs from the Harmony API request and generating checksums)
- generate_image_metadata (producing the image metadata XML for GIBS with data start and end times)
- build_image_sets (associating sets of browse image, world file, and image metadata XML)
- save_cnm_message (saving the CNM JSON message with image set information for GIBS)

All of these functions occur synchronously, so they have been combined into the new "Handle BIG Result" step of the pipeline which produces and saves CNM messages to S3. This was done to reduce the size of messages passed by the bignbit pipeline between steps, since these messages are limited to 256KB in size. For datasets like GHRSST MUR where many tiled images are produced, the old system caused Step Functions to fail the workflow due to oversized messages. Rather than introduce a new database or save additional intermediate files to S3, this approach both simplifies the workflow and mitigates the message size issue.
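To make the size constraint concrete, here is a minimal Python sketch that measures a state payload as serialized UTF-8 JSON and compares two hypothetical `pobit` shapes for one granule tiled into 400 images. The 256 × 1024 byte threshold, the field names, and the 1,000-character metadata stand-in are assumptions for illustration, not bignbit's actual data structures.

```python
import json

# The PR cites a 256KB cap on payloads passed between Step Functions states.
# This sketch assumes that means 256 * 1024 bytes of UTF-8 encoded JSON.
STATE_LIMIT_BYTES = 256 * 1024


def state_size_bytes(state: dict) -> int:
    """Size of a state payload once serialized to compact UTF-8 JSON."""
    return len(json.dumps(state, separators=(",", ":")).encode("utf-8"))


def fits_state_limit(state: dict) -> bool:
    """True if the serialized payload is within the assumed limit."""
    return state_size_bytes(state) <= STATE_LIMIT_BYTES


# Hypothetical payloads for one granule tiled into 400 browse images.
# Field names and the 1,000-character XML stand-in are illustrative only.
TILES = range(400)
inline_image_sets = {
    "pobit": [
        {
            "image": f"s3://example-bucket/tile_{i}.png",
            "world_file": f"s3://example-bucket/tile_{i}.pgw",
            "image_metadata": "<ImageMetadata>" + "x" * 1000 + "</ImageMetadata>",
        }
        for i in TILES
    ]
}
cnm_references = {
    "pobit": [
        {"cnm_bucket": "example-bucket", "cnm_key": f"bignbit-cnm-output/tile_{i}.cnm.json"}
        for i in TILES
    ]
}

print(state_size_bytes(inline_image_sets), fits_state_limit(inline_image_sets))
print(state_size_bytes(cnm_references), fits_state_limit(cnm_references))
```

Both payloads grow with the tile count; the reference-only shape simply grows far more slowly, which is the effect the consolidation is after.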

In practice, this means that the final state output of the workflow has changed. Previously, the `pobit` item of the state payload contained an array of `image_set` objects. Now it contains an array of references to CNM messages that have been sent to GIBS over the SQS queue:

```
  "pobit": [
      {
        "cmr_provider": "LARC_CLOUD",
        "collection_name": "PREFIRE_SAT2_2B-FLX_EEDTEST",
        "cnm_bucket": "podaac-sit-svc-internal",
        "cnm_key": "bignbit-cnm-output/PREFIRE_SAT2_2B-FLX_EEDTEST_flx_LL/PREFIRE_SAT2_2B-FLX_S07_R00_20210721013413_03040.nc.G00.2026-02-19T17:27:01.816Z.cnm.json",
        "gibs": {
          "cnmContent": {
            "MD5OfMessageBody": "9597edb67114477b7c1133aec065062d",
            "MD5OfMessageAttributes": "f7aec4559a577e6f7a0ba62823347d93",
            "MessageId": "ffa61002-f475-48b6-b02f-fe89b408e3ea",
            "SequenceNumber": "18900253714877423616",
            "ResponseMetadata": {
              "RequestId": "e900647d-6804-5d8d-b4a2-066c2237bebc",
              "HTTPStatusCode": 200,
              "HTTPHeaders": {
                "x-amzn-requestid": "e900647d-6804-5d8d-b4a2-066c2237bebc",
                "date": "Thu, 19 Feb 2026 17:27:15 GMT",
                "content-type": "text/xml",
                "content-length": "512",
                "connection": "keep-alive"
              },
              "RetryAttempts": 0
            }
          }
        }
      },
      ...
  ]
```
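Downstream consumers now follow references instead of reading image sets inline. A minimal sketch of that, using only the keys visible in the example entry above: the helper names are hypothetical, and the shortened `cnm_key` in the sample is a placeholder rather than a real object.

```python
from typing import List


def cnm_s3_uris(pobit: List[dict]) -> List[str]:
    """Build s3:// URIs for the CNM messages referenced in the pobit output."""
    return [f"s3://{ref['cnm_bucket']}/{ref['cnm_key']}" for ref in pobit]


def failed_sends(pobit: List[dict]) -> List[str]:
    """Return cnm_key values whose recorded SQS response was not HTTP 200."""
    failed = []
    for ref in pobit:
        metadata = ref.get("gibs", {}).get("cnmContent", {}).get("ResponseMetadata", {})
        if metadata.get("HTTPStatusCode") != 200:
            failed.append(ref["cnm_key"])
    return failed


# Trimmed version of the example entry above (cnm_key shortened to a placeholder).
example_pobit = [
    {
        "cmr_provider": "LARC_CLOUD",
        "collection_name": "PREFIRE_SAT2_2B-FLX_EEDTEST",
        "cnm_bucket": "podaac-sit-svc-internal",
        "cnm_key": "bignbit-cnm-output/example.cnm.json",
        "gibs": {
            "cnmContent": {
                "MessageId": "ffa61002-f475-48b6-b02f-fe89b408e3ea",
                "ResponseMetadata": {"HTTPStatusCode": 200},
            }
        },
    }
]

print(cnm_s3_uris(example_pobit))
print(failed_sends(example_pobit))
```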

Additionally, warnings are now issued when a Harmony API call reports success but produces no data. This situation is occasionally encountered with some data sets, though its root cause is unknown. For now, we want to capture the occurrences without failing the overall workflow since other variables or projections in the browse image workflow may still succeed.
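The "successful but empty" check can be sketched as below. This is an illustration under assumptions, not the bignbit code: it assumes the Harmony job status JSON exposes `jobID`, `status`, and output files as `links` with `rel == "data"`, and it defines a local `HarmonyJobNoDataError` stand-in. In the actual pipeline the exception is caught by the Step Functions state rather than in Python; the catch is folded in here only to show the warn-and-continue behavior.

```python
import logging
from typing import List

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("bignbit_sketch")


class HarmonyJobNoDataError(Exception):
    """Local stand-in for the exception of the same name introduced in this release."""


def data_links(job_status: dict) -> List[str]:
    # Assumption: Harmony lists output files as links whose "rel" is "data".
    return [link["href"] for link in job_status.get("links", []) if link.get("rel") == "data"]


def check_job_output(job_status: dict, variable: str, crs: str) -> List[str]:
    """Raise if the job reports success but produced no data links."""
    urls = data_links(job_status)
    if job_status.get("status") == "successful" and not urls:
        raise HarmonyJobNoDataError(
            f"Harmony job {job_status.get('jobID')} completed successfully but returned no data "
            f"for variable '{variable}' and CRS '{crs}'"
        )
    return urls


def check_without_failing(job_status: dict, variable: str, crs: str) -> List[str]:
    """Log the no-data case as a warning so other variables/projections can continue."""
    try:
        return check_job_output(job_status, variable, crs)
    except HarmonyJobNoDataError as err:
        logger.warning("%s", err)
        return []


empty_job = {
    "jobID": "example-job",
    "status": "successful",
    "links": [{"rel": "self", "href": "https://example.invalid/jobs/example-job"}],
}
print(check_without_failing(empty_job, "example_var", "EPSG:4326"))
```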

Added

issues/108: Handle the case where a Harmony job returns no data by issuing a warning that can be tracked in CloudWatch logs.

Changed

issues/148: Refactored message passing system after browse image generation to handle large tiled outputs (100s of output files).

Removed

issues/148: Removed the "Generate Image Metadata", "Build Image Sets", "Process Harmony Results", and "Save CNM Message" lambdas in favor of a consolidated "Handle BIG Result" lambda that generates and saves CNM messages for the entire result of a browse image generation workflow.

Overview of verification done

Tested the following data sets in the SIT venue:
- OPERA HLS
- PREFIRE COG
- TEMPO NO2
- GHRSST L4 MUR (new!!)

Overview of integration done

Integration testing in UAT is TBD.

PR checklist:

Linted
Updated unit tests
Updated changelog
Integration testing

See Pull Request Review Checklist for pointers on reviewing this pull request


* issue/148: Refactored format of harmony job status passed to generate image metadata
* Removed process_harmony_results.py lambda and associated tests
* Fix linter error
* Updated terraform scripts to remove references to process_harmony_job_output and provide env variables to generate_image_metadata
* Refactored generate_image_metadata, build_image_sets, and save_cnm_message into one lambda
* Fixed bug with collection name key
* issues/148: Updated unit tests, changelog, and step function graph
* issue/148: fixed issue with CICD pipeline workflow not creating manifest list
* issue/148: set provenance to false in build and publish docker image step
* issue/148: fixed a bug so that image sets with no world file are allowed
* issue/148: fixed bug where HarmonyJobNoDataError pass state triggered KeyError
* issue/148: Minor changes to address copilot code review comments

Copilot

Pull request overview

Release 0.7.0 refactor to support large/tiled BIG outputs (e.g., GHRSST L4 MUR) by consolidating post-BIG processing into a single step that generates ImageMetadata XML and writes CNM messages to S3, reducing Step Functions payload sizes and improving handling of “successful but no data” Harmony outcomes.

Changes:

Consolidates prior post-BIG lambdas into a new handle_big_result lambda that generates metadata, builds image sets, and writes CNMs to S3 (workflow now passes CNM references instead of full image-set payloads).
Updates Step Functions + Terraform to use the new pipeline shape and adds a no-data end path for Harmony “successful with no results”.
Updates/rewrites unit tests and fixtures to match the new message-passing and CNM-on-S3 approach.
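The image-set step folded into `handle_big_result` associates each browse image with its world file and metadata XML, and a commit in this PR allows sets without a world file. A hypothetical sketch of that grouping, assuming the files share a path stem and using an extension mapping chosen for illustration; the real pipeline generates the metadata XML itself and may classify files differently.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict, List, Optional

# Hypothetical extension groupings; the real pipeline may classify files differently.
IMAGE_EXTS = {".png", ".jpg", ".jpeg"}
WORLD_FILE_EXTS = {".pgw", ".jgw", ".wld"}
METADATA_EXTS = {".xml"}


def build_image_sets(filenames: List[str]) -> List[Dict[str, Optional[str]]]:
    """Group files sharing a stem into image sets; the world file is optional."""
    groups: Dict[str, Dict[str, Optional[str]]] = defaultdict(
        lambda: {"image": None, "world_file": None, "image_metadata": None}
    )
    for name in filenames:
        path = PurePosixPath(name)
        stem = str(path.with_suffix(""))
        ext = path.suffix.lower()
        if ext in IMAGE_EXTS:
            groups[stem]["image"] = name
        elif ext in WORLD_FILE_EXTS:
            groups[stem]["world_file"] = name
        elif ext in METADATA_EXTS:
            groups[stem]["image_metadata"] = name
    # Keep only complete sets: an image plus its metadata XML.
    return [
        {"name": stem, **files}
        for stem, files in sorted(groups.items())
        if files["image"] and files["image_metadata"]
    ]


print(build_image_sets([
    "out/tile_1.png", "out/tile_1.pgw", "out/tile_1.xml",
    "out/tile_2.png", "out/tile_2.xml",
    "out/tile_3.png",
]))
```

Here `tile_2` still forms a set without a world file, while `tile_3` is dropped because it has no metadata XML.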

Reviewed changes

Copilot reviewed 35 out of 37 changed files in this pull request and generated 4 comments.


| File | Description |
| --- | --- |
| tests/test_utils.py | New consolidated unit tests for shared utils (dates, hashing, harmony client, etc.). |
| tests/test_send_to_gitc.py | New moto-based tests for reading CNM from S3 and sending to SQS. |
| tests/test_send_to_gibs_moto.py | Removes old end-to-end moto test tied to deprecated lambdas. |
| tests/test_process_harmony_results.py | Removes tests for deprecated `process_harmony_results` lambda. |
| tests/test_image_set.py | Updates ImageSet/CNM construction tests for the new `handle_big_result` path. |
| tests/test_handle_big_result.py | Adds unit tests for consolidated BIG-result handling (metadata XML + CNM writing). |
| tests/test_get_harmony_job_status.py | Adds test for new `HarmonyJobNoDataError` behavior. |
| tests/test_generate_image_metadata.py | Removes tests for deprecated `generate_image_metadata` lambda. |
| tests/test_format_iso_expiration_date.py | Removes file; coverage moved into `tests/test_utils.py`. |
| tests/test_build_image_sets.py | Removes tests for deprecated `build_image_sets` lambda. |
| tests/sample_messages/send_to_gitc/sample_cnm_message.json | Adds sample CNM used by new send-to-GITC tests. |
| tests/sample_messages/generate_image_metadata/cma.uat.input.TEMPO_NO2_L3.json | Updates sample CMA payload format used in tests. |
| tests/sample_messages/generate_image_metadata/cma.uat.input.PREFIRE_SAT2_2B-FLX.json | Adds new sample CMA payload for tests. |
| tests/conftest.py | Formats VCR config fixture. |
| tests/cassettes/test_handle_big_result/test_process_harmony_results.yaml | Adds VCR cassette for Harmony results used by new tests. |
| tests/cassettes/test_get_harmony_job_status/test_process_results_no_data.yaml | Adds VCR cassette for Harmony “successful but no data” case. |
| tests/cassettes/test_get_harmony_job_status/test_process_results.yaml | Adds VCR cassette for Harmony status path used by tests. |
| terraform/state_machine_definition.tpl | Refactors workflow: new `Handle BIG Result`, CNM reference passing, and no-data handling path. |
| terraform/outputs.tf | Updates outputs to reference `handle_big_result` and removes deprecated lambda outputs. |
| terraform/lambda_functions.tf | Replaces multiple lambdas with consolidated `handle_big_result` lambda config. |
| pyproject.toml | Bumps version to `0.7.0rc1`. |
| bignbit/utils.py | Adds helpers (S3 byte upload, date parsing helpers) and improves typing. |
| bignbit/send_to_gitc.py | Updates send-to-GITC to read CNM from S3 and send raw CNM JSON to SQS. |
| bignbit/save_cnm_message.py | Removes deprecated lambda. |
| bignbit/process_harmony_results.py | Removes deprecated lambda. |
| bignbit/image_set.py | Refactors ImageSet model + image-set building logic to support consolidated pipeline. |
| bignbit/handle_big_result.py | New consolidated lambda: Harmony result processing, metadata XML generation, CNM creation + S3 write. |
| bignbit/get_harmony_job_status.py | Adds `HarmonyJobNoDataError` and checks for empty result URLs on “successful”. |
| bignbit/generate_image_metadata.py | Removes deprecated lambda. |
| bignbit/build_image_sets.py | Removes deprecated lambda. |
| README.md | Documents no-data Harmony behavior (needs alignment with new step/lambda names). |
| CHANGELOG.md | Adds 0.7.0 changelog entry and formatting updates. |
| .github/workflows/cicd-pipeline.yml | Adds Buildx setup + pins build platform for container builds. |
| .github/workflows/add-to-project.yml | Adds automation to add issues/PRs to GitHub Project and set triage status. |


Copilot · 2026-03-23T19:58:38Z

bignbit/handle_big_result.py

```diff
+                return {'pobit': []}
+        else:
+            cma_file_list = result_list
+            partial_id = utils.extract_mgrs_grid_code(granule_umm_json)
```


Handle BIG Result always re-extracts partial_id in the non-Harmony branch (partial_id = utils.extract_mgrs_grid_code(...)) without a try/except. This branch is also reached when sendToHarmony is false and operaHLSTreatment is false (i.e., datasets that are neither Harmony-derived nor OPERA HLS), so this can raise KeyError and fail workflows for granules that don't have an MGRS tile ID. Consider keeping the earlier try/except result (allowing partial_id=None) and only requiring MGRS when OPERA HLS treatment is actually enabled.

Suggested change

```diff
-            partial_id = utils.extract_mgrs_grid_code(granule_umm_json)
+            # partial_id was already safely extracted above (may be None for non-MGRS granules)
```
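A minimal sketch of the pattern Copilot describes: tolerate a missing MGRS tile ID, and only insist on one when OPERA HLS treatment is enabled. The function `resolve_partial_id`, the injected extractor, and the `MGRS_TILE_ID` lookup in `fake_extractor` are hypothetical; the only assumption carried over from the comment is that the real extractor raises `KeyError` when the tile ID is absent.

```python
from typing import Callable, Optional


def resolve_partial_id(
    granule_umm_json: dict,
    extract_mgrs_grid_code: Callable[[dict], str],
    opera_hls_treatment: bool,
) -> Optional[str]:
    """Extract the MGRS tile ID if present; only require it for OPERA HLS treatment."""
    try:
        partial_id: Optional[str] = extract_mgrs_grid_code(granule_umm_json)
    except KeyError:
        partial_id = None  # non-MGRS granules (e.g. non-HLS datasets) are allowed
    if opera_hls_treatment and partial_id is None:
        raise ValueError("OPERA HLS treatment requires an MGRS tile ID on the granule")
    return partial_id


def fake_extractor(umm: dict) -> str:
    # Hypothetical stand-in for utils.extract_mgrs_grid_code; raises KeyError when absent.
    return umm["MGRS_TILE_ID"]


print(resolve_partial_id({"MGRS_TILE_ID": "T11SLT"}, fake_extractor, opera_hls_treatment=True))
print(resolve_partial_id({}, fake_extractor, opera_hls_treatment=False))
```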

Copilot · 2026-03-23T19:58:39Z

terraform/state_machine_definition.tpl

```diff
                         "task_config":{
                           "cmr_environment":"{$.meta.cmr.cmrEnvironment}",
                           "harmony_job":"{$.payload.harmony_job.job}",
+                          "variable":"{$.current_variable.id}",
```


The Get Harmony Job Status state is passed `variable` in the Step Functions `task_config`, but the lambda reads `current_variable` (`self.config.get('current_variable')`). As written, the `HarmonyJobNoDataError` message/log will lose the variable context (an empty string), defeating the intent of tracking no-data cases per variable. Either pass `current_variable` here or update the lambda to read `variable` consistently.

Suggested change

```diff
-                          "variable":"{$.current_variable.id}",
+                          "current_variable":"{$.current_variable.id}",
```

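The mismatch, and Copilot's alternative of updating the lambda instead, can be sketched as follows. `read_variable` is a hypothetical helper over a plain mapping, not the lambda's real config object; it assumes the lambda's lookup defaults to an empty string, as the comment describes.

```python
from typing import Mapping


def read_variable(task_config: Mapping[str, str]) -> str:
    """Prefer 'current_variable', but fall back to the 'variable' key the state sends today."""
    return task_config.get("current_variable") or task_config.get("variable", "")


# With the task_config as written, a lookup of only 'current_variable' comes back empty:
config_as_written = {"harmony_job": "example-job", "variable": "example_var"}
print(repr(config_as_written.get("current_variable", "")))
print(read_variable(config_as_written))
```

Renaming the key in the state machine, as suggested, keeps one source of truth; the fallback only helps if both spellings must be supported during a transition.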
Copilot · 2026-03-23T19:58:39Z

README.md

+"Process Harmony Job Output" workflow step, defined in the code as the `process_harmony_results.py` lambda, will raise 
+a HarmonyJobNoDataError. This exception does not trigger a failure in the whole workflow, as other variables associated with the granule may succeed.
+To track Harmony jobs that return no data in CloudWatch, the following warning can be tracked:
+```
+"Harmony job {job_id} completed successfully but returned no data for variable '{variable}' and CRS '{crs}'"
+```


This section refers to a removed/renamed workflow step and lambda ("Process Harmony Job Output" / process_harmony_results.py). In the updated pipeline, the no-data condition is detected in the Harmony status check (get_harmony_job_status.py raising HarmonyJobNoDataError) and is caught in the Step Functions Get Harmony Job Status state. Please update the doc text and the example warning string to match the current behavior so operators can reliably filter CloudWatch logs.
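Because operators will filter on this exact wording, a pattern derived from it is only as stable as the message text. A minimal Python sketch that extracts the fields from the warning string quoted in the README diff; the sample log line is hypothetical, and its empty variable mirrors the empty-string case raised in the Step Functions comment.

```python
import re
from typing import Dict, Optional

# Pattern built from the warning text quoted in the README diff; if the
# message wording changes, this pattern must change with it.
NO_DATA_WARNING = re.compile(
    r"Harmony job (?P<job_id>\S+) completed successfully but returned no data "
    r"for variable '(?P<variable>[^']*)' and CRS '(?P<crs>[^']*)'"
)


def parse_no_data_warning(log_line: str) -> Optional[Dict[str, str]]:
    """Return job_id, variable and crs from a no-data warning, or None if no match."""
    match = NO_DATA_WARNING.search(log_line)
    return match.groupdict() if match else None


line = ("[WARNING] Harmony job 0a1b2c3d completed successfully but returned no data "
        "for variable '' and CRS 'EPSG:4326'")
print(parse_no_data_warning(line))
```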

Copilot · 2026-03-23T19:58:39Z

pyproject.toml

```diff
 [tool.poetry]
 name = "bignbit"
-version = "0.6.0"
+version = "0.7.0rc1"
```


pyproject.toml sets the package version to 0.7.0rc1, but the PR title and changelog entry are for 0.7.0. If this is intended to be the final 0.7.0 release, align the Poetry version with 0.7.0 (or, if you do want an RC, consider updating the changelog heading/release process artifacts to match).
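The commit history bumps through `0.7.0a0` to `0.7.0a4` and then `0.7.0rc1`, so a release guard could check that the Poetry version is the final release named in the changelog. A hypothetical sketch using a simplified subset of PEP 440 (dotted release plus an optional `a`/`b`/`rc` tag); a full implementation would more likely use `packaging.version.Version` and its `is_prerelease` property.

```python
import re
from typing import Optional, Tuple

# Simplified PEP 440 subset: a dotted release plus an optional a/b/rc pre-release tag.
VERSION_RE = re.compile(r"^(?P<release>\d+(?:\.\d+)*)(?:(?P<pre>a|b|rc)(?P<pre_n>\d+))?$")


def split_version(version: str) -> Tuple[str, Optional[str]]:
    """Split '0.7.0rc1' into ('0.7.0', 'rc1'); a final release has no pre-release tag."""
    match = VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"unsupported version string: {version!r}")
    pre = match.group("pre")
    pre_tag = pre + match.group("pre_n") if pre else None
    return match.group("release"), pre_tag


def is_final_release_of(pyproject_version: str, changelog_version: str) -> bool:
    """True only if pyproject declares exactly the changelog's final release."""
    release, pre = split_version(pyproject_version)
    return pre is None and release == changelog_version


print(split_version("0.7.0rc1"))
print(is_final_release_of("0.7.0rc1", "0.7.0"))
print(is_final_release_of("0.7.0", "0.7.0"))
```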

jackiryan and others added 11 commits January 7, 2026 17:38

/version 0.7.0a0

572803e

Merge branch 'main' into develop

cfd01ec

# Conflicts: # pyproject.toml

/version 0.7.0a1

48d8ca5

issue/108: handle case when no data is returned from a Harmony job (#146)

366af04

* issue/108: handle case when no data is returned from a Harmony job
* Updated step function graph

/version 0.7.0a2

af90b00

Add automatic project assignment workflow

d2e05ea

This workflow automatically:
- Adds new issues and PRs to the podaac project
- Sets their status to 'needs:triage'

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

/version 0.7.0a3

ce427a6

/version 0.7.0a4

fb9fcdc

Updated changelog for 0.7.0 release

2154002

/version 0.7.0rc1

9cf12bb

github-project-automation bot added this to podaac Feb 19, 2026

tloubrieu-jpl moved this to needs:triage in podaac Feb 19, 2026

jackiryan requested a review from jamesfwood February 19, 2026 21:39

tloubrieu-jpl moved this from needs:triage to routed in podaac Feb 21, 2026

jamesfwood approved these changes Mar 23, 2026


jamesfwood requested a review from Copilot March 23, 2026 19:54

Copilot started reviewing on behalf of jamesfwood March 23, 2026 19:54

Copilot AI reviewed Mar 23, 2026

jackiryan wants to merge 11 commits into main from release/0.7.0

5 participants
