# Conflicts: # pyproject.toml
This workflow automatically:
- Adds new issues and PRs to the podaac project
- Sets their status to 'needs:triage'

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* issue/148: Refactored format of harmony job status passed to generate image metadata
* Removed process_harmony_results.py lambda and associated tests
* Fix linter error
* Updated terraform scripts to remove references to process_harmony_job_output and provide env variables to generate_image_metadata
* Refactored generate_image_metadata, build_image_sets, and save_cnm_message into one lambda
* Fixed bug with collection name key
* issues/148: Updated unit tests, changelog, and step function graph
* issue/148: fixed issue with CICD pipeline workflow not creating manifest list
* issue/148: set provenance to false in build and publish docker image step
* issue/148: fixed a bug so that image sets with no world file are allowed
* issue/148: fixed bug where HarmonyJobNoDataError pass state triggered KeyError
* issue/148: Minor changes to address copilot code review comments
Pull request overview
Release 0.7.0 refactor to support large/tiled BIG outputs (e.g., GHRSST L4 MUR) by consolidating post-BIG processing into a single step that generates ImageMetadata XML and writes CNM messages to S3, reducing Step Functions payload sizes and improving handling of “successful but no data” Harmony outcomes.
Changes:
- Consolidates prior post-BIG lambdas into a new `handle_big_result` lambda that generates metadata, builds image sets, and writes CNMs to S3 (workflow now passes CNM references instead of full image-set payloads).
- Updates Step Functions + Terraform to use the new pipeline shape and adds a no-data end path for Harmony “successful with no results”.
- Updates/rewrites unit tests and fixtures to match the new message-passing and CNM-on-S3 approach.
Reviewed changes
Copilot reviewed 35 out of 37 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| tests/test_utils.py | New consolidated unit tests for shared utils (dates, hashing, harmony client, etc.). |
| tests/test_send_to_gitc.py | New moto-based tests for reading CNM from S3 and sending to SQS. |
| tests/test_send_to_gibs_moto.py | Removes old end-to-end moto test tied to deprecated lambdas. |
| tests/test_process_harmony_results.py | Removes tests for deprecated process_harmony_results lambda. |
| tests/test_image_set.py | Updates ImageSet/CNM construction tests for the new handle_big_result path. |
| tests/test_handle_big_result.py | Adds unit tests for consolidated BIG-result handling (metadata XML + CNM writing). |
| tests/test_get_harmony_job_status.py | Adds test for new HarmonyJobNoDataError behavior. |
| tests/test_generate_image_metadata.py | Removes tests for deprecated generate_image_metadata lambda. |
| tests/test_format_iso_expiration_date.py | Removes file; coverage moved into tests/test_utils.py. |
| tests/test_build_image_sets.py | Removes tests for deprecated build_image_sets lambda. |
| tests/sample_messages/send_to_gitc/sample_cnm_message.json | Adds sample CNM used by new send-to-GITC tests. |
| tests/sample_messages/generate_image_metadata/cma.uat.input.TEMPO_NO2_L3.json | Updates sample CMA payload format used in tests. |
| tests/sample_messages/generate_image_metadata/cma.uat.input.PREFIRE_SAT2_2B-FLX.json | Adds new sample CMA payload for tests. |
| tests/conftest.py | Formats VCR config fixture. |
| tests/cassettes/test_handle_big_result/test_process_harmony_results.yaml | Adds VCR cassette for Harmony results used by new tests. |
| tests/cassettes/test_get_harmony_job_status/test_process_results_no_data.yaml | Adds VCR cassette for Harmony “successful but no data” case. |
| tests/cassettes/test_get_harmony_job_status/test_process_results.yaml | Adds VCR cassette for Harmony status path used by tests. |
| terraform/state_machine_definition.tpl | Refactors workflow: new Handle BIG Result, CNM reference passing, and no-data handling path. |
| terraform/outputs.tf | Updates outputs to reference handle_big_result and removes deprecated lambda outputs. |
| terraform/lambda_functions.tf | Replaces multiple lambdas with consolidated handle_big_result lambda config. |
| pyproject.toml | Bumps version to 0.7.0rc1. |
| bignbit/utils.py | Adds helpers (S3 byte upload, date parsing helpers) and improves typing. |
| bignbit/send_to_gitc.py | Updates send-to-GITC to read CNM from S3 and send raw CNM JSON to SQS. |
| bignbit/save_cnm_message.py | Removes deprecated lambda. |
| bignbit/process_harmony_results.py | Removes deprecated lambda. |
| bignbit/image_set.py | Refactors ImageSet model + image-set building logic to support consolidated pipeline. |
| bignbit/handle_big_result.py | New consolidated lambda: Harmony result processing, metadata XML generation, CNM creation + S3 write. |
| bignbit/get_harmony_job_status.py | Adds HarmonyJobNoDataError and checks for empty result URLs on “successful”. |
| bignbit/generate_image_metadata.py | Removes deprecated lambda. |
| bignbit/build_image_sets.py | Removes deprecated lambda. |
| README.md | Documents no-data Harmony behavior (needs alignment with new step/lambda names). |
| CHANGELOG.md | Adds 0.7.0 changelog entry and formatting updates. |
| .github/workflows/cicd-pipeline.yml | Adds Buildx setup + pins build platform for container builds. |
| .github/workflows/add-to-project.yml | Adds automation to add issues/PRs to GitHub Project and set triage status. |
            return {'pobit': []}
        else:
            cma_file_list = result_list
            partial_id = utils.extract_mgrs_grid_code(granule_umm_json)
Handle BIG Result always re-extracts partial_id in the non-Harmony branch (partial_id = utils.extract_mgrs_grid_code(...)) without a try/except. This branch is also reached when sendToHarmony is false and operaHLSTreatment is false (i.e., datasets that are neither Harmony-derived nor OPERA HLS), so this can raise KeyError and fail workflows for granules that don't have an MGRS tile ID. Consider keeping the earlier try/except result (allowing partial_id=None) and only requiring MGRS when OPERA HLS treatment is actually enabled.
Suggested change:

    - partial_id = utils.extract_mgrs_grid_code(granule_umm_json)
    + # partial_id was already safely extracted above (may be None for non-MGRS granules)
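The guarded extraction suggested in this comment could look roughly like the sketch below. It is only an illustration: `safe_partial_id` and `fake_extract` are hypothetical names, `fake_extract` stands in for `utils.extract_mgrs_grid_code` (whose real lookup is not shown in this PR page), and the `require_mgrs` flag is an assumed way of expressing "OPERA HLS treatment is enabled".

```python
from typing import Callable, Optional


def safe_partial_id(extract: Callable[[dict], str],
                    granule_umm_json: dict,
                    require_mgrs: bool) -> Optional[str]:
    """Return the MGRS grid code, or None when it is absent and not required."""
    try:
        return extract(granule_umm_json)
    except KeyError:
        if require_mgrs:
            # The tile ID is mandatory in this case, so surface the failure.
            raise
        # Granules without an MGRS tile ID simply carry no partial_id.
        return None


def fake_extract(umm: dict) -> str:
    # Hypothetical stand-in for utils.extract_mgrs_grid_code.
    return umm["mgrs_tile_id"]
```

With this shape, a granule without a tile ID yields `None` when `require_mgrs` is false, and the `KeyError` still propagates when it is true.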
    "task_config":{
      "cmr_environment":"{$.meta.cmr.cmrEnvironment}",
      "harmony_job":"{$.payload.harmony_job.job}",
      "variable":"{$.current_variable.id}",
Get Harmony Job Status is passed `variable` in the Step Functions `task_config`, but the lambda reads `current_variable` (`self.config.get('current_variable')`). As written, the HarmonyJobNoDataError message and log will lose the variable context (empty string), defeating the intent of tracking no-data cases per variable. Either pass `current_variable` here or update the lambda to read `variable` consistently.
Suggested change:

    - "variable":"{$.current_variable.id}",
    + "current_variable":"{$.current_variable.id}",
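The mismatch described here can be reproduced with plain dictionaries. This is a minimal sketch, not the lambda's code: the empty-string default is assumed from the review comment, and `analysed_sst` is only an illustrative variable name.

```python
# Key as written in the state machine template (illustrative value).
task_config = {"variable": "analysed_sst"}

# The lambda looks up a different key, so it silently falls back to the default.
seen_by_lambda = task_config.get("current_variable", "")

# After the suggested change, the key names agree and the value is found.
fixed_config = {"current_variable": "analysed_sst"}
seen_after_fix = fixed_config.get("current_variable", "")
```

Because `dict.get` never raises on a missing key, this kind of mismatch shows up only as missing context in logs rather than as an error.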
"Process Harmony Job Output" workflow step, defined in the code as the `process_harmony_results.py` lambda, will raise
a HarmonyJobNoDataError. This exception does not trigger a failure in the whole workflow, as other variables associated with the granule may succeed.
To track Harmony jobs that return no data in CloudWatch, the following warning can be tracked:
```
"Harmony job {job_id} completed successfully but returned no data for variable '{variable}' and CRS '{crs}'"
```
This section refers to a removed/renamed workflow step and lambda ("Process Harmony Job Output" / process_harmony_results.py). In the updated pipeline, the no-data condition is detected in the Harmony status check (get_harmony_job_status.py raising HarmonyJobNoDataError) and is caught in the Step Functions Get Harmony Job Status state. Please update the doc text and the example warning string to match the current behavior so operators can reliably filter CloudWatch logs.
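For operators who export log lines and filter them offline, a pattern along these lines could pull the fields out of the warning. It is a sketch that assumes the wording quoted in the README excerpt above; if the message text changes, the pattern has to change with it. `parse_no_data_warning` is a hypothetical helper name.

```python
import re
from typing import Dict, Optional

# Mirrors the README warning template; the field names are taken from its placeholders.
NO_DATA_PATTERN = re.compile(
    r"Harmony job (?P<job_id>\S+) completed successfully but returned no data "
    r"for variable '(?P<variable>[^']*)' and CRS '(?P<crs>[^']*)'"
)


def parse_no_data_warning(line: str) -> Optional[Dict[str, str]]:
    """Return job_id, variable, and crs from a matching log line, else None."""
    match = NO_DATA_PATTERN.search(line)
    return match.groupdict() if match else None
```

The `variable` group deliberately accepts an empty value, so lines logged without variable context still match and can be counted.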
      [tool.poetry]
      name = "bignbit"
    - version = "0.6.0"
    + version = "0.7.0rc1"
pyproject.toml sets the package version to 0.7.0rc1, but the PR title and changelog entry are for 0.7.0. If this is intended to be the final 0.7.0 release, align the Poetry version with 0.7.0 (or, if you do want an RC, consider updating the changelog heading/release process artifacts to match).
Description
This release is primarily intended to add support for the GHRSST L4 MUR product from PODAAC.
Refactored pipeline to use more concise messages between lambdas. The "Browse Image Transfer" (BIT) workflow is now more consolidated and the following lambdas from version <=0.6.0 are now part of the same lambda:
All of these functions occur synchronously, so they have been combined into the new "Handle BIG Result" step of the pipeline which produces and saves CNM messages to S3. This was done to reduce the size of messages passed by the bignbit pipeline between steps, since these messages are limited to 256KB in size. For datasets like GHRSST MUR where many tiled images are produced, the old system caused Step Functions to fail the workflow due to oversized messages. Rather than introduce a new database or save additional intermediate files to S3, this approach both simplifies the workflow and mitigates the message size issue.
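The size argument above can be illustrated with a small sketch. Everything here is assumed for illustration: `save_cnms`, the `{"bucket", "key"}` reference shape, the key naming, and the in-memory `fake_put` uploader are hypothetical and do not reflect the actual bignbit message format. A real deployment would pass an uploader backed by S3.

```python
import json
from typing import Callable, Dict, List

STATE_LIMIT_BYTES = 256 * 1024  # Step Functions payload limit cited above


def save_cnms(cnms: List[dict], bucket: str, prefix: str,
              put_object: Callable[[str, str, bytes], None]) -> List[Dict[str, str]]:
    """Store each CNM as its own object and return small references to them."""
    refs = []
    for index, cnm in enumerate(cnms):
        key = f"{prefix}/cnm_{index}.json"
        put_object(bucket, key, json.dumps(cnm).encode("utf-8"))
        refs.append({"bucket": bucket, "key": key})
    return refs


# In-memory stand-in for S3 so the sketch runs without AWS credentials.
fake_s3: Dict[str, bytes] = {}


def fake_put(bucket: str, key: str, body: bytes) -> None:
    fake_s3[f"{bucket}/{key}"] = body


# 200 synthetic tiled-image messages of roughly 2 KB each.
cnms = [{"identifier": f"tile-{i}", "files": ["x" * 2000]} for i in range(200)]
refs = save_cnms(cnms, "example-bucket", "cnm/example-granule", fake_put)

full_size = len(json.dumps(cnms).encode("utf-8"))  # what the old state would carry
ref_size = len(json.dumps(refs).encode("utf-8"))   # what the new state carries
```

In this synthetic case the full payload exceeds the limit while the list of references stays well under it, which is the effect the refactor is after.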
In practice, this means that the final state output of the workflow has changed. Previously, the `pobit` item of the state payload contained an array of `image_set` objects. Now it contains an array of references to CNM messages that have been sent to GIBS over the SQS queue:

Additionally, warnings are now issued when a Harmony API call reports success but produces no data. This situation is occasionally encountered with some data sets, though its root cause is unknown. For now, we want to capture the occurrences without failing the overall workflow since other variables or projections in the browse image workflow may still succeed.
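The "warn but keep going" behaviour for no-data Harmony jobs can be sketched in plain Python. This is an approximation under stated assumptions: in the actual pipeline the exception is caught by a Step Functions state rather than a Python loop, and `check_job`, `process_variables`, and the job dictionary layout are hypothetical. Only the exception name and the warning wording come from this PR.

```python
import logging
from typing import Dict, List

logger = logging.getLogger("bignbit_sketch")


class HarmonyJobNoDataError(Exception):
    """Raised when Harmony reports success but returns no result URLs."""


def check_job(job_id: str, status: str, result_urls: List[str],
              variable: str, crs: str) -> List[str]:
    """Return the result URLs, or raise if a successful job produced nothing."""
    if status == "successful" and not result_urls:
        raise HarmonyJobNoDataError(
            f"Harmony job {job_id} completed successfully but returned no data "
            f"for variable '{variable}' and CRS '{crs}'"
        )
    return result_urls


def process_variables(jobs: List[dict]) -> Dict[str, List[str]]:
    """Collect results per variable, logging no-data jobs instead of failing."""
    produced = {}
    for job in jobs:
        try:
            produced[job["variable"]] = check_job(**job)
        except HarmonyJobNoDataError as err:
            # Record the occurrence; other variables may still succeed.
            logger.warning("%s", err)
    return produced
```

A job list with one empty and one populated result would therefore yield output for the populated variable only, plus one warning in the log.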
Added
Changed
Removed
Overview of verification done
Overview of integration done
Integration testing in UAT is TBD.
PR checklist:
See Pull Request Review Checklist for pointers on reviewing this pull request