Conversation
Pull request overview
Adds a reusable/template “generic CSV upload” script for Civis Platform, intended to let clients upload data while controlling destination schema/table and notifying users on completion.
Changes:
- Introduces `generic_upload.py` to determine a user's target schema from a metadata table, upload a CSV into a target table, and optionally send a completion email.
- Adds `python/custom_file_uploads/README.md` documenting setup (metadata table + Platform configuration) and usage.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 12 comments.
| File | Description |
|---|---|
| python/custom_file_uploads/generic_upload.py | New upload script: schema lookup, table drop/recreate + CSV import, email notification flow |
| python/custom_file_uploads/README.md | New documentation describing required Platform setup and configuration variables |
```python
full_table = f"{schema}.{table_name}"
LOG.info(f"Target table: {full_table}")
```
full_table is built directly from schema (read from a table) and table_name (user-controlled parameter) and then interpolated into SQL (DROP/CREATE/INSERT paths). This allows SQL injection or accidental writes/drops outside the intended target if either value contains characters like ./;/quotes. Validate both as safe SQL identifiers (e.g., strict regex + disallow dots) and/or map dropdown values to a fixed allowlist of table names instead of using raw input.
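A minimal sketch of such an identifier check (the helper name and regex are illustrative, not part of the PR):

```python
import re

# Illustrative helper: accept only plain, dot-free SQL identifiers so
# neither schema nor table_name can smuggle quotes, dots, or semicolons
# into the interpolated DROP/CREATE/INSERT statements.
_IDENTIFIER_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def validate_identifier(name: str) -> str:
    """Return `name` unchanged, or raise ValueError if it is not safe."""
    if not _IDENTIFIER_RE.fullmatch(name):
        raise ValueError(f"Unsafe SQL identifier: {name!r}")
    return name


# Usage: validate both parts separately before interpolating into SQL.
full_table = f"{validate_identifier('my_schema')}.{validate_identifier('uploads')}"
```

An allowlist lookup (dropdown value → known table name) would be stricter still, since no free-form input reaches the SQL at all.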
download_data_create_table catches upload exceptions, logs an error, and then returns without re-raising. This means main() will still send a success email and log "Upload process completed successfully" even when the upload failed. Re-raise the exception (or return a success flag that main() checks) so failures stop the workflow and don't trigger success notifications.
Suggested change:

```python
raise
```
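A sketch of the flagged flow with the fix applied, using stand-in names (`do_upload` is hypothetical; the real function in the PR is `download_data_create_table`):

```python
import logging

LOG = logging.getLogger(__name__)


def do_upload(ok: bool) -> None:
    # Hypothetical stand-in for the real CSV import step.
    if not ok:
        raise RuntimeError("import failed")


def download_data_create_table(ok: bool) -> None:
    """Log the failure for the Platform run log, then re-raise so the
    caller never reaches its success-email path."""
    try:
        do_upload(ok)
    except Exception:
        LOG.exception("Upload failed")
        raise  # propagate instead of returning silently
```

With the bare `raise`, any failure aborts `main()` before the success email and the "Upload process completed successfully" log line.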
```python
database = os.environ["DATABASE"]
metadata_table = os.environ["METADATA_TABLE"]
email_address = os.getenv("EMAIL")
testing = int(os.getenv("TESTING", 0)) == 1
```
testing = int(os.getenv("TESTING", 0)) == 1 only works for numeric values; if the Platform boolean parameter or env var is set to "true"/"false" (a common pattern elsewhere in this repo), this will crash. Parse booleans more defensively (e.g., accept 1/0 and true/false strings).
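One defensive way to parse it (the helper name `env_flag` is illustrative):

```python
import os


def env_flag(name: str, default: str = "0") -> bool:
    """Parse a boolean-ish env var, accepting 1/0 and true/false/yes/no."""
    raw = os.getenv(name, default).strip().lower()
    if raw in ("1", "true", "yes"):
        return True
    if raw in ("0", "false", "no", ""):
        return False
    raise ValueError(f"{name} must be boolean-like (1/0/true/false), got {raw!r}")


testing = env_flag("TESTING")
```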
```bash
cd /app;
export DATABASE='redshift-general'
export TESTING=0
export EMAIL_RECIPIENTS=""
export METADATA_TABLE="metadata_data_upload_mmk"
export EMAIL_SCRIPT_ID="340695845"
python python/custom_file_uploads/generic_upload.py
```
The example container config exports EMAIL_RECIPIENTS, but the script reads EMAIL for the recipient address and EMAIL_SCRIPT_ID for the notification script. Update the example to use the same env var names the script actually consumes, otherwise users will configure this and never receive emails.
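A corrected version of the example config, assuming the script reads `EMAIL` and `EMAIL_SCRIPT_ID` as described above:

```bash
cd /app;
export DATABASE='redshift-general'
export TESTING=0
export EMAIL=""  # recipient address the script actually reads
export METADATA_TABLE="metadata_data_upload_mmk"
export EMAIL_SCRIPT_ID="340695845"
python python/custom_file_uploads/generic_upload.py
```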
python/custom_file_uploads/README.md (outdated)
- **FILE**: File parameter (required) - Users will upload their CSV file here
- **TABLE_NAME**: Dropdown or text parameter (required) - The name of the target table
- **EMAIL**: Text parameter (optional) - Email address for notification
- **TESTING**: Boolean parameter (optional) - Set to true to skip sending emails
The README says TESTING is a boolean parameter, but the script currently expects TESTING to be numeric (0/1) when parsing the environment. Either update the README to specify 0/1, or update the script to accept "true"/"false" strings to match the documented behavior.
Suggested change:

```diff
-- **TESTING**: Boolean parameter (optional) - Set to true to skip sending emails
+- **TESTING**: Integer parameter (optional) - Set to `1` to skip sending emails, `0` to send emails
```
```python
import civis
import pandas as pd

LOG = civis.loggers.civis_logger()


def get_schema(metadata_table: str, database: str, client=None) -> str:
```
get_schema is annotated as returning str, but it returns a 2-tuple (schema, user_email). Update the return type annotation (and ideally the docstring) to reflect the actual return type to avoid misleading callers/static type checks.
Suggested change:

```diff
-import civis
-import pandas as pd
-LOG = civis.loggers.civis_logger()
-def get_schema(metadata_table: str, database: str, client=None) -> str:
+from typing import Tuple
+import civis
+import pandas as pd
+LOG = civis.loggers.civis_logger()
+def get_schema(metadata_table: str, database: str, client=None) -> Tuple[str, str]:
+    """Return the user's schema name and email address.
+
+    Returns
+    -------
+    Tuple[str, str]
+        A 2-tuple of (schema_name, user_email).
+    """
```
```python
LOG.info(f"Downloading file: {file_obj['name']}")

with tempfile.TemporaryDirectory() as tmpdir:
    tmp_path = os.path.join(tmpdir, file_obj["name"])
```
tmp_path is built using file_obj["name"] directly. Since file names can be user-controlled, it's safer to sanitize it (e.g., os.path.basename(...)) to avoid unexpected path separators or traversal-like behavior when writing to disk.
Suggested change:

```diff
-    tmp_path = os.path.join(tmpdir, file_obj["name"])
+    safe_name = os.path.basename(file_obj["name"])
+    tmp_path = os.path.join(tmpdir, safe_name)
```
```python
# Use a blank script on platform to trigger email notification
# NOTE: This requires an existing script ID that you can configure
# to send success notification emails
email_script_id = int(os.getenv("EMAIL_SCRIPT_ID"))
```
email_script_id = int(os.getenv("EMAIL_SCRIPT_ID")) will raise a non-obvious TypeError if the env var is missing/empty. Prefer reading via os.environ[...] (so the KeyError names the variable) or add an explicit check that raises a clear ValueError explaining how to configure EMAIL_SCRIPT_ID.
Suggested change:

```diff
-email_script_id = int(os.getenv("EMAIL_SCRIPT_ID"))
+email_script_id_str = os.getenv("EMAIL_SCRIPT_ID")
+if not email_script_id_str:
+    raise ValueError(
+        "EMAIL_SCRIPT_ID environment variable must be set to a valid "
+        "Civis Platform script ID in order to send notification emails."
+    )
+try:
+    email_script_id = int(email_script_id_str)
+except (TypeError, ValueError):
+    raise ValueError(
+        f"EMAIL_SCRIPT_ID must be an integer Civis Platform script ID, "
+        f"got {email_script_id_str!r}."
+    )
```
```python
client.scripts.patch_python3(
    id=email_script_id,
    name=f"Upload notification for {user_email}",
    notifications={
        "success_email_subject": email_subject,
        "success_email_body": email_body,
        "success_email_addresses": [recipient_email],
    },
)
civis.utils.run_job(email_script_id, client=client).result()
```
Patching a shared Platform script's notifications (client.scripts.patch_python3(...)) right before running it is race-prone: concurrent uploads can overwrite each other's notification settings, potentially emailing the wrong recipient/body. Consider a per-run notification mechanism that doesn't mutate shared script state (e.g., separate notification scripts per tenant, or avoid patching and instead configure notifications on the upload script itself).
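One possible shape for a per-run notification: create a fresh, single-use script instead of patching a shared one. This is a sketch only; it assumes the civis client exposes a `scripts.post_python3` endpoint and reuses the variable roles from the snippet above.

```python
def create_run_notification(client, subject, body, recipient):
    """Create a one-off notification script for this run only, so
    concurrent uploads never overwrite each other's recipients."""
    return client.scripts.post_python3(
        name=f"Upload notification for {recipient}",
        source="",  # blank body: the script exists only to send the email
        notifications={
            "success_email_subject": subject,
            "success_email_body": body,
            "success_email_addresses": [recipient],
        },
    )
```

The returned script's `id` would then be passed to `civis.utils.run_job(...)` as in the original snippet; cleanup of the one-off script is left out of this sketch.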
python/custom_file_uploads/README.md (outdated)
- **Email not received**: Check that the SCRIPT_ID points to a valid script and TESTING is set to 0
Troubleshooting references SCRIPT_ID, but the script and earlier setup steps use EMAIL_SCRIPT_ID. Rename this reference to match the actual configuration variable to avoid confusion.
This PR adds code and documentation showing how to build a generic data upload template script. This is helpful if you want to allow clients to upload data while controlling where the data goes, how it's handled, etc.

The code was adapted from the PMI `pmipipelinetools` repo. An example script is available here: https://platform.civisanalytics.com/spa/#/scripts/containers/340695409