add code and documentation to grant schemas to groups#50
add code and documentation to grant schemas to groups#50
Conversation
lhoesly
commented
Feb 25, 2026
- Python code to manage schema-based / group-based grants for data controls
There was a problem hiding this comment.
Pull request overview
This PR introduces a Python automation script and documentation for managing Redshift schema-based access grants to groups. The script automates granting SELECT permissions (and optionally USAGE) on database schemas to specified groups, including both existing and future tables/views. This addresses the complexity of Redshift's hierarchical permission model and the limitation that new tables don't automatically inherit schema-level permissions.
Changes:
- New Python script (
automate_schema_grants.py) that generates and executes SQL GRANT statements based on a configuration mapping schemas to groups - Comprehensive README documentation explaining the purpose, configuration, and usage of the schema grants automation
- Support for configurable options via environment variables (dry run mode, USAGE grants, future table grants)
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| python/grant_schemas/automate_schema_grants.py | Main automation script that generates SQL GRANT statements for schemas and groups, with support for existing and future tables |
| python/grant_schemas/README.md | Comprehensive documentation covering overview, configuration, usage examples, and Redshift-specific considerations |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if schema and groups: | ||
| mapping[schema] = { | ||
| "schema": schema, | ||
| "groups": tuple(groups), |
There was a problem hiding this comment.
Converting groups from a list to a tuple is unnecessary here. The groups are only used for iteration in the SQL generation and converting to a tuple doesn't provide any functional benefit. This could be simplified by keeping groups as a list, which would be more consistent with the input format and the table_creators field which remains a list.
| "groups": tuple(groups), | |
| "groups": groups, |
There was a problem hiding this comment.
I feel like 4 lines to say drop the tuple is a bit much
|
|
||
| def main(database, dry_run=True, grant_usage=False, grant_future=True): | ||
| grant_commands = [] | ||
| schema_grants = get_schema_grants_config() |
There was a problem hiding this comment.
The script does not validate that SCHEMA_GRANTS_CONFIG contains at least one valid configuration entry before proceeding. If the configuration is empty or all entries are invalid (missing schema or groups), the script will generate an empty query string and potentially execute it, which could be confusing. Consider adding a check after get_schema_grants_config() to ensure there is at least one valid schema configuration, and raise a clear error if not.
| schema_grants = get_schema_grants_config() | |
| schema_grants = get_schema_grants_config() | |
| if not schema_grants: | |
| LOG.error( | |
| "No valid schema grant configurations found. " | |
| "SCHEMA_GRANTS_CONFIG must contain at least one entry with a " | |
| "'schema_name' and a non-empty 'groups' list." | |
| ) | |
| raise ValueError( | |
| "Invalid configuration: SCHEMA_GRANTS_CONFIG must contain at least " | |
| "one schema with associated groups." | |
| ) |
There was a problem hiding this comment.
Not a bad idea but probably not necessary
| groups = schema_grants[schema_name]["groups"] | ||
|
|
||
| if grant_usage: | ||
| usage_command = f"GRANT USAGE ON SCHEMA {schema} TO GROUP {', GROUP '.join(groups)};" |
There was a problem hiding this comment.
SQL injection vulnerability: The schema and group names are directly interpolated into SQL commands without validation or sanitization. Malicious or malformed schema/group names could execute arbitrary SQL. Add input validation to ensure schema and group names only contain alphanumeric characters, underscores, and optionally hyphens. Use a whitelist pattern like ^[a-zA-Z0-9_]+$ to validate these identifiers before constructing SQL statements.
|
|
||
| # Grant on existing tables and views | ||
| # Note: In Redshift, "ALL TABLES" includes tables, views, and external tables | ||
| select_command = f"GRANT SELECT ON ALL TABLES IN SCHEMA {schema} TO GROUP {', GROUP '.join(groups)};" |
There was a problem hiding this comment.
SQL injection vulnerability: The schema and group names are directly interpolated into SQL commands without validation or sanitization. Malicious or malformed schema/group names could execute arbitrary SQL. Add input validation to ensure schema and group names only contain alphanumeric characters, underscores, and optionally hyphens.
| # Grant for each specified table creator | ||
| for creator in table_creators: | ||
| for group in groups: | ||
| future_command = f"ALTER DEFAULT PRIVILEGES FOR USER {creator} IN SCHEMA {schema} GRANT SELECT ON TABLES TO GROUP {group};" |
There was a problem hiding this comment.
SQL injection vulnerability: The creator usernames and schema/group names are directly interpolated into SQL commands without validation or sanitization. Malicious usernames could execute arbitrary SQL. Add input validation to ensure usernames, schema names, and group names only contain alphanumeric characters, underscores, and optionally hyphens.
| future = civis.io.query_civis(query, database=database, hidden=False) | ||
| LOG.info(future.result()) |
There was a problem hiding this comment.
No error handling for the query execution. If the SQL execution fails (due to insufficient permissions, invalid schema names, or database connectivity issues), the script will crash without providing actionable feedback. Consider wrapping the query execution in a try-except block and logging specific error information to help users troubleshoot issues.
| future = civis.io.query_civis(query, database=database, hidden=False) | |
| LOG.info(future.result()) | |
| try: | |
| future = civis.io.query_civis(query, database=database, hidden=False) | |
| result = future.result() | |
| LOG.info(result) | |
| except Exception as exc: | |
| LOG.error( | |
| "Failed to execute grant commands on database '%s': %s", | |
| database, | |
| exc, | |
| exc_info=True, | |
| ) | |
| raise |
| mapping[schema] = { | ||
| "schema": schema, |
There was a problem hiding this comment.
The 'schema' field in the returned dictionary duplicates the 'schema_name' key. Since the dictionary is keyed by schema name, storing 'schema': schema_name is redundant. Consider removing this field and using the dictionary key directly when accessing the schema name.
| if __name__ == "__main__": | ||
| # Different Platform/cloud environments use slightly different formats for Boolean parameters; | ||
| # This provides some assurance that "truthy" values are assigned properly. | ||
| DRY_RUN_PARAM = strtobool(str(os.environ.get('DRY_RUN', 'True'))) | ||
| GRANT_USAGE = strtobool(str(os.environ.get('GRANT_USAGE', 'False'))) | ||
| GRANT_FUTURE = strtobool(str(os.environ.get('GRANT_FUTURE', 'True'))) |
There was a problem hiding this comment.
The strtobool function will raise a ValueError if the environment variable contains an invalid boolean string (e.g., 'invalid', 'maybe'). This error will be unhelpful to users who may not know what valid boolean strings are. Consider wrapping these conversions in a try-except block with a clearer error message that indicates which environment variable has an invalid value and what valid values are (e.g., 'true', 'false', '1', '0', 'yes', 'no').
| if __name__ == "__main__": | |
| # Different Platform/cloud environments use slightly different formats for Boolean parameters; | |
| # This provides some assurance that "truthy" values are assigned properly. | |
| DRY_RUN_PARAM = strtobool(str(os.environ.get('DRY_RUN', 'True'))) | |
| GRANT_USAGE = strtobool(str(os.environ.get('GRANT_USAGE', 'False'))) | |
| GRANT_FUTURE = strtobool(str(os.environ.get('GRANT_FUTURE', 'True'))) | |
| def _get_bool_env(var_name, default): | |
| """ | |
| Read a boolean environment variable using distutils.util.strtobool and provide | |
| a clear error message if the value is invalid. | |
| """ | |
| raw_value = os.environ.get(var_name, default) | |
| try: | |
| # strtobool returns 0 or 1; convert to a proper bool. | |
| return bool(strtobool(str(raw_value))) | |
| except ValueError as exc: | |
| raise ValueError( | |
| f"Invalid value for environment variable {var_name}: {raw_value!r}. " | |
| "Expected one of 'y', 'n', 'yes', 'no', 't', 'f', 'true', 'false', " | |
| "'on', 'off', '1', or '0'." | |
| ) from exc | |
| if __name__ == "__main__": | |
| # Different Platform/cloud environments use slightly different formats for Boolean parameters; | |
| # This provides some assurance that "truthy" values are assigned properly. | |
| DRY_RUN_PARAM = _get_bool_env('DRY_RUN', 'True') | |
| GRANT_USAGE = _get_bool_env('GRANT_USAGE', 'False') | |
| GRANT_FUTURE = _get_bool_env('GRANT_FUTURE', 'True') |
| # No table_creators specified - grant for the current user running the script | ||
| # This will only apply to tables created by this user! | ||
| for group in groups: | ||
| future_command = f"ALTER DEFAULT PRIVILEGES IN SCHEMA {schema} GRANT SELECT ON TABLES TO GROUP {group};" |
There was a problem hiding this comment.
SQL injection vulnerability: The schema and group names are directly interpolated into SQL commands without validation or sanitization. Add input validation to ensure schema names and group names only contain alphanumeric characters, underscores, and optionally hyphens.
| from distutils.util import strtobool | ||
|
|
There was a problem hiding this comment.
The distutils module is deprecated and will be removed in Python 3.12. Consider using a more modern alternative such as checking if the string is in a list of accepted true values (e.g., ['true', '1', 'yes']), or using a library like python-dotenv that handles environment variable parsing.
| from distutils.util import strtobool | |
| def strtobool(val): | |
| """ | |
| Convert a string representation of truth to True or False. | |
| Accepted true values are: 'y', 'yes', 't', 'true', 'on', '1'. | |
| Accepted false values are: 'n', 'no', 'f', 'false', 'off', '0'. | |
| Raises ValueError if 'val' is anything else. | |
| """ | |
| val = str(val).strip().lower() | |
| if val in ("y", "yes", "t", "true", "on", "1"): | |
| return True | |
| if val in ("n", "no", "f", "false", "off", "0"): | |
| return False | |
| raise ValueError(f"invalid truth value {val!r}") |
mayamkay
left a comment
There was a problem hiding this comment.
I think some of the copilot suggestions are reasonable but I don't think any of them are totally necessary so if you'd rather just merge as is I'm fine with that.