Skip to content

add code and documentation to grant schemas to groups#50

Open
lhoesly wants to merge 3 commits intomainfrom
schema_grants
Open

add code and documentation to grant schemas to groups#50
lhoesly wants to merge 3 commits intomainfrom
schema_grants

Conversation

@lhoesly
Copy link
Contributor

@lhoesly lhoesly commented Feb 25, 2026

  • Python code to manage schema-based / group-based grants for data controls

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a Python automation script and documentation for managing Redshift schema-based access grants to groups. The script automates granting SELECT permissions (and optionally USAGE) on database schemas to specified groups, including both existing and future tables/views. This addresses the complexity of Redshift's hierarchical permission model and the limitation that new tables don't automatically inherit schema-level permissions.

Changes:

  • New Python script (automate_schema_grants.py) that generates and executes SQL GRANT statements based on a configuration mapping schemas to groups
  • Comprehensive README documentation explaining the purpose, configuration, and usage of the schema grants automation
  • Support for configurable options via environment variables (dry run mode, USAGE grants, future table grants)

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 10 comments.

File Description
python/grant_schemas/automate_schema_grants.py Main automation script that generates SQL GRANT statements for schemas and groups, with support for existing and future tables
python/grant_schemas/README.md Comprehensive documentation covering overview, configuration, usage examples, and Redshift-specific considerations

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

if schema and groups:
mapping[schema] = {
"schema": schema,
"groups": tuple(groups),
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Converting groups from a list to a tuple is unnecessary here. The groups are only used for iteration in the SQL generation and converting to a tuple doesn't provide any functional benefit. This could be simplified by keeping groups as a list, which would be more consistent with the input format and the table_creators field which remains a list.

Suggested change
"groups": tuple(groups),
"groups": groups,

Copilot uses AI. Check for mistakes.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like 4 lines to say drop the tuple is a bit much


def main(database, dry_run=True, grant_usage=False, grant_future=True):
grant_commands = []
schema_grants = get_schema_grants_config()
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The script does not validate that SCHEMA_GRANTS_CONFIG contains at least one valid configuration entry before proceeding. If the configuration is empty or all entries are invalid (missing schema or groups), the script will generate an empty query string and potentially execute it, which could be confusing. Consider adding a check after get_schema_grants_config() to ensure there is at least one valid schema configuration, and raise a clear error if not.

Suggested change
schema_grants = get_schema_grants_config()
schema_grants = get_schema_grants_config()
if not schema_grants:
LOG.error(
"No valid schema grant configurations found. "
"SCHEMA_GRANTS_CONFIG must contain at least one entry with a "
"'schema_name' and a non-empty 'groups' list."
)
raise ValueError(
"Invalid configuration: SCHEMA_GRANTS_CONFIG must contain at least "
"one schema with associated groups."
)

Copilot uses AI. Check for mistakes.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a bad idea but probably not necessary

groups = schema_grants[schema_name]["groups"]

if grant_usage:
usage_command = f"GRANT USAGE ON SCHEMA {schema} TO GROUP {', GROUP '.join(groups)};"
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SQL injection vulnerability: The schema and group names are directly interpolated into SQL commands without validation or sanitization. Malicious or malformed schema/group names could execute arbitrary SQL. Add input validation to ensure schema and group names only contain alphanumeric characters, underscores, and optionally hyphens. Use a whitelist pattern like ^[a-zA-Z0-9_]+$ to validate these identifiers before constructing SQL statements.

Copilot uses AI. Check for mistakes.

# Grant on existing tables and views
# Note: In Redshift, "ALL TABLES" includes tables, views, and external tables
select_command = f"GRANT SELECT ON ALL TABLES IN SCHEMA {schema} TO GROUP {', GROUP '.join(groups)};"
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SQL injection vulnerability: The schema and group names are directly interpolated into SQL commands without validation or sanitization. Malicious or malformed schema/group names could execute arbitrary SQL. Add input validation to ensure schema and group names only contain alphanumeric characters, underscores, and optionally hyphens.

Copilot uses AI. Check for mistakes.
# Grant for each specified table creator
for creator in table_creators:
for group in groups:
future_command = f"ALTER DEFAULT PRIVILEGES FOR USER {creator} IN SCHEMA {schema} GRANT SELECT ON TABLES TO GROUP {group};"
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SQL injection vulnerability: The creator usernames and schema/group names are directly interpolated into SQL commands without validation or sanitization. Malicious usernames could execute arbitrary SQL. Add input validation to ensure usernames, schema names, and group names only contain alphanumeric characters, underscores, and optionally hyphens.

Copilot uses AI. Check for mistakes.
Comment on lines +127 to +128
future = civis.io.query_civis(query, database=database, hidden=False)
LOG.info(future.result())
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No error handling for the query execution. If the SQL execution fails (due to insufficient permissions, invalid schema names, or database connectivity issues), the script will crash without providing actionable feedback. Consider wrapping the query execution in a try-except block and logging specific error information to help users troubleshoot issues.

Suggested change
future = civis.io.query_civis(query, database=database, hidden=False)
LOG.info(future.result())
try:
future = civis.io.query_civis(query, database=database, hidden=False)
result = future.result()
LOG.info(result)
except Exception as exc:
LOG.error(
"Failed to execute grant commands on database '%s': %s",
database,
exc,
exc_info=True,
)
raise

Copilot uses AI. Check for mistakes.
Comment on lines +72 to +73
mapping[schema] = {
"schema": schema,
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 'schema' field in the returned dictionary duplicates the 'schema_name' key. Since the dictionary is keyed by schema name, storing 'schema': schema_name is redundant. Consider removing this field and using the dictionary key directly when accessing the schema name.

Copilot uses AI. Check for mistakes.
Comment on lines +131 to +136
if __name__ == "__main__":
# Different Platform/cloud environments use slightly different formats for Boolean parameters;
# This provides some assurance that "truthy" values are assigned properly.
DRY_RUN_PARAM = strtobool(str(os.environ.get('DRY_RUN', 'True')))
GRANT_USAGE = strtobool(str(os.environ.get('GRANT_USAGE', 'False')))
GRANT_FUTURE = strtobool(str(os.environ.get('GRANT_FUTURE', 'True')))
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The strtobool function will raise a ValueError if the environment variable contains an invalid boolean string (e.g., 'invalid', 'maybe'). This error will be unhelpful to users who may not know what valid boolean strings are. Consider wrapping these conversions in a try-except block with a clearer error message that indicates which environment variable has an invalid value and what valid values are (e.g., 'true', 'false', '1', '0', 'yes', 'no').

Suggested change
if __name__ == "__main__":
# Different Platform/cloud environments use slightly different formats for Boolean parameters;
# This provides some assurance that "truthy" values are assigned properly.
DRY_RUN_PARAM = strtobool(str(os.environ.get('DRY_RUN', 'True')))
GRANT_USAGE = strtobool(str(os.environ.get('GRANT_USAGE', 'False')))
GRANT_FUTURE = strtobool(str(os.environ.get('GRANT_FUTURE', 'True')))
def _get_bool_env(var_name, default):
"""
Read a boolean environment variable using distutils.util.strtobool and provide
a clear error message if the value is invalid.
"""
raw_value = os.environ.get(var_name, default)
try:
# strtobool returns 0 or 1; convert to a proper bool.
return bool(strtobool(str(raw_value)))
except ValueError as exc:
raise ValueError(
f"Invalid value for environment variable {var_name}: {raw_value!r}. "
"Expected one of 'y', 'n', 'yes', 'no', 't', 'f', 'true', 'false', "
"'on', 'off', '1', or '0'."
) from exc
if __name__ == "__main__":
# Different Platform/cloud environments use slightly different formats for Boolean parameters;
# This provides some assurance that "truthy" values are assigned properly.
DRY_RUN_PARAM = _get_bool_env('DRY_RUN', 'True')
GRANT_USAGE = _get_bool_env('GRANT_USAGE', 'False')
GRANT_FUTURE = _get_bool_env('GRANT_FUTURE', 'True')

Copilot uses AI. Check for mistakes.
# No table_creators specified - grant for the current user running the script
# This will only apply to tables created by this user!
for group in groups:
future_command = f"ALTER DEFAULT PRIVILEGES IN SCHEMA {schema} GRANT SELECT ON TABLES TO GROUP {group};"
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SQL injection vulnerability: The schema and group names are directly interpolated into SQL commands without validation or sanitization. Add input validation to ensure schema names and group names only contain alphanumeric characters, underscores, and optionally hyphens.

Copilot uses AI. Check for mistakes.
Comment on lines +24 to +25
from distutils.util import strtobool

Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The distutils module is deprecated and will be removed in Python 3.12. Consider using a more modern alternative such as checking if the string is in a list of accepted true values (e.g., ['true', '1', 'yes']), or using a library like python-dotenv that handles environment variable parsing.

Suggested change
from distutils.util import strtobool
def strtobool(val):
"""
Convert a string representation of truth to True or False.
Accepted true values are: 'y', 'yes', 't', 'true', 'on', '1'.
Accepted false values are: 'n', 'no', 'f', 'false', 'off', '0'.
Raises ValueError if 'val' is anything else.
"""
val = str(val).strip().lower()
if val in ("y", "yes", "t", "true", "on", "1"):
return True
if val in ("n", "no", "f", "false", "off", "0"):
return False
raise ValueError(f"invalid truth value {val!r}")

Copilot uses AI. Check for mistakes.
Copy link

@mayamkay mayamkay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think some of the copilot suggestions are reasonable but I don't think any of them are totally necessary so if you'd rather just merge as is I'm fine with that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants