add code and documentation to grant schemas to groups #50
base: main
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,133 @@ | ||
| # Schema-Based Granting | ||
|
|
||
| ## Overview | ||
|
|
||
| Many mature databases hold thousands of tables and views, which makes data hard to organize and discover and greatly complicates decisions about sharing, granting, and access. | ||
|
|
||
| One remedy is to organize tables into schemas that align with different audiences and use cases, and then make and enforce sharing decisions accordingly. Even when there are exceptions, it is a significant improvement to think of dozens of schemas instead of thousands of individual tables. | ||
|
|
||
| However, schemas are a relatively under-supported and unintuitive database feature for sharing decisions, for a few main reasons: | ||
|
|
||
| - Schema and table permissions work together hierarchically in Redshift - users need **both** USAGE permission on the schema **and** SELECT permission on the tables. Granting one without the other is insufficient, making permission management more complex than simple folder-based sharing | ||
| - Schema purposes cannot always be clearly intuited from their names, requiring the use of thorough external documentation for users to know where certain kinds of data should go | ||
| - Newly created data tables inherit no permissions from their schema and are only accessible to the table owner and superusers by default, regardless of what groups have access to the schema itself | ||
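
To make the first point concrete, here is a minimal sketch of the pair of statements a group needs before it can read a schema's tables. The helper name `build_read_grants` is hypothetical and is not taken from the script in this PR; the SQL follows Redshift's `GRANT ... TO GROUP` form.

```python
def build_read_grants(schema, group):
    """Return the two statements a group needs to read tables in a schema.

    Hypothetical helper for illustration only. USAGE lets the group resolve
    objects inside the schema; SELECT lets it read the existing tables and views.
    """
    return [
        f"GRANT USAGE ON SCHEMA {schema} TO GROUP {group};",
        f"GRANT SELECT ON ALL TABLES IN SCHEMA {schema} TO GROUP {group};",
    ]


for statement in build_read_grants("reporting", "analysts"):
    print(statement)
```

Issuing only one of the two statements leaves the group unable to query the tables, which is the mismatch with folder-style sharing described above.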
|
|
||
| ## Purpose | ||
|
|
||
| This script automates database sharing to work more like shared folders in collaborative file sharing software, such that: | ||
|
|
||
| 1. Individuals primarily have access to different topic schemas based on their **group** membership, with minimal cases of person-level exceptions | ||
| 2. Tables put into a schema are understood to be made **automatically** available to members of those groups (not a Redshift/SQL default behavior) | ||
| 3. Behavior is documented and explained directly in context, so that users are not surprised by cases where: | ||
| - Data is **not** shared with other users as expected, or | ||
| - Data **is** shared with other users when it was **not** expected (data leakage) | ||
|
|
||
| ## Configuration | ||
|
|
||
| ### Step 1: Set Environment Variables | ||
|
|
||
| ```bash | ||
| export DATABASE="your_database_name" # Required: The name of your database | ||
| export DRY_RUN="True" # Optional: Set to False to execute changes | ||
| export GRANT_USAGE="False" # Optional: Set to True to also grant USAGE on schemas | ||
| export GRANT_FUTURE="True" # Optional: Set to False to skip future table grants (default: True) | ||
| ``` | ||
|
|
||
| ### Step 2: Configure Schema Grants | ||
|
|
||
| Edit the `SCHEMA_GRANTS_CONFIG` list in `automate_schema_grants.py` to define which groups should have access to which schemas: | ||
|
|
||
| ```python | ||
| SCHEMA_GRANTS_CONFIG = [ | ||
| { | ||
| 'schema_name': 'reporting', | ||
| 'groups': ['analysts', 'managers', 'executives'], | ||
| 'table_creators': ['etl_user', 'data_engineer_bot'], # Users who create tables | ||
| 'notes': 'Reporting tables for business intelligence' | ||
| }, | ||
| { | ||
| 'schema_name': 'raw_data', | ||
| 'groups': ['data_engineers', 'etl_users'], | ||
| 'table_creators': ['etl_service_account'], | ||
| 'notes': 'Raw data ingestion schema' | ||
| }, | ||
| ] | ||
| ``` | ||
|
|
||
| **IMPORTANT - `table_creators` Configuration:** | ||
|
|
||
| In Redshift, `ALTER DEFAULT PRIVILEGES` only applies to objects created by **specific users**. You must list all users who might create tables in the `table_creators` field. Common users to include: | ||
| - Service accounts (e.g., `etl_service_account`, `airflow_user`) | ||
| - Application users that create tables | ||
| - Data engineers or analysts with CREATE privileges | ||
|
|
||
| If you omit `table_creators`, default privileges will only apply to tables created by the user running this script, meaning tables created by other users won't automatically inherit the correct permissions. | ||
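
As a rough illustration of why `table_creators` matters, the sketch below builds one `ALTER DEFAULT PRIVILEGES` statement per creator and group. The function name `build_default_privilege_statements` is hypothetical, and the fallback branch for an empty list is an assumption based on the behavior described above rather than a copy of the script's code.

```python
def build_default_privilege_statements(schema, groups, table_creators):
    """Build default-privilege statements for future tables in a schema.

    Hypothetical helper for illustration only. With creators listed, each
    statement is scoped with FOR USER. With none listed, the statement has
    no FOR USER clause, so it covers only the user who runs it.
    """
    statements = []
    for group in groups:
        if table_creators:
            for creator in table_creators:
                statements.append(
                    f"ALTER DEFAULT PRIVILEGES FOR USER {creator} "
                    f"IN SCHEMA {schema} GRANT SELECT ON TABLES TO GROUP {group};"
                )
        else:
            # Assumed fallback: applies only to the user executing the statement.
            statements.append(
                f"ALTER DEFAULT PRIVILEGES IN SCHEMA {schema} "
                f"GRANT SELECT ON TABLES TO GROUP {group};"
            )
    return statements


for statement in build_default_privilege_statements(
    "reporting", ["analysts", "managers"], ["etl_user"]
):
    print(statement)
```

The number of statements grows as groups times creators, so a schema with many creators produces a long dry-run log; that is expected.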
|
|
||
| ## Usage | ||
|
|
||
| ### Dry Run (Preview Changes) | ||
|
|
||
| ```bash | ||
| python automate_schema_grants.py | ||
| ``` | ||
|
|
||
| This will log the SQL GRANT statements that would be executed without actually making any changes. | ||
|
|
||
| ### Execute Changes | ||
|
|
||
| ```bash | ||
| export DRY_RUN="False" | ||
| python automate_schema_grants.py | ||
| ``` | ||
|
|
||
| This will execute the SQL statements to grant permissions. | ||
|
|
||
| ### Grant USAGE Permissions | ||
|
|
||
| By default, the script only grants SELECT permissions on tables. To also grant USAGE permissions on the schemas themselves: | ||
|
|
||
| ```bash | ||
| export GRANT_USAGE="True" | ||
| python automate_schema_grants.py | ||
| ``` | ||
|
|
||
| ### Disable Future Object Grants | ||
|
|
||
| By default, the script grants permissions on both existing and future tables/views. To only grant on existing objects: | ||
|
|
||
| ```bash | ||
| export GRANT_FUTURE="False" | ||
| python automate_schema_grants.py | ||
| ``` | ||
|
|
||
| ## Tables vs Views | ||
|
|
||
| In Redshift, the command `GRANT SELECT ON ALL TABLES IN SCHEMA` covers: | ||
| - Regular tables | ||
| - Views | ||
| - External tables | ||
| - Late-binding views | ||
|
|
||
| Similarly, `ALTER DEFAULT PRIVILEGES` applies to both tables and views created in the future. This means you don't need separate commands for views - they're automatically included. | ||
|
|
||
| ## How It Works | ||
|
|
||
| 1. **Reads Configuration**: Loads the schema-to-groups mapping from `SCHEMA_GRANTS_CONFIG` | ||
| 2. **Generates GRANT Statements**: Creates SQL statements to grant SELECT (and optionally USAGE) permissions | ||
| - Grants SELECT on all existing tables and views in each schema | ||
| - Optionally grants USAGE on the schema itself (required for accessing tables/views) | ||
| - Sets default privileges for future tables and views created by specified users | ||
| 3. **Executes or Logs**: Either executes the changes (when `DRY_RUN=False`) or logs them for review | ||
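
The "executes or logs" step can be sketched as follows. This is a simplified illustration, not the script's actual code: `apply_statements` and its `execute` parameter are hypothetical, with `execute` standing in for whatever callable submits the SQL to the database.

```python
import logging

LOG = logging.getLogger("schema_grants_sketch")


def apply_statements(statements, dry_run, execute):
    """Log the combined query on a dry run; otherwise pass it to `execute`.

    `execute` is any callable that accepts one SQL string (an assumption made
    so the sketch stays independent of a specific database client).
    """
    query = "\n".join(statements)
    if dry_run:
        LOG.info("DRY RUN, the following would be executed:\n%s", query)
        return None
    return execute(query)


submitted = []
apply_statements(["GRANT USAGE ON SCHEMA reporting TO GROUP analysts;"], True, submitted.append)
print(submitted)  # nothing is submitted on a dry run
```

Keeping the executor as a parameter makes the dry-run path easy to check without a database connection.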
|
|
||
| **Note**: In Redshift, "ALL TABLES" includes tables, views, and external tables that currently exist in the schema. | ||
|
|
||
| **Critical Limitation**: The `ALTER DEFAULT PRIVILEGES` command only applies to objects created by specific users. The script uses `FOR USER <username>` to grant privileges on future objects created by each user in the `table_creators` list. If a user not in this list creates a table, the permissions will **not** be automatically applied, and you'll need to either: | ||
| - Re-run this script to grant on the newly created tables | ||
| - Add that user to the `table_creators` list and re-run the script | ||
|
|
||
| ## Requirements | ||
|
|
||
| - Python 3.x | ||
| - `civis` Python package | ||
| - Superuser/admin access to the target database | ||
| - Appropriate Civis Platform API credentials |
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,167 @@ | ||||||||||||||||||||||||||||||
| """ | ||||||||||||||||||||||||||||||
| This script automates granting SELECT access (and optionally USAGE) on | ||||||||||||||||||||||||||||||
| database schemas to specified groups. | ||||||||||||||||||||||||||||||
| For each configured schema, it grants permissions to all associated groups | ||||||||||||||||||||||||||||||
| on all tables and views within that schema. | ||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||
| This script must be run with authorized superuser account credentials on the | ||||||||||||||||||||||||||||||
| affected database. | ||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||
| Configuration: | ||||||||||||||||||||||||||||||
| - Set the DATABASE environment variable to specify which database to use | ||||||||||||||||||||||||||||||
| - Edit the SCHEMA_GRANTS_CONFIG list below to map schemas to their authorized | ||||||||||||||||||||||||||||||
| groups | ||||||||||||||||||||||||||||||
| - Set DRY_RUN=True to preview changes without executing them | ||||||||||||||||||||||||||||||
| - Set GRANT_USAGE=True to also grant USAGE permissions on schemas | ||||||||||||||||||||||||||||||
| - Set GRANT_FUTURE=True to grant permissions on future tables/views | ||||||||||||||||||||||||||||||
| (default: True) | ||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||
| IMPORTANT: ALTER DEFAULT PRIVILEGES in Redshift only applies to objects | ||||||||||||||||||||||||||||||
| created by specific users. You must specify the 'table_creators' list | ||||||||||||||||||||||||||||||
| for each schema to include all users who might create tables. | ||||||||||||||||||||||||||||||
| Otherwise, default privileges will only apply to tables created | ||||||||||||||||||||||||||||||
| by the user running this script. | ||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||
| Note: In Redshift, "ALL TABLES" includes tables, views, and external tables. | ||||||||||||||||||||||||||||||
| """ | ||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||
| import civis | ||||||||||||||||||||||||||||||
| import os | ||||||||||||||||||||||||||||||
| import logging | ||||||||||||||||||||||||||||||
| from distutils.util import strtobool | ||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||
| # Setting up logging | ||||||||||||||||||||||||||||||
| LOG = logging.getLogger(__name__) | ||||||||||||||||||||||||||||||
| FORMAT = "%(asctime)-15s %(levelname)s:%(name)s.%(funcName)s:%(lineno)s %(message)s" | ||||||||||||||||||||||||||||||
| logging.basicConfig(level=logging.INFO, format=FORMAT) | ||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||
| # ======================================== | ||||||||||||||||||||||||||||||
| # CONFIGURATION: Edit this list for your specific use case | ||||||||||||||||||||||||||||||
| # ======================================== | ||||||||||||||||||||||||||||||
| SCHEMA_GRANTS_CONFIG = [ | ||||||||||||||||||||||||||||||
| { | ||||||||||||||||||||||||||||||
| "schema_name": "example_schema", | ||||||||||||||||||||||||||||||
| "groups": ["example_group_1", "example_group_2", "read_only_users"], | ||||||||||||||||||||||||||||||
| "table_creators": [], # Optional: usernames who can create tables in this schema | ||||||||||||||||||||||||||||||
| "notes": "Example schema - replace with your actual schemas and groups", | ||||||||||||||||||||||||||||||
| }, | ||||||||||||||||||||||||||||||
| # Add more schema-to-groups mappings here as needed | ||||||||||||||||||||||||||||||
| # { | ||||||||||||||||||||||||||||||
| # 'schema_name': 'analytics_schema', | ||||||||||||||||||||||||||||||
| # 'groups': ['analysts', 'data_engineers', 'reporting_users'], | ||||||||||||||||||||||||||||||
| # 'table_creators': ['etl_user', 'data_engineer_bot'], # Users who create tables | ||||||||||||||||||||||||||||||
| # 'notes': 'Analytics schema for reporting team' | ||||||||||||||||||||||||||||||
| # }, | ||||||||||||||||||||||||||||||
| ] | ||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||
| def get_schema_grants_config(): | ||||||||||||||||||||||||||||||
| """ | ||||||||||||||||||||||||||||||
| Returns the schema grants configuration from the code-based SCHEMA_GRANTS_CONFIG. | ||||||||||||||||||||||||||||||
| Returns a dict mapping schema names to their authorized groups: | ||||||||||||||||||||||||||||||
| { | ||||||||||||||||||||||||||||||
| 'schema_name': { | ||||||||||||||||||||||||||||||
| 'schema': 'schema_name', | ||||||||||||||||||||||||||||||
| 'groups': ['group1', 'group2', ...], | ||||||||||||||||||||||||||||||
| 'table_creators': ['user1', 'user2', ...] | ||||||||||||||||||||||||||||||
| }, | ||||||||||||||||||||||||||||||
| ... | ||||||||||||||||||||||||||||||
| } | ||||||||||||||||||||||||||||||
| """ | ||||||||||||||||||||||||||||||
| mapping = {} | ||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||
| for config in SCHEMA_GRANTS_CONFIG: | ||||||||||||||||||||||||||||||
| schema = config.get("schema_name") | ||||||||||||||||||||||||||||||
| groups = config.get("groups", []) | ||||||||||||||||||||||||||||||
| table_creators = config.get("table_creators", []) | ||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||
| if schema and groups: | ||||||||||||||||||||||||||||||
| mapping[schema] = { | ||||||||||||||||||||||||||||||
| "schema": schema, | ||||||||||||||||||||||||||||||
|
Comment on lines +79 to +80
|
||||||||||||||||||||||||||||||
| "groups": tuple(groups), | ||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||
| "groups": tuple(groups), | |
| "groups": groups, |
I feel like 4 lines to say drop the tuple is a bit much
Copilot AI, Feb 25, 2026
The script does not validate that SCHEMA_GRANTS_CONFIG contains at least one valid configuration entry before proceeding. If the configuration is empty or all entries are invalid (missing schema or groups), the script will generate an empty query string and potentially execute it, which could be confusing. Consider adding a check after get_schema_grants_config() to ensure there is at least one valid schema configuration, and raise a clear error if not.
| schema_grants = get_schema_grants_config() | |
| schema_grants = get_schema_grants_config() | |
| if not schema_grants: | |
| LOG.error( | |
| "No valid schema grant configurations found. " | |
| "SCHEMA_GRANTS_CONFIG must contain at least one entry with a " | |
| "'schema_name' and a non-empty 'groups' list." | |
| ) | |
| raise ValueError( | |
| "Invalid configuration: SCHEMA_GRANTS_CONFIG must contain at least " | |
| "one schema with associated groups." | |
| ) |
Not a bad idea but probably not necessary
Copilot AI, Feb 25, 2026
No error handling for the query execution. If the SQL execution fails (due to insufficient permissions, invalid schema names, or database connectivity issues), the script will crash without providing actionable feedback. Consider wrapping the query execution in a try-except block and logging specific error information to help users troubleshoot issues.
| future = civis.io.query_civis(query, database=database, hidden=False) | |
| LOG.info(future.result()) | |
| try: | |
| future = civis.io.query_civis(query, database=database, hidden=False) | |
| result = future.result() | |
| LOG.info(result) | |
| except Exception as exc: | |
| LOG.error( | |
| "Failed to execute grant commands on database '%s': %s", | |
| database, | |
| exc, | |
| exc_info=True, | |
| ) | |
| raise |
The distutils module has been deprecated since Python 3.10 and was removed in Python 3.12, so `from distutils.util import strtobool` fails on current interpreters. Consider using a more modern alternative such as checking if the string is in a list of accepted true values (e.g., ['true', '1', 'yes']), or using a library like python-dotenv that handles environment variable parsing.
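
One possible replacement for `strtobool`, sketched with only the standard library. The helper name `env_flag` is hypothetical; the accepted strings mirror the ones `distutils.util.strtobool` recognized.

```python
import os

# Same spellings that distutils.util.strtobool accepted, compared case-insensitively.
_TRUE_VALUES = {"1", "true", "t", "yes", "y", "on"}
_FALSE_VALUES = {"0", "false", "f", "no", "n", "off"}


def env_flag(name, default):
    """Read a boolean environment variable, falling back to `default` if unset.

    Hypothetical helper for illustration; raises ValueError on unrecognized
    text so that a typo cannot silently flip DRY_RUN.
    """
    raw = os.environ.get(name)
    if raw is None:
        return default
    value = raw.strip().lower()
    if value in _TRUE_VALUES:
        return True
    if value in _FALSE_VALUES:
        return False
    raise ValueError(f"Invalid boolean value for {name}: {raw!r}")


os.environ["DRY_RUN"] = "False"
print(env_flag("DRY_RUN", default=True))
```

Raising on unknown input, rather than treating it as false, is a deliberate choice here: for a script that grants permissions, an unparseable `DRY_RUN` should stop the run.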