Skip to content

add code to harmonize redshift and platform groups#49

Open
lhoesly wants to merge 1 commit intomainfrom
harmonize-groups
Open

add code to harmonize redshift and platform groups#49
lhoesly wants to merge 1 commit intomainfrom
harmonize-groups

Conversation

@lhoesly
Copy link
Contributor

@lhoesly lhoesly commented Feb 25, 2026

  • Python code to align groups in Civis Platform with groups in redshift
  • user supplies config of crosswalk of groups
  • user supplies database

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request adds Python automation to synchronize user group memberships between Civis Platform and Redshift databases. The solution treats Platform groups as the source of truth and updates Redshift group memberships to match, addressing the challenge of maintaining consistent group memberships across two independent systems.

Changes:

  • New Python script that fetches groups from Platform API and Redshift, compares memberships, and generates SQL ALTER GROUP statements to synchronize them
  • Comprehensive README with configuration instructions and usage examples
  • Configurable group crosswalk mapping and dry-run mode for safe testing

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 22 comments.

File Description
python/platform_redshift_groups/automate_groups.py Main automation script that queries Platform and Redshift, compares group memberships, and generates/executes SQL to synchronize groups
python/platform_redshift_groups/README.md Documentation covering overview, configuration, usage instructions, and requirements

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

redshift_valid_users_list = create_valid_database_users_list(database=database)
# Edit this list to specify users that should be ignored during group synchronization
ignore_users_list = ["dbadmin", "console"]
# ignore_groups_list =
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incomplete code with commented-out line. Either remove the comment or implement the intended ignore_groups_list functionality if it's needed for the feature.

Suggested change
# ignore_groups_list =

Copilot uses AI. Check for mistakes.

for platform_group, redshift_group in group_names_crosswalk:
platform_group_members = platform_group_members_dict[platform_group]
platform_group_members = [x for x in platform_group_members if x in redshift_valid_users_list]
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ignore_users_list is only applied to Redshift group members (line 166) but not to Platform group members (line 163). This creates an asymmetry: if an ignored user like "dbadmin" is in a Platform group but not in the corresponding Redshift group, the script will try to add them. Consider applying the ignore list to both Platform and Redshift members for consistency.

Suggested change
platform_group_members = [x for x in platform_group_members if x in redshift_valid_users_list]
platform_group_members = [
x for x in platform_group_members
if x in redshift_valid_users_list and x not in ignore_users_list
]

Copilot uses AI. Check for mistakes.
Comment on lines +127 to +128
group_results = [client.groups.get(result['id']) for result in client.groups.list(limit = 1000)]

Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hardcoded limit of 1000 groups may be insufficient for large organizations. If an organization has more than 1000 groups, this will silently fail to process all groups. Consider implementing pagination or at least logging a warning if the limit is reached.

Suggested change
group_results = [client.groups.get(result['id']) for result in client.groups.list(limit = 1000)]
# Retrieve all groups via pagination instead of relying on a single hardcoded limit.
limit = 1000
page_num = 1
all_group_summaries = []
last_page_count = 0
while True:
page = client.groups.list(page_num=page_num, limit=limit)
if not page:
break
all_group_summaries.extend(page)
last_page_count = len(page)
if last_page_count < limit:
# We've reached the final page.
break
page_num += 1
if last_page_count == limit:
LOG.warning(
"Retrieved a full page of %d groups from the Platform API on page %d. "
"There may be additional groups that were not retrieved. "
"Consider increasing the limit or using a more robust pagination strategy "
"if your organization has more than %d groups.",
limit,
page_num,
len(all_group_summaries),
)
group_results = [client.groups.get(result['id']) for result in all_group_summaries]

Copilot uses AI. Check for mistakes.
Comment on lines +140 to +212
def main(database, dry_run = True):
"""
Main function that produces and executes the SQL query to align the redshift and platform groups
"""

# constants and empty lists for the loop
full_add_list = []
full_drop_list = []
full_platform_group_change_list = []
full_query_text = ""

# creating the dictionaries and lists we need
group_names_crosswalk = create_group_names_crosswalk()
platform_group_members_dict = create_platform_groups_dictionary()
redshift_group_members_dict = create_redshift_groups_dictionary(database=database)

redshift_valid_users_list = create_valid_database_users_list(database=database)
# Edit this list to specify users that should be ignored during group synchronization
ignore_users_list = ["dbadmin", "console"]
# ignore_groups_list =

for platform_group, redshift_group in group_names_crosswalk:
platform_group_members = platform_group_members_dict[platform_group]
platform_group_members = [x for x in platform_group_members if x in redshift_valid_users_list]

redshift_group_members = redshift_group_members_dict[redshift_group]
redshift_group_members = [x for x in redshift_group_members if x not in ignore_users_list]

to_add_to_redshift = set(platform_group_members) - set(redshift_group_members)
to_drop_from_redshift = set(redshift_group_members) - set(platform_group_members)

full_add_list.append((platform_group, to_add_to_redshift))
full_drop_list.append((platform_group, to_drop_from_redshift))

if to_add_to_redshift:
add_query = f"\nALTER GROUP {redshift_group} ADD USER {', '.join(to_add_to_redshift)};"
else:
add_query = f"\n--No users to add to {redshift_group}"

if to_drop_from_redshift:
drop_query = f"\nALTER GROUP {redshift_group} DROP USER {', '.join(to_drop_from_redshift)};"
else:
drop_query = f"\n--No users to drop from {redshift_group}"

full_query_text = full_query_text + f"\n\n--Platform group name: {platform_group}" + \
f"\n--Corresponding Redshift group name: {redshift_group}" + \
f"\n--Users to add: {add_query}" + f"\n--Users to drop: {drop_query}"

LOG.info(f"Full Platform Group change list: {full_platform_group_change_list}")
LOG.info(f"Full add list: {full_add_list}")
LOG.info(f"Full drop list: {full_drop_list}")

if dry_run:
LOG.info("Running in dry run mode. The following SQL generated but not executed:\n\n")
LOG.info(full_query_text)
else:
LOG.info("Running in full mode. The following SQL will be executed:\n\n")
LOG.info(full_query_text)
future = civis.io.query_civis(full_query_text, database = database, hidden = False)
LOG.info(future.result())



if __name__ == "__main__":
# Different Platform/cloud environments use slightly different formats for Boolean parameters;
# This provides some assurance that "truthy" values are assigned properly.
DRY_RUN_PARAM = strtobool(str(os.environ.get('DRY_RUN', 'True')))
DATABASE = os.environ.get('DATABASE')

if not DATABASE:
raise ValueError("DATABASE environment variable must be set")

main(database=DATABASE, dry_run=DRY_RUN_PARAM) No newline at end of file
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing error handling for API and database failures. The script makes multiple API calls and database queries without try-except blocks. If any of these operations fail (network issues, authentication errors, database connection problems), the script will crash with an unclear error message. Add appropriate error handling with informative error messages for better operational reliability.

Copilot uses AI. Check for mistakes.
return platform_group_members


def main(database, dry_run = True):
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistent spacing around default parameter. The function definition has a space before the equals sign in dry_run = True. PEP 8 recommends no spaces around the equals sign for default parameter values. Should be dry_run=True.

Suggested change
def main(database, dry_run = True):
def main(database, dry_run=True):

Copilot uses AI. Check for mistakes.
else:
LOG.info("Running in full mode. The following SQL will be executed:\n\n")
LOG.info(full_query_text)
future = civis.io.query_civis(full_query_text, database = database, hidden = False)
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistent spacing around keyword argument. The function call has a space before the equals sign in database = database. PEP 8 recommends no spaces around the equals sign for keyword arguments. Should be database=database.

Copilot uses AI. Check for mistakes.

full_query_text = full_query_text + f"\n\n--Platform group name: {platform_group}" + \
f"\n--Corresponding Redshift group name: {redshift_group}" + \
f"\n--Users to add: {add_query}" + f"\n--Users to drop: {drop_query}"
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Misleading comment labels in the generated SQL. Lines 186 say "Users to add:" and "Users to drop:" but these are followed by the actual SQL queries (or comments if there's nothing to do), not lists of users. Consider changing the labels to something like "Add query:" and "Drop query:" to better reflect what follows.

Suggested change
f"\n--Users to add: {add_query}" + f"\n--Users to drop: {drop_query}"
f"\n--Add query: {add_query}" + f"\n--Drop query: {drop_query}"

Copilot uses AI. Check for mistakes.
Comment on lines +60 to +62
```bash
python automate_groups.py
```
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The usage instructions show running the script as python automate_groups.py but don't specify the correct path. Users would need to either be in the python/platform_redshift_groups/ directory or use the full path python python/platform_redshift_groups/automate_groups.py. Consider clarifying the working directory assumption or providing the full command.

Copilot uses AI. Check for mistakes.
Comment on lines +20 to +21
from distutils.util import strtobool

Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The distutils.util module is deprecated since Python 3.10 and was removed in Python 3.12. Replace strtobool with a custom implementation or use a different approach to parse boolean environment variables.

Suggested change
from distutils.util import strtobool
def strtobool(val):
"""
Convert a string representation of truth to True or False.
This mirrors the behavior of distutils.util.strtobool, returning
True for 'y', 'yes', 't', 'true', 'on', '1' and False for
'n', 'no', 'f', 'false', 'off', '0' (case-insensitive).
Raises ValueError if the value is not a recognized boolean string.
"""
val = str(val).strip().lower()
if val in ("y", "yes", "t", "true", "on", "1"):
return True
if val in ("n", "no", "f", "false", "off", "0"):
return False
raise ValueError(f"invalid truth value {val!r}")

Copilot uses AI. Check for mistakes.
# constants and empty lists for the loop
full_add_list = []
full_drop_list = []
full_platform_group_change_list = []
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused variable. The variable full_platform_group_change_list is initialized but never populated or used meaningfully. Remove it or implement the intended functionality.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants