add code to harmonize redshift and platform groups by lhoesly · Pull Request #49 · civisanalytics/platform-code-examples

lhoesly · 2026-02-25T17:41:58Z

Python code to align groups in Civis Platform with groups in Redshift.
The user supplies a config containing the crosswalk of groups.
The user supplies the database.

Copilot

Pull request overview

This pull request adds Python automation to synchronize user group memberships between Civis Platform and Redshift databases. The solution treats Platform groups as the source of truth and updates Redshift group memberships to match, addressing the challenge of maintaining consistent group memberships across two independent systems.

Changes:

- New Python script that fetches groups from the Platform API and Redshift, compares memberships, and generates SQL ALTER GROUP statements to synchronize them
- Comprehensive README with configuration instructions and usage examples
- Configurable group crosswalk mapping and dry-run mode for safe testing
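The comparison described in the overview can be sketched as a pair of set differences per group, with Platform treated as the source of truth. This is a minimal illustration, not the script from the PR: the helper name `build_sync_sql` and the sample group and user names are hypothetical, and members are sorted only to make the generated SQL deterministic.

```python
def build_sync_sql(redshift_group, platform_members, redshift_members):
    """Return ALTER GROUP statements that make the Redshift group match Platform."""
    # Users present in Platform but missing from Redshift must be added.
    to_add = sorted(set(platform_members) - set(redshift_members))
    # Users present in Redshift but absent from Platform must be dropped.
    to_drop = sorted(set(redshift_members) - set(platform_members))

    statements = []
    if to_add:
        statements.append(
            f"ALTER GROUP {redshift_group} ADD USER {', '.join(to_add)};"
        )
    if to_drop:
        statements.append(
            f"ALTER GROUP {redshift_group} DROP USER {', '.join(to_drop)};"
        )
    return statements


# Hypothetical sample data: alice is only in Platform, carol only in Redshift.
for statement in build_sync_sql("analysts", ["alice", "bob"], ["bob", "carol"]):
    print(statement)
```

When the two membership lists already match, the function returns an empty list, which corresponds to the "no users to add" and "no users to drop" cases in the script.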

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 22 comments.

| File | Description |
| --- | --- |
| python/platform_redshift_groups/automate_groups.py | Main automation script that queries Platform and Redshift, compares group memberships, and generates/executes SQL to synchronize groups |
| python/platform_redshift_groups/README.md | Documentation covering overview, configuration, usage instructions, and requirements |


Copilot · 2026-02-25T17:49:59Z

python/platform_redshift_groups/automate_groups.py

+    redshift_valid_users_list = create_valid_database_users_list(database=database)
+    # Edit this list to specify users that should be ignored during group synchronization
+    ignore_users_list = ["dbadmin", "console"]
+    # ignore_groups_list =


Incomplete code with commented-out line. Either remove the comment or implement the intended ignore_groups_list functionality if it's needed for the feature.

Suggested change

-    # ignore_groups_list =
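If the ignore-groups feature is wanted rather than removed, one possible shape is to filter the crosswalk before the main loop. The sketch below is an assumption about the intended behavior, not something the PR defines: the helper `filter_crosswalk` and the sample group names are hypothetical, and it assumes the crosswalk is a list of `(platform_group, redshift_group)` pairs, as the loop in `main` suggests.

```python
def filter_crosswalk(group_names_crosswalk, ignore_groups_list):
    """Drop crosswalk pairs whose Platform or Redshift group name is ignored."""
    ignored = set(ignore_groups_list)
    return [
        (platform_group, redshift_group)
        for platform_group, redshift_group in group_names_crosswalk
        if platform_group not in ignored and redshift_group not in ignored
    ]


# Hypothetical crosswalk: the second pair is skipped because "admins" is ignored.
crosswalk = [("Analysts", "analysts"), ("Admins", "admins")]
print(filter_crosswalk(crosswalk, ["admins"]))
```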

Copilot · 2026-02-25T17:50:00Z

python/platform_redshift_groups/automate_groups.py

+
+    for platform_group, redshift_group in group_names_crosswalk:
+        platform_group_members = platform_group_members_dict[platform_group]
+        platform_group_members = [x for x in platform_group_members if x in redshift_valid_users_list]


The ignore_users_list is only applied to Redshift group members (line 166) but not to Platform group members (line 163). This creates an asymmetry: if an ignored user like "dbadmin" is in a Platform group but not in the corresponding Redshift group, the script will try to add them. Consider applying the ignore list to both Platform and Redshift members for consistency.

Suggested change

-        platform_group_members = [x for x in platform_group_members if x in redshift_valid_users_list]
+        platform_group_members = [
+            x for x in platform_group_members
+            if x in redshift_valid_users_list and x not in ignore_users_list
+        ]
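The asymmetry can be reproduced with a few lines of sample data. The sketch below reuses the variable names from the script, but the member lists and the helper `plan` are made up for illustration. It computes the add and drop sets with and without the ignore list applied to the Platform side.

```python
# Hypothetical sample data; only the variable names come from the script.
redshift_valid_users_list = ["dbadmin", "alice", "bob"]
ignore_users_list = ["dbadmin", "console"]
platform_group_members = ["dbadmin", "alice"]
redshift_group_members = ["bob", "console"]


def plan(apply_ignore_to_platform):
    """Return (users_to_add, users_to_drop) for the sample group."""
    platform = [x for x in platform_group_members if x in redshift_valid_users_list]
    if apply_ignore_to_platform:
        platform = [x for x in platform if x not in ignore_users_list]
    redshift = [x for x in redshift_group_members if x not in ignore_users_list]
    to_add = sorted(set(platform) - set(redshift))
    to_drop = sorted(set(redshift) - set(platform))
    return to_add, to_drop


print(plan(apply_ignore_to_platform=False))  # (['alice', 'dbadmin'], ['bob'])
print(plan(apply_ignore_to_platform=True))   # (['alice'], ['bob'])
```

Without the extra filter, the ignored user `dbadmin` appears in the add list, which is the behavior the comment warns about.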

Copilot · 2026-02-25T17:50:00Z

python/platform_redshift_groups/automate_groups.py

+    group_results = [client.groups.get(result['id']) for result in client.groups.list(limit = 1000)]
+


Hardcoded limit of 1000 groups may be insufficient for large organizations. If an organization has more than 1000 groups, this will silently fail to process all groups. Consider implementing pagination or at least logging a warning if the limit is reached.

Suggested change

-    group_results = [client.groups.get(result['id']) for result in client.groups.list(limit = 1000)]
+    # Retrieve all groups via pagination instead of relying on a single hardcoded limit.
+    limit = 1000
+    page_num = 1
+    all_group_summaries = []
+    last_page_count = 0
+    while True:
+        page = client.groups.list(page_num=page_num, limit=limit)
+        if not page:
+            break
+        all_group_summaries.extend(page)
+        last_page_count = len(page)
+        if last_page_count < limit:
+            # We've reached the final page.
+            break
+        page_num += 1
+    if last_page_count == limit:
+        LOG.warning(
+            "Retrieved a full page of %d groups from the Platform API on page %d. "
+            "There may be additional groups that were not retrieved. "
+            "Consider increasing the limit or using a more robust pagination strategy "
+            "if your organization has more than %d groups.",
+            limit,
+            page_num,
+            len(all_group_summaries),
+        )
+    group_results = [client.groups.get(result['id']) for result in all_group_summaries]
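A more compact version of the same page-by-page idea is sketched below against a stand-in data source rather than the real API. The `fetch_page(page_num, limit)` callable and `fake_fetch_page` are hypothetical; whether `client.groups.list` accepts these exact parameters should be checked against the Civis API client documentation. Because the loop only stops on a short or empty page, every item has been collected by the time it returns, so no trailing warning is needed in this form.

```python
def fetch_all(fetch_page, limit=1000):
    """Collect every item from a paginated source.

    fetch_page(page_num, limit) is a hypothetical callable that returns a
    list with at most `limit` items for the given 1-based page number.
    """
    items = []
    page_num = 1
    while True:
        page = fetch_page(page_num, limit)
        items.extend(page)
        # A short (or empty) page means there is nothing left to fetch.
        if len(page) < limit:
            return items
        page_num += 1


def fake_fetch_page(page_num, limit):
    """Stand-in for an API call: serves slices of five fake group ids."""
    data = list(range(5))
    start = (page_num - 1) * limit
    return data[start:start + limit]


print(fetch_all(fake_fetch_page, limit=2))  # [0, 1, 2, 3, 4]
```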

Copilot · 2026-02-25T17:50:00Z

python/platform_redshift_groups/automate_groups.py

+def main(database, dry_run = True):
+    """
+    Main function that produces and executes the SQL query to align the redshift and platform groups
+    """
+
+    # constants and empty lists for the loop
+    full_add_list = []
+    full_drop_list = []
+    full_platform_group_change_list = []
+    full_query_text = ""
+
+    # creating the dictionaries and lists we need
+    group_names_crosswalk = create_group_names_crosswalk()
+    platform_group_members_dict = create_platform_groups_dictionary()
+    redshift_group_members_dict = create_redshift_groups_dictionary(database=database)
+
+    redshift_valid_users_list = create_valid_database_users_list(database=database)
+    # Edit this list to specify users that should be ignored during group synchronization
+    ignore_users_list = ["dbadmin", "console"]
+    # ignore_groups_list =
+
+    for platform_group, redshift_group in group_names_crosswalk:
+        platform_group_members = platform_group_members_dict[platform_group]
+        platform_group_members = [x for x in platform_group_members if x in redshift_valid_users_list]
+
+        redshift_group_members = redshift_group_members_dict[redshift_group]
+        redshift_group_members = [x for x in redshift_group_members if x not in ignore_users_list]
+
+        to_add_to_redshift = set(platform_group_members) - set(redshift_group_members)
+        to_drop_from_redshift = set(redshift_group_members) - set(platform_group_members)
+
+        full_add_list.append((platform_group, to_add_to_redshift))
+        full_drop_list.append((platform_group, to_drop_from_redshift))
+
+        if to_add_to_redshift:
+            add_query = f"\nALTER GROUP {redshift_group} ADD USER {', '.join(to_add_to_redshift)};"
+        else:
+            add_query = f"\n--No users to add to {redshift_group}"
+
+        if to_drop_from_redshift:
+            drop_query = f"\nALTER GROUP {redshift_group} DROP USER {', '.join(to_drop_from_redshift)};"
+        else:
+            drop_query = f"\n--No users to drop from {redshift_group}"
+
+        full_query_text = full_query_text + f"\n\n--Platform group name: {platform_group}" + \
+            f"\n--Corresponding Redshift group name: {redshift_group}" + \
+            f"\n--Users to add: {add_query}" + f"\n--Users to drop: {drop_query}"
+
+    LOG.info(f"Full Platform Group change list: {full_platform_group_change_list}")
+    LOG.info(f"Full add list: {full_add_list}")
+    LOG.info(f"Full drop list: {full_drop_list}")
+
+    if dry_run:
+        LOG.info("Running in dry run mode. The following SQL generated but not executed:\n\n")
+        LOG.info(full_query_text)
+    else:
+        LOG.info("Running in full mode. The following SQL will be executed:\n\n")
+        LOG.info(full_query_text)
+        future = civis.io.query_civis(full_query_text, database = database, hidden = False)
+        LOG.info(future.result())
+
+
+
+if __name__ == "__main__":
+    # Different Platform/cloud environments use slightly different formats for Boolean parameters;
+    # This provides some assurance that "truthy" values are assigned properly.
+    DRY_RUN_PARAM = strtobool(str(os.environ.get('DRY_RUN', 'True')))
+    DATABASE = os.environ.get('DATABASE')
+
+    if not DATABASE:
+        raise ValueError("DATABASE environment variable must be set")
+
+    main(database=DATABASE, dry_run=DRY_RUN_PARAM)


Missing error handling for API and database failures. The script makes multiple API calls and database queries without try-except blocks. If any of these operations fail (network issues, authentication errors, database connection problems), the script will crash with an unclear error message. Add appropriate error handling with informative error messages for better operational reliability.
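One way to act on this comment is to wrap each external call in a small helper that logs what was being attempted and re-raises with context. The sketch below is only an illustration under stated assumptions: `SyncError`, `run_step`, and `failing_query` are hypothetical names, and the failing function stands in for a real API call or database query.

```python
import logging

LOG = logging.getLogger(__name__)


class SyncError(RuntimeError):
    """Raised when a step of the group synchronization fails."""


def run_step(description, func, *args, **kwargs):
    """Run func and convert any failure into a SyncError with context."""
    try:
        return func(*args, **kwargs)
    except Exception as exc:
        LOG.error("Failed while %s: %s", description, exc)
        # Chain the original exception so the root cause stays visible.
        raise SyncError(f"Failed while {description}") from exc


def failing_query():
    """Stand-in for a database query that cannot connect."""
    raise ConnectionError("database unreachable")


try:
    run_step("fetching Redshift groups", failing_query)
except SyncError as err:
    print(f"{err} (cause: {err.__cause__})")
```

In the script, calls such as `create_redshift_groups_dictionary(database=database)` could be passed through a helper like this so that a failure names the step that broke.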

Copilot · 2026-02-25T17:50:00Z

python/platform_redshift_groups/automate_groups.py

+    return platform_group_members
+
+
+def main(database, dry_run = True):


Inconsistent spacing around default parameter. The function definition has a space before the equals sign in dry_run = True. PEP 8 recommends no spaces around the equals sign for default parameter values. Should be dry_run=True.

Suggested change

-def main(database, dry_run = True):
+def main(database, dry_run=True):

Copilot · 2026-02-25T17:50:03Z

python/platform_redshift_groups/automate_groups.py

+    else:
+        LOG.info("Running in full mode. The following SQL will be executed:\n\n")
+        LOG.info(full_query_text)
+        future = civis.io.query_civis(full_query_text, database = database, hidden = False)


Inconsistent spacing around keyword argument. The function call has a space before the equals sign in database = database. PEP 8 recommends no spaces around the equals sign for keyword arguments. Should be database=database.

Copilot · 2026-02-25T17:50:04Z

python/platform_redshift_groups/automate_groups.py

+
+        full_query_text = full_query_text + f"\n\n--Platform group name: {platform_group}" + \
+            f"\n--Corresponding Redshift group name: {redshift_group}" + \
+            f"\n--Users to add: {add_query}" + f"\n--Users to drop: {drop_query}"


Misleading comment labels in the generated SQL. Line 186 says "Users to add:" and "Users to drop:", but these labels are followed by the actual SQL queries (or by comments if there is nothing to do), not by lists of users. Consider changing the labels to something like "Add query:" and "Drop query:" to better reflect what follows.

Suggested change

-            f"\n--Users to add: {add_query}" + f"\n--Users to drop: {drop_query}"
+            f"\n--Add query: {add_query}" + f"\n--Drop query: {drop_query}"

Copilot · 2026-02-25T17:50:04Z

python/platform_redshift_groups/README.md

+```bash
+python automate_groups.py
+```


The usage instructions show running the script as python automate_groups.py but don't specify the correct path. Users would need to either be in the python/platform_redshift_groups/ directory or use the full path python python/platform_redshift_groups/automate_groups.py. Consider clarifying the working directory assumption or providing the full command.

Copilot · 2026-02-25T17:50:04Z

python/platform_redshift_groups/automate_groups.py

+from distutils.util import strtobool
+


The distutils.util module is deprecated since Python 3.10 and was removed in Python 3.12. Replace strtobool with a custom implementation or use a different approach to parse boolean environment variables.

Suggested change

-from distutils.util import strtobool
+def strtobool(val):
+    """
+    Convert a string representation of truth to True or False.
+    This mirrors the behavior of distutils.util.strtobool, returning
+    True for 'y', 'yes', 't', 'true', 'on', '1' and False for
+    'n', 'no', 'f', 'false', 'off', '0' (case-insensitive).
+    Raises ValueError if the value is not a recognized boolean string.
+    """
+    val = str(val).strip().lower()
+    if val in ("y", "yes", "t", "true", "on", "1"):
+        return True
+    if val in ("n", "no", "f", "false", "off", "0"):
+        return False
+    raise ValueError(f"invalid truth value {val!r}")

Copilot · 2026-02-25T17:50:04Z

python/platform_redshift_groups/automate_groups.py

+    # constants and empty lists for the loop
+    full_add_list = []
+    full_drop_list = []
+    full_platform_group_change_list = []


Unused variable. The variable full_platform_group_change_list is initialized but never populated or used meaningfully. Remove it or implement the intended functionality.

add code to harmonize redshift and platform groups

54b8cfe

lhoesly requested a review from Copilot February 25, 2026 17:42

Copilot started reviewing on behalf of lhoesly February 25, 2026 17:42

lhoesly wants to merge 1 commit into main from harmonize-groups

2 participants
