
Enhance tenant replication process by adding default configuration fetching and improving backfill validation. #1625

Open
sumanvpacewisdom wants to merge 1 commit into ELEVATE-Project:develop from sumanvpacewisdom:tenantBackfill_script_fix

Conversation

sumanvpacewisdom (Collaborator) commented Apr 7, 2026

Updated the tenant service with a method that fetches the default tenant configuration, and modified the tenant consumer to use this configuration during replication. Improved error handling in the backfill script for missing tenant fields.

Release Notes

  • Added a fetchDefaultTenantConfig() method to TenantService that loads the configuration of the default tenant (identified through the DEFAULT_TENANT_CODE / DEFAULT_ORGANISATION_CODE environment variables), including templates, forms, entity types, entities, questions, question sets, reports metadata, and role extensions
  • Enhanced tenant replication to accept a pre-fetched default configuration, so a backfill run loads the default configuration once instead of re-querying it for every tenant
  • Implemented configuration pre-clearing in the replication process: the tenant's existing configuration rows are deleted before copying, so re-runs start from a clean state
  • Modified backfill script validation to require all four fields (code, name, org_id, org_code) to be present in CSV, removing default environment variable fallbacks for organization fields
  • Updated tenant consumer to process backfill events in addition to newly created tenant events, rebuilding materialized views for both cases while limiting periodic refresh job scheduling to new tenants only
  • Improved error handling in backfill workflow to account for missing tenant fields and prevent invalid data replication
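The "fetch once, reuse per tenant" change can be illustrated with a minimal sketch. It is hypothetical: `fetchDefaultTenantConfig` is a stub that counts its calls and `buildBackfillMessage` is an invented helper. Neither is the actual TenantService or backfill-script code.

```javascript
// Hypothetical stub standing in for TenantService.fetchDefaultTenantConfig().
// It counts calls so we can confirm the config is loaded only once per run.
let fetchCount = 0
function fetchDefaultTenantConfig() {
	fetchCount += 1
	return {
		forms: [{ type: 'profile' }],
		entityTypes: [{ value: 'location' }],
	}
}

// Hypothetical helper: attach the shared default config to one tenant's payload.
function buildBackfillMessage(tenant, defaultConfig) {
	return {
		code: tenant.code,
		name: tenant.name,
		org_id: tenant.org_id,
		org_code: tenant.org_code,
		defaultConfig,
	}
}

function backfillTenants(tenants) {
	// Load the default tenant configuration once, before the loop,
	// instead of re-querying it for every tenant.
	const defaultConfig = fetchDefaultTenantConfig()
	return tenants.map((tenant) => buildBackfillMessage(tenant, defaultConfig))
}
```

In this sketch every payload references the same in-memory object. Once the payloads are serialized onto a queue, each message carries its own copy of the config.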
Author: sumanvpacewisdom | Files changed: 3 | Lines added: 117 | Lines removed: 47

coderabbitai bot commented Apr 7, 2026

Walkthrough

Tenant backfilling workflow extended to support both newly created tenants and backfill messages. Default tenant configuration is now fetched once and passed through the Kafka consumer pipeline. CSV validation requirements strengthened, and configuration replication now deletes existing data before applying new configuration in a transaction.
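The stricter CSV validation can be sketched as follows, assuming the rows have already been parsed into plain objects. `REQUIRED_FIELDS`, `missingTenantFields`, and `partitionTenantRows` are hypothetical names. The point is that a missing `org_id` or `org_code` is reported as an error instead of being filled in from environment variables.

```javascript
// Hypothetical list of the four fields the backfill script now requires.
const REQUIRED_FIELDS = ['code', 'name', 'org_id', 'org_code']

// Returns the required fields that are absent or blank in a parsed CSV row.
// No environment-variable fallback is applied for org_id / org_code.
function missingTenantFields(row) {
	return REQUIRED_FIELDS.filter((field) => {
		const value = row[field]
		return value === undefined || value === null || String(value).trim() === ''
	})
}

// Splits parsed rows into valid tenants and rejected rows with the reason.
// `row` is the 1-based data-row index (the header line is not counted).
function partitionTenantRows(rows) {
	const valid = []
	const rejected = []
	rows.forEach((row, index) => {
		const missing = missingTenantFields(row)
		if (missing.length === 0) valid.push(row)
		else rejected.push({ row: index + 1, missing })
	})
	return { valid, rejected }
}
```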

Changes

Kafka Consumer Workflow (src/generics/kafka/consumers/tenant.js)
Expanded post-upsert handling to trigger for both newly created tenants and backfill messages. Updated the service call to pass the message.defaultConfig parameter. Materialized views are now rebuilt in both scenarios; refresh job scheduling is limited to newly created tenants.

Backfill Script (src/scripts/backfillTenantData.js)
Added an import and a single fetch of defaultConfig via TenantService.fetchDefaultTenantConfig() before iterating over tenants. Stricter CSV validation now requires the code, name, org_id, and org_code fields. Payload construction no longer defaults org_id and org_code from environment variables.

Tenant Service (src/services/tenant.js)
Added a static fetchDefaultTenantConfig() method for loading default tenant metadata. Updated the replicateConfigFromDefaultTenant() signature to accept an optional pre-fetched defaultConfig parameter. Introduced pre-clear deletion logic (one DELETE statement per config table) before replication, and switched from per-resource queries to in-memory replication from the loaded config arrays.
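The consumer's post-upsert branching can be summarized as a small decision function. This is a hypothetical sketch. The `isBackfill` flag, the `created` boolean (as a find-or-create call would return it), and the action names are illustrative. They are not the identifiers used in src/generics/kafka/consumers/tenant.js.

```javascript
// Hypothetical decision helper mirroring the described consumer behaviour:
// - replicate config and rebuild materialized views for new tenants AND backfill messages
// - schedule the periodic refresh job only for newly created tenants
function planPostUpsertActions({ created, isBackfill }) {
	const actions = []
	if (created || isBackfill) {
		actions.push('replicateConfig', 'rebuildMaterializedViews')
	}
	if (created) {
		actions.push('scheduleRefreshJob')
	}
	return actions
}
```

Under this split, a backfilled tenant that already exists gets fresh configuration and rebuilt views, but no additional refresh job.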

Sequence Diagram(s)

sequenceDiagram
    participant Script as Backfill Script
    participant Service as Tenant Service
    participant Queue as Kafka Queue
    participant Consumer as Kafka Consumer
    participant DB as Database

    Script->>Service: fetchDefaultTenantConfig()
    Service->>DB: Query default tenant config<br/>(templates, forms, entity types, etc.)
    DB-->>Service: config object
    Service-->>Script: defaultConfig
    
    Script->>Script: Iterate CSV rows
    Script->>Queue: Publish message with<br/>defaultConfig + tenant data
    
    Consumer->>DB: Find/Create tenant
    alt Tenant Created or Backfill
        Consumer->>Service: replicateConfigFromDefaultTenant<br/>(code, orgId, orgCode, defaultConfig)
        Service->>DB: DELETE FROM config tables<br/>WHERE tenant_code = :tenantCode
        Service->>DB: INSERT config data from<br/>pre-fetched defaultConfig
        Consumer->>DB: Rebuild materialized views
    end
    
    alt New Tenant Only
        Consumer->>DB: Schedule periodic refresh job
    end
    
    DB-->>Consumer: Success
    Consumer-->>Script: Acknowledgement

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

Suggested reviewers

  • nevil-mathew

Poem

🐰 A config once fetched, reused with care,
Backfill messages dance through the air,
Delete then rebuild—a fresh, clean slate,
Org codes and names, validation's great,
Through Kafka streams, configuration flows,
Tenant data grows!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name | Status | Explanation
Description check | ✅ Passed | Check skipped; CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title accurately summarizes the main changes: adding default configuration fetching and improving backfill validation, which align with the three files modified and the PR objectives.
Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.


coderabbitai bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/services/tenant.js (1)

199-204: ⚠️ Potential issue | 🟠 Major

fetchExisting queries the wrong organization_code, so it will not find inserted entity types when the org_code in the message differs from the DEFAULT_ORGANISATION_CODE environment variable.

The transform (line 150) writes entity types with organization_code: resolvedOrgCode (which is newOrgCode || defaultOrgCode), but fetchExisting (line 201) queries only for [defaultOrgCode]. When the Kafka message contains an org_code that differs from DEFAULT_ORGANISATION_CODE, the inserted records with organization_code = org_code won't be found during the fallback query. This causes replicateWithIdMap to attempt re-insertion on the next run, potentially duplicating records.

Per the design intent (configuration data should retain the default org's code), update the baseTransform to use defaultOrgCode instead of resolvedOrgCode, or update fetchExisting to query [resolvedOrgCode]. The matchKey comparison will also need to match the selected organization code.
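The mismatch can be seen in a self-contained simulation that uses an in-memory array in place of the database. `findAllEntityTypes`, `replicateEntityTypes`, and the org codes are hypothetical stand-ins. Only the shape of the bug comes from the finding: rows are written with one org code and looked up with another.

```javascript
// Hypothetical lookup mirroring EntityTypeQueries.findAllEntityTypes(orgCodes, tenantCode).
function findAllEntityTypes(store, orgCodes, tenantCode) {
	return store.filter((row) => row.tenant_code === tenantCode && orgCodes.includes(row.organization_code))
}

// One replication pass: insert each default entity type unless the lookup
// (the fetchExisting equivalent) already finds it for the new tenant.
function replicateEntityTypes(store, defaults, { tenantCode, writeOrgCode, lookupOrgCode }) {
	let inserted = 0
	for (const oldET of defaults) {
		const existing = findAllEntityTypes(store, [lookupOrgCode], tenantCode)
		const match = existing.find((c) => c.value === oldET.value)
		if (!match) {
			store.push({ value: oldET.value, tenant_code: tenantCode, organization_code: writeOrgCode })
			inserted += 1
		}
	}
	return inserted
}
```

When `writeOrgCode` differs from `lookupOrgCode`, a second pass finds nothing and inserts every row again. When both use the default org code, the second pass inserts nothing.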

Suggested fix (preserve default org code intent)
-				organization_code: resolvedOrgCode,
+				organization_code: defaultOrgCode,

and

 				fetchExisting: () => EntityTypeQueries.findAllEntityTypes([defaultOrgCode], newTenantCode, null),
-				matchKey: (oldET) => (c) => c.value === oldET.value && c.organization_code === oldET.organization_code,
+				matchKey: (oldET) => (c) => c.value === oldET.value,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/services/tenant.js` around lines 199-204: the bug is that the transform writes entity types with the resolved org code, but fetchExisting only looks up [defaultOrgCode], causing misses and duplicate inserts. To fix, make the transform produced by baseTransform use defaultOrgCode (not resolvedOrgCode/newOrgCode) so stored records use the default org code, ensure EntityTypeQueries.findAllEntityTypes is still called with [defaultOrgCode], and update the matchKey comparator in replicateWithIdMap (the matchKey closure that compares c.organization_code) to compare against defaultOrgCode so existing rows are correctly matched.
🧹 Nitpick comments (2)
src/scripts/backfillTenantData.js (1)

67-68: Consider deferring fetchDefaultTenantConfig() when --dry-run is specified.

Currently, the default config is fetched unconditionally before processing tenants. In dry-run mode, this call is unnecessary and will fail if DEFAULT_TENANT_CODE / DEFAULT_ORGANISATION_CODE environment variables are not configured, preventing users from validating their CSV file structure without a full environment setup.

♻️ Suggested improvement
 async function backfillTenants(tenants, options = {}) {
 	const { dryRun = false } = options
 	let success = 0
 	let failed = 0

-	const defaultConfig = await TenantService.fetchDefaultTenantConfig()
+	let defaultConfig = null
+	if (!dryRun) {
+		defaultConfig = await TenantService.fetchDefaultTenantConfig()
+	}

 	for (const tenant of tenants) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/backfillTenantData.js` around lines 67-68: the code unconditionally calls TenantService.fetchDefaultTenantConfig() into defaultConfig before tenant processing, which breaks --dry-run. Change the logic to skip calling fetchDefaultTenantConfig when dry-run is active (detect the existing dryRun flag or the presence of "--dry-run" in process.argv) and only fetch defaultConfig when actually performing writes. Update any code paths that assume defaultConfig exists so they handle it being null in dry-run mode (refer to TenantService.fetchDefaultTenantConfig, defaultConfig, and the dry-run flag/variable).
src/services/tenant.js (1)

156-175: Clear-before-replicate pattern ensures idempotent reruns.

The ordered DELETE statements correctly handle foreign key dependencies (e.g., entities before entity_types). This approach guarantees a clean slate for backfill operations.
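The clear-before-replicate step can be sketched under a few assumptions. The sketch assumes a child-before-parent table order and a query function that accepts named replacements, as `db.sequelize.query(sql, { replacements, transaction })` does. Only `entities` and `entity_types` come from the review. `buildClearStatements` and `clearTenantConfig` are hypothetical names, and the executor is injected so the ordering can be checked without a database.

```javascript
// Hypothetical FK-safe order: dependent (child) tables are cleared before the
// tables they reference, e.g. entities before entity_types.
const CONFIG_TABLES_CHILD_FIRST = ['entities', 'entity_types']

// Builds one DELETE per config table. Table names come from a fixed allow-list
// (identifiers cannot be bound as parameters); the tenant code is passed as a
// named replacement and is never interpolated into the SQL string.
function buildClearStatements(tenantCode, tables = CONFIG_TABLES_CHILD_FIRST) {
	return tables.map((table) => ({
		sql: `DELETE FROM ${table} WHERE tenant_code = :tenantCode`,
		replacements: { tenantCode },
	}))
}

// Runs the DELETEs sequentially, in order, all bound to the caller's transaction.
// `execute` stands in for something like (sql, opts) => db.sequelize.query(sql, opts).
async function clearTenantConfig(execute, tenantCode, transaction) {
	for (const stmt of buildClearStatements(tenantCode)) {
		await execute(stmt.sql, { replacements: stmt.replacements, transaction })
	}
}
```

Because identifiers cannot be bound, the table names must remain a hard-coded list. This is the schema coupling discussed in the next paragraph.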

One consideration: using raw SQL with hardcoded table names couples this code to the database schema. If table names change or soft deletes are introduced, this could silently break.

💡 Optional: Use model-based destroy for consistency

Consider using Sequelize model destroy methods (e.g., EntityModel.destroy({ where: { tenant_code: newTenantCode }, transaction })) to:

  • Respect soft-delete behavior if added later
  • Get automatic table name resolution from models
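  • Run model hooks (a bulk Model.destroy with a where clause fires the bulk-destroy hooks; per-row destroy hooks require individualHooks: true)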

However, the current raw SQL approach is acceptable for performance-critical bulk operations.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/services/tenant.js` around lines 156-175: the current loop uses configTables and db.sequelize.query to delete rows by table name and tenant_code (using newTenantCode and transaction), which can break if table names or soft-delete behavior change. Replace these raw DELETEs with the corresponding Sequelize model destroy calls (e.g., EntityModel.destroy({ where: { tenant_code: newTenantCode }, transaction })) for each table represented in configTables, so that model table names, soft deletes, and hooks are honored. Alternatively, keep the raw SQL but add a clear comment explaining the intentional bypass of model behavior, and ensure the table names stay in sync.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: fda3e40f-7899-4987-84e7-370870851545

📥 Commits

Reviewing files that changed from the base of the PR and between fc39a6f and f08a6a4.

📒 Files selected for processing (3)
  • src/generics/kafka/consumers/tenant.js
  • src/scripts/backfillTenantData.js
  • src/services/tenant.js


Labels

None yet

Projects

None yet


1 participant