[SITES-41461] feat: add PlgOnboarding model for PLG ASO onboarding #1417

Open
Kanishkavijay39 wants to merge 2 commits into main from feat/plg-onboarding-model

Conversation

Kanishkavijay39 (Contributor) commented Mar 9, 2026

Wiki: https://wiki.corp.adobe.com/pages/viewpage.action?spaceKey=AEMSites&title=ASO+PLG+Onboarding+Backend+Design

Summary

  • Add new PlgOnboarding model and collection to spacecat-shared-data-access to track the self-service onboarding lifecycle for PLG ASO customers
  • Define schema with attributes: imsOrgId, domain, baseURL, status, siteId, organizationId, steps, error, botBlocker, waitlistReason, completedAt
  • Add indexes for querying by imsOrgId, imsOrgId+domain, status, and baseURL
  • Support statuses: IN_PROGRESS, ONBOARDED, ERROR, WAITING_FOR_IP_WHITELISTING, WAITLISTED
  • Register entity in the entity registry and export types
  • Add unit tests for model and collection

Related Issue

Please ensure your pull request adheres to the following guidelines:

  • make sure to link the related issues in this description
  • when merging / squashing, make sure the fixed issue references are visible in the commits, for easy compilation of release notes

Related Issues

Thanks for contributing!

Kanishka added 2 commits March 9, 2026 22:47
Add model, collection, schema, and TypeScript types for the
PlgOnboarding entity that tracks self-service ASO onboarding lifecycle.

Indexes: byImsOrgId, byImsOrgIdAndDomain, byStatus, byBaseURL.
Kanishkavijay39 changed the title from "[WIP]Feat/PLG onboarding model" to "[SITES-41461] feat: add PlgOnboarding model for PLG ASO onboarding" on Mar 10, 2026

solaris007 (Member) left a comment

Hey @Kanishkavijay39,

Good work standing up the PlgOnboarding entity - the model follows the established patterns cleanly and the test coverage is solid. A few things to address before merging.

Strengths

  • Follows the established model/collection/schema/index pattern precisely (file structure, entity registry, barrel exports)
  • Good index strategy covering the key query patterns (by org, by org+domain, by status, by baseURL)
  • Schema-level enum enforcement on status, isObject validation on JSONB fields, isIsoDate on completedAt
  • Tests cover all getters/setters, constructor, and STATUSES constant
  • TypeScript types are thorough with getter/setter declarations and collection methods

Important (should fix before merge)

1. imsOrgId must be readOnly with format validation

plg-onboarding.schema.js - The imsOrgId is the tenant identity anchor but is currently mutable via setImsOrgId(). The Consumer model (consumer.schema.js) is the direct precedent and explicitly marks imsOrgId as readOnly: true with regex validation:

.addAttribute('imsOrgId', {
  type: 'string',
  required: true,
  readOnly: true,
  validate: (value) => /^[a-z0-9]{24}@AdobeOrg$/i.test(value),
})

Without readOnly, a caller can reassign an onboarding record to a different org via record.setImsOrgId('OTHER@AdobeOrg').save() - this is a tenant isolation gap. Also remove setImsOrgId from index.d.ts and update the test that exercises it.
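
To make the format rule concrete, here is a minimal standalone sketch of the check, reusing the regex quoted above. The helper name isValidImsOrgId is hypothetical and not part of the repo; in the schema the same expression would sit inside the validate callback.

```javascript
// Hypothetical helper (not in the repo): the regex from the suggested
// `validate` callback, extracted so it can be exercised in isolation.
const IMS_ORG_ID_PATTERN = /^[a-z0-9]{24}@AdobeOrg$/i;

function isValidImsOrgId(value) {
  // Non-string input is rejected outright instead of being coerced.
  return typeof value === 'string' && IMS_ORG_ID_PATTERN.test(value);
}

// 24 alphanumeric characters followed by the @AdobeOrg suffix passes.
console.log(isValidImsOrgId('1234567890ABCDEF12345678@AdobeOrg')); // true
// A short prefix fails the format check.
console.log(isValidImsOrgId('OTHER@AdobeOrg')); // false
```

Note that format validation alone does not close the isolation gap: a well-formed ID of another org would still pass, which is why readOnly is the essential part of the fix.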

2. STATUSES key/value mismatch - WAITING_FOR_IP_ALLOWLISTING vs WAITING_FOR_IP_WHITELISTING

plg-onboarding.model.js:30 - The key says ALLOWLISTING but the stored value says WHITELISTING:

WAITING_FOR_IP_ALLOWLISTING: 'WAITING_FOR_IP_WHITELISTING',

Every developer who references PlgOnboarding.STATUSES.WAITING_FOR_IP_ALLOWLISTING will expect the persisted value to match the key name. Anyone filtering by status string in SQL, logs, or Coralogix dashboards has to know the "real" value differs from the constant name. Since this is a new entity with no existing data, align both key and value to WAITING_FOR_IP_ALLOWLISTING (the Adobe inclusive language standard) and coordinate with mysticat-data-service PR #125 to update the Postgres enum value too.
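
A sketch of what the aligned constant could look like, together with a guard that a unit test might use to catch this class of mismatch. The other four keys are assumed to equal their values (only line 30 is quoted in this review), and findMismatchedStatusKeys is a hypothetical helper.

```javascript
// Sketch only: keys other than WAITING_FOR_IP_ALLOWLISTING are assumed to
// mirror the status values listed in the PR summary.
const STATUSES = Object.freeze({
  IN_PROGRESS: 'IN_PROGRESS',
  ONBOARDED: 'ONBOARDED',
  ERROR: 'ERROR',
  WAITING_FOR_IP_ALLOWLISTING: 'WAITING_FOR_IP_ALLOWLISTING',
  WAITLISTED: 'WAITLISTED',
});

// Hypothetical guard: returns every key whose stored value differs from it.
function findMismatchedStatusKeys(statuses) {
  return Object.entries(statuses)
    .filter(([key, value]) => key !== value)
    .map(([key]) => key);
}

console.log(findMismatchedStatusKeys(STATUSES).length); // 0
```

Run against the constant as currently submitted, the guard would report WAITING_FOR_IP_ALLOWLISTING.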

3. Missing baseURL format validation

plg-onboarding.schema.js - Every other baseURL attribute in this repo validates with isValidUrl (site.schema.js, site-candidate.schema.js, import-job.schema.js, scrape-job.schema.js). This schema accepts any string. Since baseURL is also an index partition key, invalid values would corrupt the index. Add:

.addAttribute('baseURL', {
  type: 'string',
  required: true,
  validate: (value) => isValidUrl(value),
})
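
For illustration, the kind of input such a check would reject can be shown with a stand-in built on the standard URL class. This is not the repo's isValidUrl, whose exact rules are not shown in this PR; looksLikeHttpUrl is a hypothetical name and the http/https restriction is an assumption.

```javascript
// Stand-in for illustration only; the real schema should use the shared
// isValidUrl helper referenced above.
function looksLikeHttpUrl(value) {
  if (typeof value !== 'string') return false;
  try {
    // The URL constructor throws a TypeError on unparseable input.
    const { protocol } = new URL(value);
    // Assumption: only http(s) base URLs make sense for an onboarded site.
    return protocol === 'http:' || protocol === 'https:';
  } catch {
    return false;
  }
}

console.log(looksLikeHttpUrl('https://example.com')); // true
console.log(looksLikeHttpUrl('example.com')); // false (no scheme)
console.log(looksLikeHttpUrl('not a url')); // false
```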

4. siteId and organizationId - consider addReference or at minimum add UUID validation

Every other entity that holds FK references to Site/Organization uses addReference('belongs_to', ...) which auto-generates UUID validation and navigational methods (getOrganization(), getSite()). Plain string attributes with no validation allow garbage values to be stored. At minimum add:

.addAttribute('siteId', {
  type: 'string',
  required: false,
  validate: (value) => !value || isValidUUID(value),
})

Note: using addReference would consume 2 of the 5 GSI slots (4 already used), so evaluate which indexes are essential if you go that route. If you intentionally keep these as plain attributes (because the onboarding record creates Site/Org rather than belonging to pre-existing ones), add a code comment explaining why.
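
If the plain-attribute route is kept, the optional-UUID rule behaves roughly as in the sketch below. isUuidLike is a hypothetical stand-in for the repo's isValidUUID (whose exact pattern is not shown here), and validateOptionalId mirrors the validate callback suggested above.

```javascript
// Stand-in for isValidUUID: accepts RFC 4122 style UUIDs, versions 1 to 8.
const UUID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[1-8][0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i;

function isUuidLike(value) {
  return typeof value === 'string' && UUID_PATTERN.test(value);
}

// Same shape as the suggested callback: unset values pass, set values
// must be UUIDs.
const validateOptionalId = (value) => !value || isUuidLike(value);

console.log(validateOptionalId(undefined)); // true
console.log(validateOptionalId('a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d')); // true
console.log(validateOptionalId('garbage')); // false
```

One consequence of the !value shortcut is that an empty string also passes; if that is not wanted, compare against undefined and null explicitly.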

5. TypeScript collection methods are incomplete

plg-onboarding/index.d.ts - The compound accessor methods generated by the indexes are missing from the type declarations:

  • allByImsOrgIdAndUpdatedAt / findByImsOrgIdAndUpdatedAt (from index 1)
  • allByBaseURLAndStatus / findByBaseURLAndStatus (from index 4)
  • allByStatusAndUpdatedAt / findByStatusAndUpdatedAt (from index 3)

TypeScript consumers calling these will get type errors even though the methods exist at runtime.
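
As a checklist aid, the missing names can be derived mechanically. The sketch below assumes the accessor names are formed by capitalising each index key and joining them with "And", which matches the names listed above but is not confirmed here against the collection implementation; accessorNamesFor is a hypothetical helper.

```javascript
// Assumption: accessor names are `allBy`/`findBy` plus the capitalised
// index keys joined with "And". This reproduces the names listed above.
const capitalize = (key) => key.charAt(0).toUpperCase() + key.slice(1);

function accessorNamesFor(indexKeys) {
  const suffix = indexKeys.map(capitalize).join('And');
  return [`allBy${suffix}`, `findBy${suffix}`];
}

// Index keys as implied by the review comment above.
const compoundIndexes = [
  ['imsOrgId', 'updatedAt'],
  ['status', 'updatedAt'],
  ['baseURL', 'status'],
];

for (const keys of compoundIndexes) {
  console.log(accessorNamesFor(keys).join(', '));
}
```

Each printed name should have a matching declaration in index.d.ts.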

Minor (nice to have)

  • domain should be readOnly: It forms part of the unique identity (imsOrgId + domain index). Mutating it post-creation breaks lifecycle semantics.
  • domain has no format validation: A hostname regex or length cap would prevent garbage data in the index sort key.
  • Copyright year: Source files use Copyright 2025, test files use Copyright 2026. Use 2026 consistently for new code.
  • Consider type: 'map' with properties for steps and botBlocker since the shapes are known - provides schema documentation and prevents structural drift (see import-job.schema.js:118 for the initiatedBy pattern).
  • Consider a default for status: default: 'IN_PROGRESS' would reduce boilerplate in callers, matching how opportunity.schema.js uses default: 'NEW'.
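
For the domain format point in the list above, a hostname check with a length cap could look like the following sketch. isPlausibleDomain is a hypothetical name, and the limits (253 characters overall, 63 per label, at least two labels) are assumptions based on common DNS hostname rules rather than anything in this repo.

```javascript
// Assumed limits: 253 characters overall, 63 per label (DNS hostname rules).
const MAX_HOSTNAME_LENGTH = 253;
// A label starts and ends with an alphanumeric; hyphens only in between.
const HOSTNAME_LABEL = /^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$/i;

function isPlausibleDomain(value) {
  if (typeof value !== 'string') return false;
  if (value.length === 0 || value.length > MAX_HOSTNAME_LENGTH) return false;
  const labels = value.split('.');
  // Require at least "name.tld" so bare words are rejected.
  return labels.length >= 2 && labels.every((label) => HOSTNAME_LABEL.test(label));
}

console.log(isPlausibleDomain('example.com')); // true
console.log(isPlausibleDomain('https://example.com')); // false (scheme included)
console.log(isPlausibleDomain('-bad.com')); // false (leading hyphen)
```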

Recommendations (non-blocking)

  • Add schema-level validation tests (invalid status rejected, completedAt rejects non-ISO, required fields enforced)
  • The allByStatus and allByBaseURL collection methods return cross-tenant results. If these are intended for internal/admin use only, add a code comment clarifying that. Access control enforcement belongs in the API layer but the data model should signal intent.
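
The three validation cases named in the first recommendation can be sketched against a stand-in validator. Everything below is illustrative: collectValidationErrors and isStrictIsoDate are hypothetical, the required-field list is an assumption, and the real tests should exercise the schema itself. The status list is the one given in the PR summary.

```javascript
// Status values as listed in the PR summary.
const ALLOWED_STATUSES = [
  'IN_PROGRESS', 'ONBOARDED', 'ERROR', 'WAITING_FOR_IP_WHITELISTING', 'WAITLISTED',
];
// Assumption: which attributes are required is not fully shown in this PR.
const REQUIRED_FIELDS = ['imsOrgId', 'domain', 'baseURL', 'status'];

// Stand-in for isIsoDate: accepts only the exact toISOString() form.
function isStrictIsoDate(value) {
  if (typeof value !== 'string') return false;
  const parsed = Date.parse(value);
  return !Number.isNaN(parsed) && new Date(parsed).toISOString() === value;
}

// Hypothetical validator returning a list of human-readable problems.
function collectValidationErrors(record) {
  const errors = [];
  for (const field of REQUIRED_FIELDS) {
    if (record[field] == null) errors.push(`${field} is required`);
  }
  if (record.status != null && !ALLOWED_STATUSES.includes(record.status)) {
    errors.push('status is not an allowed value');
  }
  if (record.completedAt != null && !isStrictIsoDate(record.completedAt)) {
    errors.push('completedAt is not an ISO date');
  }
  return errors;
}

const valid = {
  imsOrgId: '1234567890ABCDEF12345678@AdobeOrg',
  domain: 'example.com',
  baseURL: 'https://example.com',
  status: 'IN_PROGRESS',
  completedAt: '2026-03-09T22:47:00.000Z',
};
console.log(collectValidationErrors(valid).length); // 0
console.log(collectValidationErrors({ ...valid, status: 'DONE' }).length); // 1
```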
