
feat: add query layer (CM-1059) #3942

Open
ulemons wants to merge 30 commits into main from feat/add-query-layer

Conversation

@ulemons ulemons commented Mar 20, 2026

DevStats Affiliations API + Generic DAL resolver

Implements the /v1/dev-stats/affiliations endpoint that allows external tools (e.g. DevStats/gitdm) to resolve GitHub contributor affiliations in bulk.

API

POST /v1/dev-stats/affiliations
Authorization: Bearer <api-key>

{ "githubHandles": ["handle1", "handle2"] }

Accepts up to 1,000 handles per request. Returns resolved, non-overlapping affiliation periods per contributor, sorted by most recent first.
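
Because a single request accepts at most 1,000 handles, a caller with a longer list has to split it into batches. A minimal client-side sketch, assuming the request body shape shown above; the helper names `chunkHandles` and `buildAffiliationRequests` and the constant `MAX_HANDLES_PER_REQUEST` are hypothetical and not part of this PR:

```typescript
// Hypothetical client-side helpers; only the body shape and the 1,000 limit
// come from the PR description.
const MAX_HANDLES_PER_REQUEST = 1000

interface AffiliationsRequestBody {
  githubHandles: string[]
}

/** Splits a list of handles into batches no larger than `size`. */
function chunkHandles(handles: string[], size: number = MAX_HANDLES_PER_REQUEST): string[][] {
  if (size < 1) throw new Error('size must be at least 1')
  const batches: string[][] = []
  for (let i = 0; i < handles.length; i += size) {
    batches.push(handles.slice(i, i + size))
  }
  return batches
}

/** Builds one request body per batch, ready to POST to the endpoint. */
function buildAffiliationRequests(handles: string[]): AffiliationsRequestBody[] {
  return chunkHandles(handles).map((batch) => ({ githubHandles: batch }))
}
```

Each returned body can then be sent as its own POST; how the per-batch responses are merged is left to the caller.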

Implementation

services/libs/data-access-layer/src/affiliations/ — new generic module, usable by any consumer:

  • IAffiliationPeriod — public type representing a single affiliation window
  • resolveAffiliationsByMemberIds — bulk resolver for up to N members in 2 DB queries
  • findWorkExperiencesBulk / findManualAffiliationsBulk — exported separately for reuse
  • Conflict resolution mirrors prepareMemberOrganizationAffiliationTimeline but uses an interval-based approach (boundary dates) instead of day-by-day iteration, making it viable for bulk requests
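
The interval-based idea in the last bullet can be sketched as follows. This is an illustration only, not the PR's code: `PeriodSketch`, `ResolvedWindow`, `collectBoundaries`, and `resolveByBoundaries` are hypothetical names, dates are assumed to be aligned to UTC midnight with inclusive end days, and the winner rule is simplified to "first active period in input order", whereas the PR applies its own selection priority and gap filling.

```typescript
interface PeriodSketch {
  org: string
  start: Date
  end: Date | null // null = still ongoing
}

interface ResolvedWindow {
  org: string
  start: Date
  end: Date | null
}

const DAY_MS = 24 * 60 * 60 * 1000

/** Every start date, plus the day after every end date, is a point where the active set can change. */
function collectBoundaries(periods: PeriodSketch[]): number[] {
  const seen: { [ts: number]: boolean } = {}
  const out: number[] = []
  for (const p of periods) {
    const candidates = [p.start.getTime()]
    if (p.end !== null) candidates.push(p.end.getTime() + DAY_MS)
    for (const ts of candidates) {
      if (!seen[ts]) {
        seen[ts] = true
        out.push(ts)
      }
    }
  }
  return out.sort((a, b) => a - b)
}

/** Walks the boundaries once instead of iterating over every calendar day. */
function resolveByBoundaries(periods: PeriodSketch[]): ResolvedWindow[] {
  const boundaries = collectBoundaries(periods)
  const windows: ResolvedWindow[] = []
  for (let i = 0; i < boundaries.length; i++) {
    const from = boundaries[i]
    const next = i + 1 < boundaries.length ? boundaries[i + 1] : null
    // A period active on the first day of an interval is active for the whole interval.
    const active = periods.filter(
      (p) => p.start.getTime() <= from && (p.end === null || p.end.getTime() >= from),
    )
    if (active.length === 0) continue // a gap; this sketch leaves it unfilled
    const winner = active[0] // simplified stand-in for a real priority rule
    const end = next === null ? null : new Date(next - DAY_MS)
    const last = windows.length > 0 ? windows[windows.length - 1] : null
    const contiguous = last !== null && last.end !== null && last.end.getTime() + DAY_MS === from
    if (last !== null && contiguous && last.org === winner.org) {
      last.end = end // extend the previous window instead of emitting a duplicate
    } else {
      windows.push({ org: winner.org, start: new Date(from), end })
    }
  }
  return windows
}
```

The work done is proportional to the number of boundaries (at most two per period) rather than to the number of days spanned, which is why this style scales to bulk requests.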

Note

Medium Risk
Adds a new public API endpoint and substantial new affiliation-resolution logic with multiple SQL queries and timeline heuristics; also changes the route mount so the endpoint path shifts from /v1/dev-stats/affiliations to /v1/affiliations, which may be a breaking change for clients.

Overview
Adds a new DevStats affiliations API that accepts up to 1,000 GitHub handles, looks up verified members and emails, and returns per-contributor non-overlapping affiliation periods (POST /v1/affiliations via getAffiliations).

Implements a new generic DAL module (services/libs/data-access-layer/src/affiliations) that bulk-fetches work experiences/manual affiliations and resolves them into ordered affiliation windows using selection priority, gap filling, and date-boundary interval processing, plus adds DevStats-specific DAL queries for GitHub handle and verified email lookup.

Routing change: mounts devStatsRouter() at /v1 instead of /v1/dev-stats, effectively changing the endpoint path and potentially impacting existing integrations.

Written by Cursor Bugbot for commit e5684b7.

@ulemons ulemons self-assigned this Mar 20, 2026
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

@github-actions github-actions bot left a comment

Conventional Commits FTW!

@ulemons ulemons changed the title from "Feat/add query layer" to "feat: add query layer (CM-1059)" Mar 20, 2026
@ulemons ulemons force-pushed the feat/add-query-layer branch 2 times, most recently from 3afbcf5 to 895d9ed Compare March 23, 2026 15:08
ulemons added 24 commits March 24, 2026 15:47
Signed-off-by: Umberto Sgueglia <usgueglia@contractor.linuxfoundation.org>
ulemons added 2 commits March 24, 2026 15:47
Signed-off-by: Umberto Sgueglia <usgueglia@contractor.linuxfoundation.org>
@ulemons ulemons force-pushed the feat/add-query-layer branch from 9970c76 to e6f6aba Compare March 24, 2026 14:47
@ulemons ulemons marked this pull request as ready for review March 24, 2026 14:47
Copilot AI review requested due to automatic review settings March 24, 2026 14:47
Copilot AI left a comment

Pull request overview

Adds a DevStats-oriented query layer and exposes a new public API endpoint to bulk-resolve GitHub contributor affiliations (including timeline conflict resolution) from existing member work experience and manual affiliation data.

Changes:

  • Exports new DAL modules (affiliations, devStats) from the data-access-layer package.
  • Adds DAL query helpers for DevStats lookups (members by GitHub handle; verified emails by memberId).
  • Implements POST /v1/dev-stats/affiliations handler wired into the public router.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| services/libs/data-access-layer/src/index.ts | Re-exports new DAL entrypoints so downstream services can consume them. |
| services/libs/data-access-layer/src/devStats/index.ts | Adds DevStats-focused queries for bulk member lookup and verified emails. |
| services/libs/data-access-layer/src/affiliations/index.ts | Introduces bulk affiliation resolution and timeline-building logic. |
| backend/src/api/public/v1/dev-stats/index.ts | Wires the /affiliations route to the new handler with scope protection. |
| backend/src/api/public/v1/dev-stats/getAffiliations.ts | Implements request validation, DAL lookups, and response shaping for the endpoint. |


Comment on lines +256 to +270
log.info(
  {
    memberId,
    boundaryDate: boundaryDate.toISOString(),
    orgsAtBoundary: activeOrgsAtBoundary.map((r) => ({
      org: r.organizationName,
      dateStart: r.dateStart,
      dateEnd: r.dateEnd,
      isPrimary: r.isPrimaryWorkExperience,
      memberCount: r.memberCount,
      isManual: r.segmentId !== null,
    })),
  },
  'processing boundary',
)
Copilot AI Mar 24, 2026

There are multiple log.info(...) calls inside the per-boundary loop. For bulk requests (up to 1,000 members) this can generate extremely high log volume and impact latency/cost. Consider downgrading these to debug (or sampling/gating them behind a feature flag), and keeping info only for coarse request-level summaries.

)

if (datedRows.length === 0) {
  log.debug({ memberId }, 'no dated rows — returning empty affiliations')
Copilot AI Mar 24, 2026

When a member has no dated rows, the resolver returns an empty list even if an undated primary/fallback org exists. Given IAffiliationPeriod allows startDate/endDate = null, consider returning a single undated period (or otherwise representing the fallback) so callers still get an affiliation when only undated work experience data is available.

Suggested change
- log.debug({ memberId }, 'no dated rows — returning empty affiliations')
+ if (fallbackOrg) {
+   log.debug(
+     {
+       memberId,
+       fallbackOrg: fallbackOrg.organizationName,
+     },
+     'no dated rows — returning single undated affiliation from fallback org',
+   )
+   return [
+     {
+       organization: fallbackOrg.organizationName,
+       startDate: null,
+       endDate: null,
+     },
+   ]
+ }
+ log.debug(
+   {
+     memberId,
+   },
+   'no dated rows and no fallback org — returning empty affiliations',
+ )


const bodySchema = z.object({
  githubHandles: z
    .array(z.string().min(1))
Copilot AI Mar 24, 2026

bodySchema accepts raw strings; unlike other public endpoints (e.g. lfids: z.string().trim()), this allows handles with leading/trailing whitespace that will never match lower(mi.value). Consider adding .trim() (and possibly .toLowerCase()/normalization) at validation time so lookup behavior is predictable.

Suggested change
- .array(z.string().min(1))
+ .array(z.string().trim().min(1))
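
To illustrate why normalizing at validation time makes lookups predictable, here is a dependency-free sketch of the cleanup the suggestion implies. `normalizeHandles` is a hypothetical helper, and the lowercasing and de-duplication steps are assumptions that go beyond the suggested `.trim()`:

```typescript
/** Trims, lowercases, drops empty values, and de-duplicates while preserving first-seen order. */
function normalizeHandles(raw: string[]): string[] {
  const seen = new Set<string>()
  const out: string[] = []
  for (const value of raw) {
    const handle = value.trim().toLowerCase()
    if (handle.length === 0 || seen.has(handle)) continue
    seen.add(handle)
    out.push(handle)
  }
  return out
}
```

With this in place, `' Alice '` and `'alice'` resolve to the same lookup key instead of one of them silently ending up in `notFound`.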

Comment on lines +35 to +37
if (memberRows.length === 0) {
  ok(res, { total_found: 0, contributors: [], notFound })
  return
Copilot AI Mar 24, 2026

Response uses total_found (snake_case) while other public v1 endpoints consistently use camelCase response keys (e.g. projectAffiliations, memberId, workExperiences). Consider renaming this to totalFound (and keeping response casing consistent across this API surface).

- router.post('/affiliations', requireScopes([SCOPES.READ_AFFILIATIONS]), (_req, res) => {
-   res.json({ status: 'ok' })
- })
+ router.post('/affiliations', requireScopes([SCOPES.READ_AFFILIATIONS]), getAffiliations)
Copilot AI Mar 24, 2026

This route is protected only by read:affiliations, but getAffiliations returns verified email addresses as well. To avoid unintentionally widening access to member identities, consider also requiring SCOPES.READ_MEMBER_IDENTITIES (or remove emails from the response / introduce a dedicated scope for this endpoint).

Suggested change
- router.post('/affiliations', requireScopes([SCOPES.READ_AFFILIATIONS]), getAffiliations)
+ router.post('/affiliations', requireScopes([SCOPES.READ_AFFILIATIONS, SCOPES.READ_MEMBER_IDENTITIES]), getAffiliations)

Comment on lines +157 to +167
/** Returns the org used to fill gaps — primary undated wins, then earliest-created undated. */
function findFallbackOrg(rows: IWorkRow[]): IWorkRow | null {
  const primaryUndated = rows.find((r) => r.isPrimaryWorkExperience && !r.dateStart && !r.dateEnd)
  if (primaryUndated) return primaryUndated

  return (
    rows
      .filter((r) => !r.dateStart && !r.dateEnd)
      .sort((a, b) => new Date(a.createdAt).getTime() - new Date(b.createdAt).getTime())
      .at(0) ?? null
  )
Copilot AI Mar 24, 2026

findFallbackOrg currently considers all undated rows, including manual affiliations (segmentId !== null). If an undated manual affiliation exists, it can become the global fallback org and then be used to fill unrelated gaps, losing the segment context. Consider restricting fallback selection to non-manual work experiences (e.g. segmentId === null) to match the existing prepareMemberOrganizationAffiliationTimeline behavior.
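
One way to apply that restriction is sketched below. It is not the PR's code: `WorkRowSketch` keeps only the fields visible in the excerpts of this review, `findFallbackOrgSketch` is a hypothetical name, and it assumes that a non-null `segmentId` is what marks a manual affiliation (as the `isManual: r.segmentId !== null` log field suggests).

```typescript
// Reduced row shape; the real IWorkRow likely has more fields.
interface WorkRowSketch {
  organizationName: string
  dateStart: string | null
  dateEnd: string | null
  isPrimaryWorkExperience: boolean
  segmentId: string | null // assumption: non-null marks a manual affiliation
  createdAt: string
}

/** Picks the gap-filling org from undated work experiences only; manual affiliations are never eligible. */
function findFallbackOrgSketch(rows: WorkRowSketch[]): WorkRowSketch | null {
  const undatedWork = rows.filter((r) => r.segmentId === null && !r.dateStart && !r.dateEnd)

  // A primary undated work experience wins outright.
  const primary = undatedWork.filter((r) => r.isPrimaryWorkExperience)
  if (primary.length > 0) return primary[0]

  // Otherwise fall back to the earliest-created undated work experience.
  const byCreation = undatedWork.sort(
    (a, b) => new Date(a.createdAt).getTime() - new Date(b.createdAt).getTime(),
  )
  return byCreation.length > 0 ? byCreation[0] : null
}
```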

Comment on lines +205 to +209
function startOfDay(date: Date | string): Date {
  const d = new Date(date)
  d.setHours(0, 0, 0, 0)
  return d
}
Copilot AI Mar 24, 2026

startOfDay uses setHours(0,0,0,0), which truncates in local time. Since dateStart/dateEnd are timestamptz, this can shift boundaries depending on server timezone/DST and produce off-by-one-day affiliation windows. Consider doing all truncation/arithmetic in UTC (e.g. setUTCHours(0,0,0,0) and getUTCDate/setUTCDate) or switching to a date-only representation before building boundaries.
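
A minimal sketch of the UTC variant described above, using only standard `Date` methods. The names `startOfDayUtc` and `addDaysUtc` are hypothetical and do not appear in the PR:

```typescript
/** Truncates to 00:00:00.000 UTC, independent of the server's local timezone. */
function startOfDayUtc(date: Date | string): Date {
  const d = new Date(date)
  d.setUTCHours(0, 0, 0, 0)
  return d
}

/** Adds whole days using UTC calendar fields, so local DST transitions cannot shift the result. */
function addDaysUtc(date: Date, days: number): Date {
  const d = new Date(date.getTime())
  d.setUTCDate(d.getUTCDate() + days)
  return d
}
```

Because both helpers read and write only UTC fields, the same input yields the same boundary regardless of the `TZ` setting of the process.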

if (datedRows.length === 0) {
  log.debug({ memberId }, 'no dated rows — returning empty affiliations')
  return []
}

Undated-only members silently return empty affiliations

Medium Severity

When a member has only undated work experiences, datedRows is empty and the function returns [] at line 453, even though findFallbackOrg successfully identified an affiliation on line 433. The IAffiliationPeriod type supports null for both startDate and endDate, so the fallback org could be returned as a period. Instead, contributors with known but undated affiliations appear in the API response with zero affiliations — losing information the system already has.


Signed-off-by: Umberto Sgueglia <usgueglia@contractor.linuxfoundation.org>
if (datedRows.length === 0) {
  log.debug({ memberId }, 'no dated rows — returning empty affiliations')
  return []
}

Undated primary org no longer competes with dated orgs

Medium Severity

The PR description states this "mirrors prepareMemberOrganizationAffiliationTimeline," but the new code produces different results when a primary undated org coexists with dated orgs. The old code passes all remaining orgs (including the undated primary) to findOrgsWithRolesInDate, where the undated org matches (!dateStart && !dateEnd) and is active at every date — winning via selectPrimaryWorkExperience. The new code only passes datedRows (filtered by r.dateStart) to orgsActiveAt, so the undated primary org never competes and only fills gaps as fallbackOrg. This produces materially different affiliation timelines.
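
The difference comes down to the "active at date" predicate. The sketch below shows a predicate under which fully undated rows compete at every date, as the comment says the old code does. `DatedRowSketch` and `isActiveAt` are hypothetical names, and treating a missing `dateStart` or `dateEnd` on a partially dated row as unbounded is an assumption, not something the PR confirms:

```typescript
interface DatedRowSketch {
  organizationName: string
  dateStart: string | null
  dateEnd: string | null
}

/** A fully undated row is active on every date; otherwise the date must fall within [dateStart, dateEnd]. */
function isActiveAt(row: DatedRowSketch, date: Date): boolean {
  if (!row.dateStart && !row.dateEnd) return true
  const t = date.getTime()
  // Assumption: a missing bound on a partially dated row means "unbounded on that side".
  const startsBefore = !row.dateStart || new Date(row.dateStart).getTime() <= t
  const endsAfter = !row.dateEnd || new Date(row.dateEnd).getTime() >= t
  return startsBefore && endsAfter
}
```

Filtering the candidate list by `r.dateStart` before applying such a predicate is what removes the undated primary org from the competition.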


ulemons added 2 commits March 24, 2026 17:14
Signed-off-by: Umberto Sgueglia <usgueglia@contractor.linuxfoundation.org>
@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

There are 4 total unresolved issues (including 2 from previous reviews).


const router = Router()

- router.use('/v1/dev-stats', staticApiKeyMiddleware(), devStatsRouter())
+ router.use('/v1', staticApiKeyMiddleware(), devStatsRouter())

Route mount path breaks all existing OAuth2 endpoints

High Severity

Changing the mount path from '/v1/dev-stats' to '/v1' causes staticApiKeyMiddleware to intercept ALL /v1/* requests, not just dev-stats ones. When an OAuth2-authenticated request arrives (e.g., POST /v1/members), the middleware tries to validate the OAuth2 Bearer token as a static API key, fails, and calls next(new UnauthorizedError(...)). Express then skips the subsequent oauth2Middleware layer entirely and routes the error straight to errorHandler, returning 401 for all existing OAuth2-protected endpoints. Additionally, the endpoint path becomes /v1/affiliations instead of the documented /v1/dev-stats/affiliations.
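
The reach of each mount path can be illustrated with a small prefix check. This is only an approximation of how Express matches a `use()` mount point (it ignores case-insensitivity and trailing-slash handling), and `isUnderMount` is a hypothetical helper, not an Express API:

```typescript
/** Approximates whether a request path falls under a `use(mount, ...)` mount point. */
function isUnderMount(mount: string, path: string): boolean {
  if (path === mount) return true
  // A mount matches only on a path-segment boundary, so '/v1' must not match '/v10'.
  const prefix = mount.charAt(mount.length - 1) === '/' ? mount : mount + '/'
  return path.slice(0, prefix.length) === prefix
}
```

Under this check, `/v1/members` is outside `/v1/dev-stats` but inside `/v1`, which is why middleware mounted at `/v1` sees every v1 request.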


const primaryUndated = rows.find((r) => r.isPrimaryWorkExperience && !r.dateStart && !r.dateEnd)
const cleaned = primaryUndated
  ? rows.filter((r) => r.dateStart || r.id === primaryUndated.id)
  : rows

Filter incorrectly drops undated manual affiliations

Medium Severity

The cleaned filter operates on the combined rows (work experiences + manual affiliations), but the equivalent logic in prepareMemberOrganizationAffiliationTimeline only filters memberOrganizations, leaving manualAffiliations untouched. When a primary undated work experience exists, this filter drops undated manual affiliations too (they lack dateStart and their id differs from the primary). Manual affiliations are supposed to have the highest priority in selectPrimaryWorkExperience, so silently removing them produces incorrect affiliation results.
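
A possible shape for a filter that leaves manual affiliations untouched is sketched below. It is an assumption-laden illustration rather than the PR's fix: `CombinedRowSketch` and `cleanRowsSketch` are hypothetical names, a non-null `segmentId` is assumed to mark a manual affiliation, and the primary undated row is looked up among work experiences only.

```typescript
interface CombinedRowSketch {
  id: string
  dateStart: string | null
  dateEnd: string | null
  isPrimaryWorkExperience: boolean
  segmentId: string | null // assumption: non-null marks a manual affiliation
}

/**
 * When a primary undated work experience exists, drops the other undated
 * work experiences but keeps every manual affiliation, dated or not.
 */
function cleanRowsSketch(rows: CombinedRowSketch[]): CombinedRowSketch[] {
  const primaryUndated = rows.filter(
    (r) => r.segmentId === null && r.isPrimaryWorkExperience && !r.dateStart && !r.dateEnd,
  )[0]
  if (!primaryUndated) return rows
  return rows.filter(
    (r) => r.segmentId !== null || Boolean(r.dateStart) || r.id === primaryUndated.id,
  )
}
```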


Signed-off-by: Umberto Sgueglia <usgueglia@contractor.linuxfoundation.org>

3 participants