EPIC-025 — Multi-Source Profile Enrichment
Goal
Extend the existing EPIC-024 enrichment system to support profile enrichment from 5 additional sources: Website URL, Facebook Page, Instagram Profile, YouTube Channel, and Linktree — using the pluggable EnrichmentProviderInterface pattern already in place.
Background
EPIC-024 delivered the core enrichment framework with Apollo (company/person), Hunter/Clearbit (logo), and Demo providers. This epic adds source-based providers that extract profile data from public URLs rather than API lookups by email/domain.
New Source Providers
| # | Source | Provider Class | API / Method | Fields Extracted |
|---|---|---|---|---|
| 1 | Website URL | `WebsiteProvider` | Crawler (meta tags, OG, JSON-LD, schema.org) | name, description, emails, phones, address, social links, logo, favicon |
| 2 | Facebook Page | `FacebookProvider` | Facebook Graph API (preferred) / OG fallback | page name, bio, website, phone, address, profile pic, cover photo, category |
| 3 | Instagram Profile | `InstagramProvider` | Instagram Basic Display API / OG fallback | username, full name, bio, profile pic, website link, follower count |
| 4 | YouTube Channel | `YouTubeProvider` | YouTube Data API v3 | channel name, description, custom URL, subscriber count, avatar, banner |
| 5 | Linktree | `LinktreeProvider` | Crawler (structured HTML/JSON) | display name, bio, avatar, all link entries (social, website, payment) |
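As a rough illustration of the crawler-style extraction listed for `WebsiteProvider` (and the OG fallback for Facebook/Instagram), the sketch below reads Open Graph and plain meta tags from raw HTML with PHP's built-in DOM extension. The function name `extractMetaFields` and the tag-to-field mapping are assumptions made for this example, not part of the existing codebase; JSON-LD and schema.org parsing are left out.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: map Open Graph / meta tags to profile fields.
 * Returns field name => value for the tags that are present.
 */
function extractMetaFields(string $html): array
{
    if (trim($html) === '') {
        return []; // DOMDocument::loadHTML() rejects empty input.
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; buffer parser warnings instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Collect the first non-empty value for each meta key.
    $found = [];
    foreach ($doc->getElementsByTagName('meta') as $meta) {
        $key = strtolower($meta->getAttribute('property') ?: $meta->getAttribute('name'));
        $content = trim($meta->getAttribute('content'));
        if ($key !== '' && $content !== '' && !isset($found[$key])) {
            $found[$key] = $content;
        }
    }

    // Assumed mapping, in priority order: OG tags win over the plain description tag.
    $wanted = [
        'og:title'       => 'name',
        'og:description' => 'description',
        'og:image'       => 'logo',
        'description'    => 'description',
    ];

    $fields = [];
    foreach ($wanted as $key => $field) {
        if (isset($found[$key]) && !isset($fields[$field])) {
            $fields[$field] = $found[$key];
        }
    }
    return $fields;
}
```

A real provider would also attach a confidence score and evidence snippet to each value, as described under Technical Approach.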
Architecture
Extends the existing `EnrichmentProviderInterface` with two new methods:

```php
public function enrichFromUrl(string $url): EnrichmentResult;
public function canHandleUrl(string $url): bool;
```

The `EnrichmentService` gains an `enrichByUrl(string $url)` method that auto-detects the URL type and dispatches to the correct provider.
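The dispatch described above could look roughly like the following. This is a minimal sketch, assuming PHP 8 and that providers are registered most-specific first so the generic website crawler acts as a catch-all. The `EnrichmentResult` class here is a stand-in (the real EPIC-024 class is not shown in this issue), the interface lists only the two new methods, and the host lists and return values are placeholders.

```php
<?php
declare(strict_types=1);

// Stand-in for the EPIC-024 result type; the real class is not shown in this issue.
final class EnrichmentResult
{
    public function __construct(public string $provider, public array $fields = []) {}
}

// Only the two new methods are shown; existing interface methods are omitted.
interface EnrichmentProviderInterface
{
    public function enrichFromUrl(string $url): EnrichmentResult;
    public function canHandleUrl(string $url): bool;
}

final class YouTubeProvider implements EnrichmentProviderInterface
{
    public function canHandleUrl(string $url): bool
    {
        $host = strtolower((string) parse_url($url, PHP_URL_HOST));
        return in_array($host, ['youtube.com', 'www.youtube.com'], true);
    }

    public function enrichFromUrl(string $url): EnrichmentResult
    {
        return new EnrichmentResult('youtube'); // A real provider would call the API here.
    }
}

final class WebsiteProvider implements EnrichmentProviderInterface
{
    public function canHandleUrl(string $url): bool
    {
        // Catch-all: any http(s) URL that has a host.
        $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
        $host = (string) parse_url($url, PHP_URL_HOST);
        return in_array($scheme, ['http', 'https'], true) && $host !== '';
    }

    public function enrichFromUrl(string $url): EnrichmentResult
    {
        return new EnrichmentResult('website'); // A real provider would crawl the page here.
    }
}

final class EnrichmentService
{
    /** @param EnrichmentProviderInterface[] $providers Ordered most specific first. */
    public function __construct(private array $providers) {}

    public function enrichByUrl(string $url): EnrichmentResult
    {
        // Strategy chain: the first provider that claims the URL handles it.
        foreach ($this->providers as $provider) {
            if ($provider->canHandleUrl($url)) {
                return $provider->enrichFromUrl($url);
            }
        }
        throw new InvalidArgumentException("No provider can handle URL: {$url}");
    }
}
```

Because routing depends only on `canHandleUrl()`, swapping an API-backed provider for a crawler-backed one should not require changes in `EnrichmentService`.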
Stories
| # | Story | GitHub Issue | MAGIK | Priority |
|---|---|---|---|---|
| 1 | Provider Interface Extension — enrichFromUrl() + URL detection | #TBD | MAGIK-934 | P0 |
| 2 | Website URL Provider — Meta/OG/JSON-LD Extraction | #TBD | MAGIK-935 | P0 |
| 3 | Facebook Page Provider — Graph API + OG Fallback | #TBD | MAGIK-936 | P1 |
| 4 | Instagram Profile Provider — Basic Display API + OG Fallback | #TBD | MAGIK-937 | P1 |
| 5 | YouTube Channel Provider — Data API v3 | #TBD | MAGIK-938 | P1 |
| 6 | Linktree Provider — Structured HTML/JSON Crawler | #TBD | MAGIK-939 | P1 |
| 7 | Field Normalization & Deduplication Service | #TBD | MAGIK-940 | P0 |
| 8 | DB Schema — Enrichment Sources + Evidence Table | #TBD | MAGIK-941 | P0 |
| 9 | Review UI — Current vs Suggested, Accept/Reject per Field | #TBD | MAGIK-942 | P0 |
| 10 | Media Integration — S3-Ready Logo/Cover Image Storage | #TBD | MAGIK-943 | P1 |
| 11 | Rate Limiting & Consent Controls | #TBD | MAGIK-944 | P1 |
| 12 | Multi-Tenant Scoping — Parent/Child Feature Toggles | #TBD | MAGIK-945 | P1 |
| 13 | Demo Provider Extension — Fake URL-Based Data | #TBD | MAGIK-946 | P2 |
| 14 | Integration Testing — Multi-Source End-to-End | #TBD | MAGIK-947 | P2 |
Technical Approach
- Extend `EnrichmentProviderInterface`: add `enrichFromUrl()` + `canHandleUrl()` methods; update existing providers with no-op stubs
- Strategy pattern: URL → provider routing via a `canHandleUrl()` chain in `EnrichmentService`
- Confidence + Evidence: each extracted field carries a confidence score (0.0–1.0) and source evidence (snippet + URL)
- Normalization: E.164 phones via libphonenumber, address component parsing, social link deduplication
- Draft-only storage: results stored in `enrichment_drafts` (extended schema); never auto-applied
- Review UI: side-by-side current vs suggested, per-field accept/reject, batch apply with audit log
- Rate limiting: per-provider, per-tenant rate limits via the `enrichment_rate_limits` table
- Consent: checkbox required before enrichment; consent stored in audit log
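The confidence/evidence and social link dedup items above might be modeled along these lines. This is a sketch under assumptions: the `ExtractedField` class, the `canonicalLink()` rule (ignore scheme, a leading `www.`, trailing slashes, and letter case), and the choice to keep the highest-confidence duplicate are illustrative and not taken from the existing schema. Phone normalization via libphonenumber is not shown.

```php
<?php
declare(strict_types=1);

// Hypothetical value object: one extracted field with its confidence and evidence.
final class ExtractedField
{
    public function __construct(
        public string $name,
        public string $value,
        public float $confidence,
        public string $evidenceSnippet,
        public string $evidenceUrl,
    ) {
        if ($confidence < 0.0 || $confidence > 1.0) {
            throw new InvalidArgumentException('confidence must be within 0.0-1.0');
        }
    }
}

// Assumed canonical form: host without "www." plus lowercased path without trailing slash.
function canonicalLink(string $url): string
{
    $host = strtolower((string) parse_url($url, PHP_URL_HOST));
    $host = (string) preg_replace('/^www\./', '', $host);
    $path = rtrim((string) parse_url($url, PHP_URL_PATH), '/');
    return $host . strtolower($path);
}

/**
 * Keep one entry per canonical link, preferring the highest confidence.
 *
 * @param ExtractedField[] $fields
 * @return ExtractedField[]
 */
function dedupeLinks(array $fields): array
{
    $best = [];
    foreach ($fields as $field) {
        $key = canonicalLink($field->value);
        if (!isset($best[$key]) || $field->confidence > $best[$key]->confidence) {
            $best[$key] = $field;
        }
    }
    return array_values($best);
}
```

Lowercasing the path suits case-insensitive social handles; sites with case-sensitive paths would need a per-host rule.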
Scope
- CI4 Portal (`app.portalv2`)
- All user roles (Admin, Reseller, Org, Employee, Individual) with appropriate access gates
- Extends existing EPIC-024 infrastructure (same tables, same controller, same service)
Acceptance Criteria (Epic-Level)
- User can paste a Website URL and see extracted profile fields as draft
- User can paste a Facebook Page URL and see extracted profile fields
- User can paste an Instagram profile URL and see extracted profile fields
- User can paste a YouTube channel URL and see extracted profile fields
- User can paste a Linktree URL and see all extracted links + profile data
- All extracted fields show confidence scores and source evidence
- Review UI shows current vs suggested side-by-side for each field
- User can accept/reject individual fields before applying
- Applied changes are audit-logged with source attribution
- Approved images (logo/cover) auto-save to Media tab (S3-ready)
- Rate limiting prevents API abuse per provider per tenant
- Consent checkbox required before any enrichment action
- Multi-tenant scoping: parent can enable/disable for children
- Demo mode works for all 5 new sources without API keys
- All providers are swappable (API vs crawler) without app logic changes
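The per-field accept/reject and audit-logging criteria can be illustrated with a small pure function. This is only a sketch: `applyReviewedDraft()` and its array shapes are hypothetical, and a real implementation would persist the audit entries rather than return them.

```php
<?php
declare(strict_types=1);

/**
 * Apply only the accepted draft fields and record what changed.
 *
 * @param array<string,string> $current   Live profile values.
 * @param array<string,string> $suggested Draft values from enrichment.
 * @param array<string,bool>   $decisions Field => true (accept) or false (reject).
 * @return array{profile: array<string,string>, audit: list<array<string,string>>}
 */
function applyReviewedDraft(array $current, array $suggested, array $decisions, string $source): array
{
    $audit = [];
    foreach ($suggested as $field => $value) {
        // Fields without an explicit "accept" stay untouched (draft-only default).
        if (($decisions[$field] ?? false) !== true) {
            continue;
        }
        $audit[] = [
            'field'  => $field,
            'old'    => $current[$field] ?? '',
            'new'    => $value,
            'source' => $source, // Source attribution for the audit log.
        ];
        $current[$field] = $value;
    }
    return ['profile' => $current, 'audit' => $audit];
}
```

Treating a missing decision as a rejection keeps the "no automatic overwrite" guarantee even if the review form omits a field.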
Security & Compliance
- No automatic profile overwrite — all results are DRAFT
- Consent required before enrichment (GDPR/CCPA alignment)
- Rate limiting prevents abuse and cost overrun
- Crawler providers document robots.txt compliance
- API keys stored in `.env`, never in code
- PII fields encrypted at rest in draft table
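For the per-provider, per-tenant rate limiting mentioned above, one possible shape is a fixed-window counter. The sketch below keeps counters in memory purely for illustration; the epic calls for an `enrichment_rate_limits` table, whose schema is not specified here. The class name, the limit/window parameters, and the injected `$now` timestamp are all assumptions.

```php
<?php
declare(strict_types=1);

// Hypothetical fixed-window limiter keyed by provider + tenant (in-memory stand-in for the DB table).
final class FixedWindowLimiter
{
    /** @var array<string, array{start:int, count:int}> */
    private array $windows = [];

    public function __construct(private int $limit, private int $windowSeconds) {}

    public function allow(string $provider, int $tenantId, int $now): bool
    {
        $key = $provider . ':' . $tenantId;
        $window = $this->windows[$key] ?? ['start' => $now, 'count' => 0];

        // Start a fresh window once the current one has expired.
        if ($now - $window['start'] >= $this->windowSeconds) {
            $window = ['start' => $now, 'count' => 0];
        }

        if ($window['count'] >= $this->limit) {
            $this->windows[$key] = $window;
            return false; // Over the limit for this provider/tenant pair.
        }

        $window['count']++;
        $this->windows[$key] = $window;
        return true;
    }
}
```

Passing the timestamp in, instead of calling `time()` inside the class, keeps the limiter deterministic in tests.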