EPIC-025: Multi-Source Profile Enrichment (Website, Facebook, Instagram, YouTube, Linktree) #113

@MAGIKBIT

EPIC-025 — Multi-Source Profile Enrichment

Goal

Extend the existing EPIC-024 enrichment system to support profile enrichment from 5 additional sources: Website URL, Facebook Page, Instagram Profile, YouTube Channel, and Linktree — using the pluggable EnrichmentProviderInterface pattern already in place.

Background

EPIC-024 delivered the core enrichment framework with Apollo (company/person), Hunter/Clearbit (logo), and Demo providers. This epic adds source-based providers that extract profile data from public URLs rather than API lookups by email/domain.

New Source Providers

| # | Source | Provider Class | API / Method | Fields Extracted |
|---|--------|----------------|--------------|------------------|
| 1 | Website URL | WebsiteProvider | Crawler (meta tags, OG, JSON-LD, schema.org) | name, description, emails, phones, address, social links, logo, favicon |
| 2 | Facebook Page | FacebookProvider | Facebook Graph API (preferred) / OG fallback | page name, bio, website, phone, address, profile pic, cover photo, category |
| 3 | Instagram Profile | InstagramProvider | Instagram Basic Display API / OG fallback | username, full name, bio, profile pic, website link, follower count |
| 4 | YouTube Channel | YouTubeProvider | YouTube Data API v3 | channel name, description, custom URL, subscriber count, avatar, banner |
| 5 | Linktree | LinktreeProvider | Crawler (structured HTML/JSON) | display name, bio, avatar, all link entries (social, website, payment) |
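To make the WebsiteProvider row concrete, here is a minimal sketch of meta tag, Open Graph, and JSON-LD extraction using PHP's built-in DOM extension. The function name `extractProfileFields`, the tag-to-field mapping, and the "first value found wins" precedence are assumptions for illustration, not part of the epic. It covers only a handful of the listed fields and skips fetching, charset handling, and confidence scoring.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: collects a few profile fields from <meta> tags,
 * Open Graph tags, and JSON-LD blocks. The first value found for a field wins.
 */
function extractProfileFields(string $html): array
{
    if (trim($html) === '') {
        return [];
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect parser warnings silently.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    // Assumed mapping from meta keys to profile field names.
    $metaMap = [
        'og:site_name'   => 'name',
        'og:description' => 'description',
        'description'    => 'description',
        'og:image'       => 'logo',
    ];

    $fields = [];
    foreach ($doc->getElementsByTagName('meta') as $meta) {
        $key   = $meta->getAttribute('property') ?: $meta->getAttribute('name');
        $value = trim($meta->getAttribute('content'));
        if ($value !== '' && isset($metaMap[$key])) {
            $fields[$metaMap[$key]] ??= $value;
        }
    }

    // Assumed mapping from schema.org properties to profile field names.
    $jsonLdMap = ['name' => 'name', 'description' => 'description',
                  'telephone' => 'phone', 'email' => 'email'];

    foreach ($doc->getElementsByTagName('script') as $script) {
        if (strtolower($script->getAttribute('type')) !== 'application/ld+json') {
            continue;
        }
        $data = json_decode($script->textContent, true);
        if (!is_array($data)) {
            continue; // Malformed JSON-LD is ignored rather than treated as an error.
        }
        foreach ($jsonLdMap as $property => $field) {
            if (isset($data[$property]) && is_string($data[$property])) {
                $fields[$field] ??= $data[$property];
            }
        }
    }

    return $fields;
}
```

Because meta tags are read before JSON-LD, a page that sets both `og:site_name` and a JSON-LD `name` yields the Open Graph value. A real provider would likely keep both candidates and let the confidence score decide.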

Architecture

Extends the existing EnrichmentProviderInterface with two new methods:

public function enrichFromUrl(string $url): EnrichmentResult;
public function canHandleUrl(string $url): bool;

The EnrichmentService gains an enrichByUrl(string $url) method that auto-detects the URL type and dispatches to the correct provider.
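The dispatch described above could look roughly like the sketch below (PHP 8.0+). The interface is reduced to the two new methods, `EnrichmentResult` is a simplified stand-in for the EPIC-024 class, and both provider bodies are stubs. The host patterns and the "most specific provider first" ordering are assumptions.

```php
<?php
declare(strict_types=1);

// Simplified stand-in for the real EnrichmentResult (assumed shape).
final class EnrichmentResult
{
    public function __construct(public string $provider, public array $fields = []) {}
}

// Reduced to the two methods this epic adds.
interface EnrichmentProviderInterface
{
    public function enrichFromUrl(string $url): EnrichmentResult;
    public function canHandleUrl(string $url): bool;
}

final class YouTubeProvider implements EnrichmentProviderInterface
{
    public function canHandleUrl(string $url): bool
    {
        $host = parse_url($url, PHP_URL_HOST);
        // Matches youtube.com, youtu.be, and their subdomains (assumed pattern).
        return is_string($host)
            && preg_match('/(^|\.)(youtube\.com|youtu\.be)$/i', $host) === 1;
    }

    public function enrichFromUrl(string $url): EnrichmentResult
    {
        // Stub: a real provider would call the YouTube Data API v3 here.
        return new EnrichmentResult('youtube', ['source_url' => $url]);
    }
}

final class WebsiteProvider implements EnrichmentProviderInterface
{
    public function canHandleUrl(string $url): bool
    {
        $scheme = parse_url($url, PHP_URL_SCHEME);
        return is_string($scheme)
            && in_array(strtolower($scheme), ['http', 'https'], true)
            && is_string(parse_url($url, PHP_URL_HOST));
    }

    public function enrichFromUrl(string $url): EnrichmentResult
    {
        // Stub: a real provider would crawl the page here.
        return new EnrichmentResult('website', ['source_url' => $url]);
    }
}

final class EnrichmentService
{
    /** @param EnrichmentProviderInterface[] $providers ordered from most to least specific */
    public function __construct(private array $providers) {}

    public function enrichByUrl(string $url): EnrichmentResult
    {
        foreach ($this->providers as $provider) {
            if ($provider->canHandleUrl($url)) {
                return $provider->enrichFromUrl($url);
            }
        }
        throw new InvalidArgumentException("No provider can handle URL: {$url}");
    }
}
```

Ordering matters in this sketch: WebsiteProvider accepts any http(s) URL, so it has to be registered last or it would shadow the source-specific providers.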

Stories

| # | Story | GitHub Issue | MAGIK | Priority |
|---|-------|--------------|-------|----------|
| 1 | Provider Interface Extension: enrichFromUrl() + URL detection | #TBD | MAGIK-934 | P0 |
| 2 | Website URL Provider: Meta/OG/JSON-LD Extraction | #TBD | MAGIK-935 | P0 |
| 3 | Facebook Page Provider: Graph API + OG Fallback | #TBD | MAGIK-936 | P1 |
| 4 | Instagram Profile Provider: Basic Display API + OG Fallback | #TBD | MAGIK-937 | P1 |
| 5 | YouTube Channel Provider: Data API v3 | #TBD | MAGIK-938 | P1 |
| 6 | Linktree Provider: Structured HTML/JSON Crawler | #TBD | MAGIK-939 | P1 |
| 7 | Field Normalization & Deduplication Service | #TBD | MAGIK-940 | P0 |
| 8 | DB Schema: Enrichment Sources + Evidence Table | #TBD | MAGIK-941 | P0 |
| 9 | Review UI: Current vs Suggested, Accept/Reject per Field | #TBD | MAGIK-942 | P0 |
| 10 | Media Integration: S3-Ready Logo/Cover Image Storage | #TBD | MAGIK-943 | P1 |
| 11 | Rate Limiting & Consent Controls | #TBD | MAGIK-944 | P1 |
| 12 | Multi-Tenant Scoping: Parent/Child Feature Toggles | #TBD | MAGIK-945 | P1 |
| 13 | Demo Provider Extension: Fake URL-Based Data | #TBD | MAGIK-946 | P2 |
| 14 | Integration Testing: Multi-Source End-to-End | #TBD | MAGIK-947 | P2 |

Technical Approach

  1. Extend EnrichmentProviderInterface: add enrichFromUrl() and canHandleUrl() methods; update existing providers with no-op stubs
  2. Strategy pattern: URL → provider routing via a canHandleUrl() chain in EnrichmentService
  3. Confidence + Evidence: each extracted field carries a confidence score (0.0–1.0) and source evidence (snippet + URL)
  4. Normalization: E.164 phones via libphonenumber, address component parsing, social link deduplication
  5. Draft-only storage: results are stored in enrichment_drafts (extended schema) and never auto-applied
  6. Review UI: side-by-side current vs suggested, per-field accept/reject, batch apply with audit log
  7. Rate limiting: per-provider, per-tenant limits via the enrichment_rate_limits table
  8. Consent: checkbox required before enrichment; consent stored in the audit log
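As an illustration of steps 3 and 4, the sketch below (PHP 8.0+) pairs a small value object for a suggested field with a social link deduplication helper. The class name `FieldSuggestion`, its property names, and the normalization rules (force https, drop `www.`, query string, fragment, and trailing slash, compare case-insensitively) are all assumptions. Phone and address normalization are left out.

```php
<?php
declare(strict_types=1);

// Hypothetical value object: one suggested field plus its confidence and evidence.
final class FieldSuggestion
{
    public function __construct(
        public string $field,
        public string $value,
        public float $confidence,
        public string $evidenceSnippet,
        public string $evidenceUrl
    ) {
        if ($confidence < 0.0 || $confidence > 1.0) {
            throw new InvalidArgumentException('Confidence must be within 0.0 and 1.0');
        }
    }
}

/** Returns a canonical form of the link, or null when no host can be parsed. */
function normalizeSocialLink(string $url): ?string
{
    $parts = parse_url(trim($url));
    if ($parts === false || empty($parts['host'])) {
        return null;
    }
    $host = preg_replace('/^www\./', '', strtolower($parts['host']));
    $path = rtrim($parts['path'] ?? '', '/');

    // Query string and fragment are dropped on purpose (assumed rule).
    return 'https://' . $host . $path;
}

/** Keeps the first occurrence of each link, compared case-insensitively. */
function dedupeSocialLinks(array $urls): array
{
    $seen = [];
    foreach ($urls as $url) {
        $normalized = normalizeSocialLink($url);
        if ($normalized === null) {
            continue;
        }
        $key = strtolower($normalized);
        if (!isset($seen[$key])) {
            $seen[$key] = $normalized;
        }
    }
    return array_values($seen);
}
```

Comparing paths case-insensitively is a trade-off: it merges `instagram.com/Acme` and `instagram.com/acme`, which suits most social handles, but it would be wrong for any site with case-sensitive paths.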

Scope

  • CI4 Portal (app.portalv2)
  • All user roles (Admin, Reseller, Org, Employee, Individual) with appropriate access gates
  • Extends existing EPIC-024 infrastructure (same tables, same controller, same service)

Acceptance Criteria (Epic-Level)

  • User can paste a Website URL and see extracted profile fields as draft
  • User can paste a Facebook Page URL and see extracted profile fields
  • User can paste an Instagram profile URL and see extracted profile fields
  • User can paste a YouTube channel URL and see extracted profile fields
  • User can paste a Linktree URL and see all extracted links + profile data
  • All extracted fields show confidence scores and source evidence
  • Review UI shows current vs suggested side-by-side for each field
  • User can accept/reject individual fields before applying
  • Applied changes are audit-logged with source attribution
  • Approved images (logo/cover) auto-save to Media tab (S3-ready)
  • Rate limiting prevents API abuse per provider per tenant
  • Consent checkbox required before any enrichment action
  • Multi-tenant scoping: parent can enable/disable for children
  • Demo mode works for all 5 new sources without API keys
  • All providers are swappable (API vs crawler) without app logic changes
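The per-field accept/reject and audit criteria above can be sketched as a pure function over the current profile and the draft. The function name `applyAcceptedFields`, the array shapes, and the audit entry keys are hypothetical. Persistence, permissions, and the enrichment_drafts schema are out of scope here.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: applies only the accepted draft fields and returns
 * the updated profile together with one audit entry per applied field.
 */
function applyAcceptedFields(array $current, array $suggested, array $accepted, string $source): array
{
    $profile = $current;
    $audit   = [];

    foreach ($accepted as $field) {
        if (!array_key_exists($field, $suggested)) {
            continue; // Ignore accepted names that are not in the draft.
        }
        $audit[] = [
            'field'  => $field,
            'old'    => $current[$field] ?? null,
            'new'    => $suggested[$field],
            'source' => $source,
        ];
        $profile[$field] = $suggested[$field];
    }

    return ['profile' => $profile, 'audit' => $audit];
}
```

Rejected fields never touch the profile in this sketch, which matches the draft-only rule: nothing changes unless the user explicitly accepts it.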

Security & Compliance

  • No automatic profile overwrite — all results are DRAFT
  • Consent required before enrichment (GDPR/CCPA alignment)
  • Rate limiting prevents abuse and cost overrun
  • Crawler providers document robots.txt compliance
  • API keys stored in .env, never in code
  • PII fields encrypted at rest in draft table
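For the rate limiting bullet, one possible shape is a fixed-window counter keyed by tenant and provider, sketched below (PHP 8.0+). The class name `FixedWindowRateLimiter` and the fixed-window strategy are assumptions, and an in-memory array stands in for the enrichment_rate_limits table. The current time is passed in so the logic stays testable.

```php
<?php
declare(strict_types=1);

// Hypothetical fixed-window limiter; an in-memory array replaces the DB table.
final class FixedWindowRateLimiter
{
    /** @var array<string, array{start: int, count: int}> */
    private array $windows = [];

    public function __construct(private int $limit, private int $windowSeconds) {}

    /** Returns true and counts the call if the tenant/provider pair is under its limit. */
    public function allow(int $tenantId, string $provider, int $now): bool
    {
        $key         = $tenantId . ':' . $provider;
        $windowStart = $now - ($now % $this->windowSeconds);

        $entry = $this->windows[$key] ?? ['start' => $windowStart, 'count' => 0];
        if ($entry['start'] !== $windowStart) {
            // A new window has begun, so the counter resets.
            $entry = ['start' => $windowStart, 'count' => 0];
        }

        if ($entry['count'] >= $this->limit) {
            $this->windows[$key] = $entry;
            return false;
        }

        $entry['count']++;
        $this->windows[$key] = $entry;
        return true;
    }
}
```

A fixed window allows short bursts across a window boundary. If that matters for paid APIs, a sliding window or token bucket would be a stricter alternative.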
