Scrapes shared reels and posts from Instagram DMs, takes screenshots with Playwright, and generates AI summaries via GPT-4o vision.
ig-archiver/
├── extension/ # Chrome extension (React + TypeScript + Tailwind v4 + Vite 6)
│   ├── src/
│   │   ├── components/ # UI components (app, header, scan-button, progress-bar, status-feed)
│   │   ├── platform/ # platform abstraction (types, chromePlatform, electronPlatform)
│   │   ├── lib/ # archiveStream, scraper, config, truncate
│   │   ├── main.tsx
│   │   ├── types.ts
│   │   └── index.css
│   ├── public/
│   │   ├── manifest.json
│   │   └── content-script.js
│   ├── package.json
│   └── dist/ # built output: load this folder in Chrome
└── server/ # Express backend
    ├── server.js
    ├── login.js
    ├── lib/ # db, config, capture, summarize
    ├── package.json
    ├── .env.example
    ├── screenshots/ # autogenerated: PNG captures
    └── database.json # autogenerated: archive records
cd server
yarn install
First time only: install the Playwright browser binary:
yarn playwright install chromium
Copy the env template and add your OpenAI key:
cp .env.example .env
OPENAI_API_KEY=sk-...your-key-here...
PORT=3000
The server requires an authenticated Instagram session to visit and screenshot each post. Run this once:
yarn run login
A browser window will open. Log in to Instagram, then come back to the terminal and press Enter. Your session is saved to session.json and loaded automatically on every subsequent server start.
If Instagram ever logs you out, just run yarn run login again.
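As a rough sketch of how a saved session can be reused at startup, the helper below checks for session.json and passes it to Playwright's `storageState` context option. The function name `openAuthenticatedContext` is hypothetical and the project's actual login.js and capture code may be organised differently; only `browser.newContext({ storageState })` is a real Playwright API.

```javascript
const fs = require("node:fs");

// Hypothetical helper: `browser` is a Playwright Browser instance and
// `sessionPath` points at the file written by the login step.
async function openAuthenticatedContext(browser, sessionPath = "session.json") {
  if (!fs.existsSync(sessionPath)) {
    throw new Error("No session.json found. Run `yarn run login` first.");
  }
  // storageState restores the cookies and localStorage saved at login.
  return browser.newContext({
    storageState: sessionPath,
    viewport: { width: 1280, height: 720 },
  });
}
```

Failing early when the file is missing gives a clearer message than letting Instagram redirect every capture to its login page.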
yarn start
# or
yarn run dev
You should see:
[ig-archiver] server running on http://localhost:3000
[ig-archiver] Screenshots → .../server/screenshots
[ig-archiver] Database → .../server/database.json
cd extension
yarn install
yarn run build
Then in Chrome:
- Navigate to chrome://extensions/
- Enable Developer mode (top-right toggle)
- Click Load unpacked
- Select the extension/dist/ folder
The extension does not require icons to work. Broken icon warnings can be ignored, or add your own 16×16, 48×48, and 128×128 PNGs to extension/icons/.
- Open Instagram in Chrome — the extension's content script begins intercepting data immediately
- Navigate to the DM conversation you want to archive
- Click the IG Archiver toolbar icon
- Click Scan & Archive Chat
- The extension auto-scrolls back through the conversation, loading 5 batches of older messages by default, then archives every shared reel and post it found
- Watch the real-time progress as each reel/post is visited, screenshotted, and summarised
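The progress shown in the last step arrives from the server as streaming NDJSON, one JSON object per line. A minimal sketch of reading such a stream is shown here; `createNdjsonParser` and the `url`/`status` event fields are illustrative assumptions, not the extension's actual archiveStream code.

```javascript
// Hypothetical line splitter for an NDJSON stream: feed it text chunks as they
// arrive and it calls onEvent once per complete JSON line.
function createNdjsonParser(onEvent) {
  let buffer = "";
  return {
    push(chunk) {
      buffer += chunk;
      const lines = buffer.split("\n");
      buffer = lines.pop(); // the last piece may be an incomplete line
      for (const line of lines) {
        if (line.trim() !== "") onEvent(JSON.parse(line));
      }
    },
    end() {
      // Flush a final line that was not newline-terminated.
      if (buffer.trim() !== "") onEvent(JSON.parse(buffer));
      buffer = "";
    },
  };
}

// Example: two events split across three network chunks.
const events = [];
const parser = createNdjsonParser((e) => events.push(e));
parser.push('{"url":"https://www.instagram.com/reel/ABC123/","sta');
parser.push('tus":"done"}\n{"url":"https://www.instagram.com/p/XYZ/",');
parser.push('"status":"error"}\n');
parser.end();
console.log(events.length); // 2
```

Buffering the trailing partial line matters because network chunks do not line up with line boundaries.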
To capture more history, increase SCROLL_LOADS in extension/src/lib/config.ts and rebuild.
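To illustrate what SCROLL_LOADS controls, here is a small sketch of a load-then-scrape loop. The wrapper `collectLinks` is hypothetical; `autoScrollOnce` and `scrapeExternalLinks` are passed in as parameters so the example runs on its own, and the real extension may wire them together differently.

```javascript
// Assumed default, mirroring SCROLL_LOADS in extension/src/lib/config.ts.
const SCROLL_LOADS = 5;

// Fetch `loads` older batches of messages, then scrape the links once.
async function collectLinks(autoScrollOnce, scrapeExternalLinks, loads = SCROLL_LOADS) {
  for (let i = 0; i < loads; i++) {
    await autoScrollOnce(); // each call fetches one older batch of messages
  }
  return scrapeExternalLinks();
}
```

A larger value means more requests before scraping starts, so a scan reaches further back in the conversation but takes longer to begin archiving.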
Chrome Extension
────────────────
content-script.js (MAIN world)
  → patches XHR at document_start
  → captures get_slide_thread_nullable and fetch__SlideThread graphql responses as you browse
  → stores SlideMessageXMAContent nodes in window.__igSlideThreads

autoScrollOnce() (MAIN world)
  → reads pageInfo cursor from window.__igSlideThreads
  → replays XHR to fetch older batch
  → called SCROLL_LOADS times before scraping

scrapeExternalLinks() (MAIN world)
  → reads window.__igSlideThreads
  → matches current thread via thread_key → thread_fbid mapping
  → extracts target_url from each XMA node (instagram.com/p/ or instagram.com/reel/ only)
  → sends URL list to server

Node.js Server
──────────────
POST /archive { urls: [...] }
  ↓
for each URL (streaming NDJSON):
  1. Playwright visits URL (authenticated)
     - waitUntil: load (falls back to domcontentloaded on timeout)
     - 1280×720 viewport
     - SHA-1 filename → screenshots/
  2. extract <title>, meta description, and post caption (article h1)
  3. GPT-4o vision → screenshot + caption → summary, category + keywords
  4. upsert entry in database.json
  5. stream progress event back
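Two small pieces of this flow can be sketched in isolation: the filter that keeps only instagram.com/p/ and instagram.com/reel/ links, and the SHA-1 based screenshot filename. Both function names are hypothetical, and truncating the hash to 12 hex characters is an assumption inferred from the example record's screenshotPath, not something the source states.

```javascript
const { createHash } = require("node:crypto");

// Extension side: keep only links to posts and reels.
function isArchivableUrl(targetUrl) {
  try {
    const { hostname, pathname } = new URL(targetUrl);
    const onInstagram =
      hostname === "instagram.com" || hostname.endsWith(".instagram.com");
    return onInstagram && /^\/(p|reel)\/[^/]+/.test(pathname);
  } catch {
    return false; // not a parseable URL
  }
}

// Server side: derive a stable screenshot filename from the URL.
// The 12-character truncation is an assumption for this sketch.
function screenshotFilename(url) {
  return createHash("sha1").update(url).digest("hex").slice(0, 12) + ".png";
}

const candidates = [
  "https://www.instagram.com/reel/ABC123/",
  "https://www.instagram.com/stories/someone/",
];
console.log(candidates.filter(isArchivableUrl).length); // 1
```

Hashing the URL gives the same filename every time a post is re-archived, so a repeat run overwrites the old capture instead of accumulating duplicates.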
{
"url": "https://www.instagram.com/reel/ABC123/",
"title": "Example post title",
"metaDescription": "A brief description from the page.",
"summary": "A one-sentence AI-generated overview of the post.",
"category": "Learning",
"keywords": "cooking, recipe, italian",
"screenshotPath": "screenshots/3a9f12b04c1e.png",
"archivedAt": "2026-03-03T10:00:00.000Z",
"createdAt": "2026-03-03T10:00:00.000Z"
}
Categories: References · Memes · Inspiration · Tutorials · News · Ai · Tools · Music production · Movies and shows · Design · Music · Politics (one or two per entry)
Keywords: up to three comma-separated terms per entry, generated by the model
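Since each record is upserted by URL, a minimal in-memory sketch of that step might look like the following. `upsertEntry` is a hypothetical name, and the choice to keep createdAt while refreshing archivedAt on a repeat archive is an assumption about how the two timestamps differ; the real db module may behave otherwise.

```javascript
// Hypothetical upsert keyed by `url`. New entries get both timestamps;
// existing entries keep createdAt and only refresh archivedAt (assumed).
function upsertEntry(records, entry, now = new Date().toISOString()) {
  const index = records.findIndex((r) => r.url === entry.url);
  if (index === -1) {
    records.push({ ...entry, createdAt: now, archivedAt: now });
  } else {
    records[index] = { ...records[index], ...entry, archivedAt: now };
  }
  return records;
}

const db = [];
upsertEntry(db, { url: "https://www.instagram.com/reel/ABC123/", category: "Music" }, "2026-03-03T10:00:00.000Z");
upsertEntry(db, { url: "https://www.instagram.com/reel/ABC123/", category: "Design" }, "2026-03-04T10:00:00.000Z");
console.log(db.length); // 1
```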
| .env variable | Default | Description |
|---|---|---|
| OPENAI_API_KEY | — | Required. Your OpenAI key. |
| PORT | 3000 | Server listen port. |
| SCREENSHOT_WIDTH | 1280 | Viewport / screenshot width. |
| SCREENSHOT_HEIGHT | 720 | Viewport / screenshot height. |
| MOCK | — | Set to true to skip OpenAI calls and return placeholder data. |
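For illustration, the environment variables listed here could be read with their defaults roughly as follows. `loadConfig` and the returned property names are hypothetical; only the variable names and default values come from this document, and server/lib/config.js may be structured differently.

```javascript
// Hypothetical loader: reads the documented variables and applies the
// documented defaults. Non-numeric values fall back to the default.
function loadConfig(env = process.env) {
  const toInt = (value, fallback) => {
    const n = Number.parseInt(value ?? "", 10);
    return Number.isNaN(n) ? fallback : n;
  };
  return {
    openaiApiKey: env.OPENAI_API_KEY ?? "",
    port: toInt(env.PORT, 3000),
    screenshotWidth: toInt(env.SCREENSHOT_WIDTH, 1280),
    screenshotHeight: toInt(env.SCREENSHOT_HEIGHT, 720),
    mock: env.MOCK === "true", // any other value leaves mock mode off
  };
}

console.log(loadConfig({ PORT: "4000" }).port); // 4000
```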
SCROLL_LOADS (extension-side, in extension/src/lib/config.ts) controls how many scroll batches are fetched before scraping. Default is 5.
TIMEOUT_MS (server-side, in server/lib/config.js) sets the per-URL Playwright navigation timeout. Default is 30000 ms.
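The navigation timeout interacts with the load fallback: if waiting for the load event times out, the visit is retried with domcontentloaded. A hedged sketch of that pattern is below; `visit` is a hypothetical name, `page` is assumed to be a Playwright Page, and detecting the timeout via `err.name === "TimeoutError"` is an assumption about how the server distinguishes timeouts from other failures.

```javascript
const TIMEOUT_MS = 30000; // default stated in the text

// Try the full `load` event first; on a timeout, retry with the earlier
// `domcontentloaded` event. Any other error is rethrown to the caller.
async function visit(page, url, timeout = TIMEOUT_MS) {
  try {
    return await page.goto(url, { waitUntil: "load", timeout });
  } catch (err) {
    if (err && err.name === "TimeoutError") {
      return page.goto(url, { waitUntil: "domcontentloaded", timeout });
    }
    throw err;
  }
}
```

Pages with long-running media or trackers may never fire load, so the fallback lets a screenshot still be taken once the DOM is ready.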
| Symptom | Fix |
|---|---|
| No session.json found | Run yarn run login in /server. |
| OPENAI_API_KEY is not set | Copy .env.example → .env and add your key. |
| Navigation failed for a URL | The site may block headless browsers or be down. The URL is skipped; other URLs continue. |
| Extension shows "No shared posts found" | Make sure the page was loaded with the extension active (reload the tab after installing). |
| Error: connect ECONNREFUSED in extension | Make sure the server is running (yarn start in /server). |
| Playwright browser not found | Run yarn playwright install chromium inside /server. |
| Fewer links than expected | Increase SCROLL_LOADS in extension/src/lib/config.ts and rebuild. Instagram loads messages in batches. |