ig-archiver

Scrapes shared reels and posts from Instagram DMs, takes screenshots with Playwright, and generates AI summaries via GPT-4o vision.

ig-archiver/
├── extension/                    # Chrome extension (React + TypeScript + Tailwind v4 + Vite 6)
│   ├── src/
│   │   ├── components/           # UI components (app, header, scan-button, progress-bar, status-feed)
│   │   ├── platform/             # platform abstraction (types, chromePlatform, electronPlatform)
│   │   ├── lib/                  # archiveStream, scraper, config, truncate
│   │   ├── main.tsx
│   │   ├── types.ts
│   │   └── index.css
│   ├── public/
│   │   ├── manifest.json
│   │   └── content-script.js
│   ├── package.json
│   └── dist/                     # built output — load this folder in Chrome
└── server/                       # Express backend
    ├── server.js
    ├── login.js
    ├── lib/                      # db, config, capture, summarize
    ├── package.json
    ├── .env.example
    ├── screenshots/              # autogenerated — PNG captures
    └── database.json             # autogenerated — archive records

Quick Start

1 — Set up the server

cd server
yarn install

First time only: install the Playwright browser binary:
yarn playwright install chromium

Copy the env template, then edit .env to add your OpenAI key:

cp .env.example .env

OPENAI_API_KEY=sk-...your-key-here...
PORT=3000

2 — Log in to Instagram

The server requires an authenticated Instagram session to visit and screenshot each post. Run this once:

yarn run login

A browser window will open. Log in to Instagram, then come back to the terminal and press Enter. Your session is saved to session.json and loaded automatically on every subsequent server start.

If Instagram ever logs you out, just run yarn run login again.
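
A minimal sketch of how a login step like this could be written with Playwright's storage-state API. It is not the project's actual login.js: the helper names `hasSession` and `saveSession` are hypothetical, and the `playwright` package is assumed to be installed.

```javascript
const fs = require("node:fs");
const readline = require("node:readline/promises");

const SESSION_FILE = "session.json"; // file name taken from this README

// Returns true when a saved session file exists at the given path.
function hasSession(file = SESSION_FILE) {
  return fs.existsSync(file);
}

// Opens a visible browser, waits for a manual login, then writes cookies
// and localStorage to `file`. Hypothetical helper, not the real login.js.
async function saveSession(file = SESSION_FILE) {
  const { chromium } = require("playwright"); // loaded lazily; assumed installed
  const browser = await chromium.launch({ headless: false });
  const context = await browser.newContext();
  const page = await context.newPage();
  await page.goto("https://www.instagram.com/");

  // Block until the user confirms in the terminal that they are logged in.
  const rl = readline.createInterface({ input: process.stdin, output: process.stdout });
  await rl.question("Log in to Instagram, then press Enter here... ");
  rl.close();

  await context.storageState({ path: file });
  await browser.close();
}
```

A later run could restore the session with `browser.newContext({ storageState: "session.json" })`, which is Playwright's documented way to reuse saved state.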

3 — Start the server

yarn start
# or
yarn run dev

You should see:

[ig-archiver] server running on http://localhost:3000
[ig-archiver] Screenshots → .../server/screenshots
[ig-archiver] Database    → .../server/database.json

4 — Build and load the extension

cd extension
yarn install
yarn run build

Then in Chrome:

Navigate to chrome://extensions/
Enable Developer mode (top-right toggle)
Click Load unpacked
Select the extension/dist/ folder

The extension works without icons. You can ignore broken-icon warnings, or add your own 16×16, 48×48, and 128×128 PNGs to extension/icons/.

5 — Archive shared content from Instagram

Open Instagram in Chrome — the extension's content script begins intercepting data immediately
Navigate to the DM conversation you want to archive
Click the IG Archiver toolbar icon
Click Scan & Archive Chat
The extension loads older messages in the conversation (5 batches by default), then archives every shared reel and post it finds
Watch the real-time progress as each reel/post is visited, screenshotted, and summarised
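
The progress updates arrive as streaming NDJSON (one JSON object per line). The sketch below shows one way a client could turn arbitrary network chunks into whole events. The `createNdjsonParser` name and the `{ type, index }` event shape are assumptions for illustration, not the extension's actual archiveStream code.

```javascript
// Creates a parser that accepts arbitrary text chunks and calls onEvent
// once for every complete NDJSON line.
function createNdjsonParser(onEvent) {
  let buffer = "";
  return {
    push(chunk) {
      buffer += chunk;
      const lines = buffer.split("\n");
      buffer = lines.pop(); // the last piece may be an incomplete line
      for (const line of lines) {
        if (line.trim() !== "") onEvent(JSON.parse(line));
      }
    },
    // Call once the stream ends to handle a final line without a newline.
    flush() {
      if (buffer.trim() !== "") onEvent(JSON.parse(buffer));
      buffer = "";
    },
  };
}

// Example: two chunks that split one event in the middle.
const events = [];
const parser = createNdjsonParser((e) => events.push(e));
parser.push('{"type":"progress","index":1}\n{"type":"prog');
parser.push('ress","index":2}\n');
parser.flush();
```

Buffering the trailing fragment matters because a network chunk boundary can fall anywhere, including inside a JSON object.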

To capture more history, increase SCROLL_LOADS in extension/src/lib/config.ts and rebuild.
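
As a rough illustration of what SCROLL_LOADS controls, the loop below calls a batch loader a fixed number of times. `loadHistory` and `loadOlderBatch` are hypothetical names (the latter stands in for the extension's autoScrollOnce()), and stopping early when no older messages remain is an assumption.

```javascript
const SCROLL_LOADS = 5; // default stated in this README

// Calls loadOlderBatch up to maxLoads times and stops early when it
// reports that no older messages remain. Returns the number of calls made.
async function loadHistory(loadOlderBatch, maxLoads = SCROLL_LOADS) {
  let loaded = 0;
  for (let i = 0; i < maxLoads; i++) {
    const hasMore = await loadOlderBatch();
    loaded++;
    if (!hasMore) break;
  }
  return loaded;
}
```

With a loader that reports more history on every call, `loadHistory(loader)` makes exactly five calls. Raising the limit makes more calls, which is why a larger SCROLL_LOADS reaches further back in the conversation.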

How it works

Chrome Extension                        Node.js Server
────────────────────────────────        ────────────────────────────────
content-script.js (MAIN world)          POST /archive  { urls: [...] }
  → patches XHR at document_start            │
  → captures get_slide_thread_nullable        ↓
    and fetch__SlideThread graphql      for each URL (streaming NDJSON):
    responses as you browse                1. Playwright visits URL (authenticated)
  → stores SlideMessageXMAContent              - waitUntil: load (falls back to
    nodes in window.__igSlideThreads            domcontentloaded on timeout)
                                               - 1280×720 viewport
autoScrollOnce() (MAIN world)                  - SHA-1 filename → screenshots/
  → reads pageInfo cursor from           2. extract <title>, meta description,
    window.__igSlideThreads                  and post caption (article h1)
  → replays XHR to fetch older batch    3. GPT-4o vision → screenshot + caption
  → called SCROLL_LOADS times before       → summary, category + keywords
    scraping                            4. upsert entry in database.json
                                        5. stream progress event back
scrapeExternalLinks() (MAIN world)
  → reads window.__igSlideThreads
  → matches current thread via
    thread_key → thread_fbid mapping
  → extracts target_url from each
    XMA node (instagram.com/p/ or
    instagram.com/reel/ only)
  → sends URL list to server
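
Two steps from the diagram above are easy to sketch in isolation: keeping only instagram.com/p/ and instagram.com/reel/ links, and deriving a screenshot file name from a SHA-1 digest. The function names are hypothetical. Dropping duplicates and truncating the digest to 12 hex characters are assumptions (the truncation is inferred from the example screenshotPath in the schema).

```javascript
const crypto = require("node:crypto");

// Keeps only Instagram post and reel links, dropping malformed and
// duplicate URLs.
function filterArchivableUrls(urls) {
  const seen = new Set();
  const out = [];
  for (const raw of urls) {
    let u;
    try {
      u = new URL(raw);
    } catch {
      continue; // skip strings that are not valid URLs
    }
    const host = u.hostname.replace(/^www\./, "");
    const isPostOrReel = /^\/(p|reel)\/[^/]+/.test(u.pathname);
    if (host !== "instagram.com" || !isPostOrReel) continue;
    if (!seen.has(u.href)) {
      seen.add(u.href);
      out.push(u.href);
    }
  }
  return out;
}

// Derives a stable file name from the URL's SHA-1 digest, so the same URL
// always maps to the same screenshot file.
function screenshotName(url) {
  const hex = crypto.createHash("sha1").update(url).digest("hex");
  return `${hex.slice(0, 12)}.png`; // 12-character truncation is an assumption
}
```

A hash-based name avoids having to sanitize URL characters for the file system and makes re-archiving the same URL overwrite its previous capture.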

database.json schema

{
  "url": "https://www.instagram.com/reel/ABC123/",
  "title": "Example post title",
  "metaDescription": "A brief description from the page.",
  "summary": "A one-sentence AI-generated overview of the post.",
  "category": "Tutorials",
  "keywords": "cooking, recipe, italian",
  "screenshotPath": "screenshots/3a9f12b04c1e.png",
  "archivedAt": "2026-03-03T10:00:00.000Z",
  "createdAt": "2026-03-03T10:00:00.000Z"
}
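
The server upserts one such record per URL. A minimal sketch of that operation follows, assuming records are keyed by `url`, that `createdAt` is set once, and that `archivedAt` is refreshed on every run. Those semantics and the `upsertEntry` name are assumptions, not confirmed behavior of server/lib/db.

```javascript
// Inserts a record, or replaces the existing one with the same url.
// createdAt is kept from the first insert; archivedAt is refreshed.
// Returns a new array and leaves the input untouched.
function upsertEntry(records, entry, now = new Date().toISOString()) {
  const i = records.findIndex((r) => r.url === entry.url);
  if (i === -1) {
    return [...records, { ...entry, archivedAt: now, createdAt: now }];
  }
  const updated = {
    ...records[i],
    ...entry,
    archivedAt: now,
    createdAt: records[i].createdAt,
  };
  return records.map((r, j) => (j === i ? updated : r));
}
```

Archiving the same URL twice therefore leaves a single record whose summary reflects the latest run.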

Categories: References · Memes · Inspiration · Tutorials · News · Ai · Tools · Music production · Movies and shows · Design · Music · Politics (one or two per entry)

Keywords: up to three comma-separated terms per entry, generated by the model
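
Model output does not always respect limits like these, so a server might normalize it before saving. The sketch below is one hedged way to do that. `normalizeLabels` is a hypothetical helper, and it assumes that multiple categories arrive as a comma-separated string, which this README does not specify.

```javascript
// Category names copied from the list above.
const CATEGORIES = [
  "References", "Memes", "Inspiration", "Tutorials", "News", "Ai", "Tools",
  "Music production", "Movies and shows", "Design", "Music", "Politics",
];

// Keeps known categories only (at most two) and at most three keywords.
function normalizeLabels(rawCategory, rawKeywords) {
  const split = (s) => s.split(",").map((x) => x.trim()).filter(Boolean);
  // Case-insensitive lookup that maps back to the canonical spelling.
  const known = new Map(CATEGORIES.map((c) => [c.toLowerCase(), c]));
  const categories = split(rawCategory)
    .map((c) => known.get(c.toLowerCase()))
    .filter(Boolean)
    .slice(0, 2);
  const keywords = split(rawKeywords)
    .map((k) => k.toLowerCase())
    .slice(0, 3);
  return { category: categories.join(", "), keywords: keywords.join(", ") };
}
```

Unknown category names are dropped rather than stored, which keeps database.json filterable by the fixed list.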

Configuration

| `.env` variable | Default | Description |
| --- | --- | --- |
| `OPENAI_API_KEY` | (none) | Required. Your OpenAI key. |
| `PORT` | `3000` | Server listen port. |
| `SCREENSHOT_WIDTH` | `1280` | Viewport / screenshot width in pixels. |
| `SCREENSHOT_HEIGHT` | `720` | Viewport / screenshot height in pixels. |
| `MOCK` | (none) | Set to `true` to skip OpenAI calls and return placeholder data. |

SCROLL_LOADS (extension-side, in extension/src/lib/config.ts) controls how many scroll batches are fetched before scraping. Default is 5.

TIMEOUT_MS (server-side, in server/lib/config.js) sets the per-URL Playwright navigation timeout. Default is 30000 ms.
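
To make the defaults above concrete, here is a sketch of a loader that reads these settings from an env-like object. It is not the contents of server/lib/config.js. The `loadConfig` name and property names are hypothetical, and skipping the API-key check when MOCK is `true` is an assumption.

```javascript
// Builds a config object from an env-like map, applying the documented defaults.
function loadConfig(env = process.env) {
  // Parses a positive integer, falling back when the value is missing or invalid.
  const toInt = (value, fallback) => {
    const n = Number.parseInt(value ?? "", 10);
    return Number.isFinite(n) && n > 0 ? n : fallback;
  };
  const config = {
    openaiApiKey: env.OPENAI_API_KEY ?? "",
    port: toInt(env.PORT, 3000),
    screenshotWidth: toInt(env.SCREENSHOT_WIDTH, 1280),
    screenshotHeight: toInt(env.SCREENSHOT_HEIGHT, 720),
    mock: env.MOCK === "true",
    timeoutMs: 30000, // documented TIMEOUT_MS default
  };
  // Assumption: the key is only needed when real OpenAI calls are made.
  if (!config.mock && config.openaiApiKey === "") {
    throw new Error("OPENAI_API_KEY is not set");
  }
  return config;
}
```

For example, `loadConfig({ MOCK: "true" })` yields port 3000 and a 1280×720 viewport without requiring a key.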

Troubleshooting

| Symptom | Fix |
| --- | --- |
| `No session.json found` | Run `yarn run login` in `server/`. |
| `OPENAI_API_KEY is not set` | Copy `.env.example` to `.env` and add your key. |
| `Navigation failed` for a URL | The site may block headless browsers or be down. The URL is skipped; other URLs continue. |
| Extension shows "No shared posts found" | Make sure the page was loaded with the extension active (reload the tab after installing). |
| `Error: connect ECONNREFUSED` in extension | Make sure the server is running (`yarn start` in `server/`). |
| Playwright browser not found | Run `yarn playwright install chromium` inside `server/`. |
| Fewer links than expected | Increase `SCROLL_LOADS` in `extension/src/lib/config.ts` and rebuild. Instagram loads messages in batches. |
