linkedinscrape

LinkedIn profile scraper SDK for Python
Extract comprehensive profile data via LinkedIn's Voyager API — as a library or from the command line.

Features

Full profile extraction — identity, headline, about, industry, location, profile picture, background image
Experience & education — all positions and degrees with dates, descriptions, company/school logos
Skills, certifications, projects — complete lists with metadata
Contact info — email, phone, websites, Twitter handles, birthday
Network stats — connections, followers, mutual connections, following state
Additional sections — languages, volunteer, honors, publications, courses, patents, test scores, organizations
Cookie validation — checks your session is alive before scraping, not mid-batch
Stealth by default — auto-detected timezone, randomized User-Agent, jittered delays, display fingerprint rotation
Proxy rotation — single proxy, proxy list, or proxy file with round-robin rotation
Multiple export formats — JSON, CSV, formatted console summary
Batch scraping — file-based username lists with progress logging
Offline parsing — parse previously saved API response JSON files

Installation

# Core SDK
pip install -e .

# With optional CLI (click + rich)
pip install -e ".[cli]"

# Development
pip install -e ".[dev]"

Requires Python 3.10+.

Setup

You need two cookies from an authenticated LinkedIn browser session:

Open LinkedIn in your browser and log in
Open DevTools → Application → Cookies → https://www.linkedin.com
Copy the values of:
- li_at → your session token
- JSESSIONID → your CSRF token (remove the surrounding quotes)

Create a .env file in your project root:

LI_AT=AQEDAUXCrYoED3MK...your_token_here...
CSRF_TOKEN=ajax:5191528851126725620

Tip: The SDK auto-loads .env — no extra setup needed.

Quick Start

from linkedinscrape import LinkedIn

# Credentials are loaded automatically from .env
with LinkedIn() as li:
    profile = li.scrape("username")

    print(profile.full_name)        # "John Doe"
    print(profile.headline)         # "Software Engineer at Google"
    print(profile.current_company)  # "Google"
    print(profile.current_title)    # "Software Engineer"
    print(profile.to_dict())        # Full nested dictionary

The SDK validates your cookies on startup. If they're expired, you get a clear CookieExpiredError immediately — not a cryptic failure 10 requests deep.

Even Simpler

from linkedinscrape import LinkedIn, Exporter

# pass a full proxy file or path to rotate proxies (recommended)
proxies='proxyfile.txt'

# you can pass validate=False to skip cookie validation
li = LinkedIn(proxy_file=proxies, validate=False)
profile = li.scrape("username")

exporter = Exporter('output_dir')
exporter.to_json(profile)

SDK Usage

Single Profile

from linkedinscrape import LinkedIn

li = LinkedIn()
profile = li.scrape("satyanadella")

print(profile.full_name)
print(profile.about)
print(profile.industry_name)
print(profile.connection_info.total_connections)
print(profile.following_state.follower_count)

for pos in profile.positions:
    print(f"{pos.title} at {pos.company_name} ({'Current' if pos.is_current else pos.end_date})")

for edu in profile.educations:
    print(f"{edu.degree_name} - {edu.field_of_study} @ {edu.school_name}")

li.close()

Batch Scraping

from linkedinscrape import LinkedIn

with LinkedIn() as li:
    profiles = li.scrape_batch([
        "satyanadella",
        "williamhgates",
        "jeffweiner08",
    ])

    for p in profiles:
        print(f"{p.full_name} — {p.headline}")

Or from a file:

from pathlib import Path
from linkedinscrape import LinkedIn

with LinkedIn() as li:
    usernames = Path("usernames.txt").read_text().splitlines()
    profiles = li.scrape_batch(usernames)

Explicit Credentials

# Override environment variables
li = LinkedIn(li_at="your_token", csrf_token="ajax:123456")

Parse Saved JSON

from linkedinscrape import LinkedIn

# No credentials needed — works offline
profile = LinkedIn.parse_local("saved_response.json")
print(profile.full_name)

Exporting

from linkedinscrape import LinkedIn, Exporter

with LinkedIn() as li:
    profile = li.scrape("username")

    exporter = Exporter("output")

    # JSON (single profile)
    exporter.to_json(profile)                     # output/username.json

    # JSON (batch)
    exporter.to_json_batch([profile])              # output/profiles.json

    # CSV (flat format for spreadsheets)
    exporter.to_csv([profile])                     # output/profiles.csv

    # Console summary
    exporter.print_summary(profile)

Serialization

Every model has a .to_dict() method:

import json

data = profile.to_dict()
print(json.dumps(data, indent=2))

# Flat dict for CSV/dataframes
flat = profile.to_flat_dict()

Proxy Support

from linkedinscrape import LinkedIn

# Single proxy
li = LinkedIn(proxy="http://user:pass@host:port")

# Multiple proxies (round-robin rotation)
li = LinkedIn(proxies=[
    "http://proxy1:8080",
    "http://proxy2:8080",
    "http://proxy3:8080",
])

# From a file (one URL per line, # comments supported)
li = LinkedIn(proxy_file="proxies.txt")

On rate limit (HTTP 429), the proxy automatically rotates to the next one.

Tip: Residential proxies are strongly recommended over datacenter proxies for LinkedIn.

Stealth

The SDK is designed to look like a real browser session:

Technique	Description
Timezone auto-detect	`x-li-track` timezone matches your OS, not a hardcoded value
UA rotation	Random User-Agent picked from a pool of real Chrome versions per session
Display fingerprint	Random screen resolution and DPI from common profiles
Jittered delays	+-30% random variation on inter-request delay (humans aren't metronomes)
Batch jitter	+-40% variation between profiles in batch mode
Cookie validation	Catches expired sessions upfront instead of burning requests
Proxy rotation	Automatic round-robin on rate limit
Decoration fallback	Tries 3 API decoration versions (v93 → v91 → v35)

CLI

# Single profile
linkedinscrape username

# Batch from file
linkedinscrape --file usernames.txt

# Parse local JSON (offline)
linkedinscrape --local response.json

# Export formats
linkedinscrape username --format json          # default
linkedinscrape username --format csv
linkedinscrape username --format both

# Custom output directory
linkedinscrape username --output results/

# Proxy
linkedinscrape username --proxy http://host:port
linkedinscrape username --proxy-file proxies.txt

# Adjust delay (seconds)
linkedinscrape username --delay 3.0

# Skip cookie check on startup
linkedinscrape username --skip-check

# Verbose logging
linkedinscrape username -v

# Skip console summary
linkedinscrape username --no-summary

Data Models

The full profile contains these sections:

Section	Model	Fields
Identity	`LinkedInProfile`	name, headline, about, industry, URLs, flags
Picture	`ProfilePicture`	artifacts with URL, dimensions, expiry
Location	`Location`	country, city, geo name
Contact	`ContactInfo`	email, phone, websites, twitter, birthday
Network	`ConnectionInfo`, `FollowingState`, `MutualConnection`	connections, followers, mutual
Experience	`Position`	title, company, dates, description, type
Education	`Education`	school, degree, field, grade, activities
Skills	`Skill`	name, endorsement count
Certifications	`Certification`	name, authority, license, URL, dates
Projects	`Project`	title, description, URL, members, dates
Languages	`Language`	name, proficiency
Volunteer	`VolunteerExperience`	role, organization, cause, dates
Honors	`HonorAward`	title, issuer, description, date
Publications	`Publication`	name, publisher, URL, date
Courses	`Course`	name, number
Patents	`Patent`	title, issuer, number, status, dates
Test Scores	`TestScore`	name, score, date
Organizations	`Organization`	name, position, dates

Error Handling

LinkedInError
├── AuthenticationError        # Cookies missing at startup
├── CookieExpiredError         # Cookies expired mid-use (401/403)
├── ProfileNotFoundError       # Profile doesn't exist
├── RateLimitError             # HTTP 429
├── RequestError               # Other HTTP errors
└── ParsingError               # Response structure changed

from linkedinscrape import LinkedIn, CookieExpiredError, ProfileNotFoundError

with LinkedIn() as li:
    try:
        profile = li.scrape("username")
    except ProfileNotFoundError:
        print("Profile does not exist")
    except CookieExpiredError:
        print("Session expired — refresh your cookies")

Testing

Since this SDK hits LinkedIn's live API, do not run tests against real accounts casually — it risks triggering rate limits or account restrictions.

Safe testing with saved JSON (no network, no risk)

from linkedinscrape import LinkedIn

# Parse a previously saved API response — no cookies or network needed
profile = LinkedIn.parse_local("output/zaidkx37.json")

print(profile.full_name)
print(profile.headline)
print(len(profile.positions), "positions")
print(len(profile.skills), "skills")
print(profile.to_dict())

You already have saved profiles in output/ from previous runs. Use those:

ls output/
# misterdebugger.json
# muhammad-danyal-31677b33a.json
# salman0x01.json
# zaidkx37.json

Quick smoke test (1 live request)

If you must test live, scrape a single profile with verbose logging:

linkedinscrape username -v --no-summary

Or in Python:

import logging
logging.basicConfig(level=logging.DEBUG)

from linkedinscrape import LinkedIn

with LinkedIn() as li:
    profile = li.scrape("username")
    print(f"OK: {profile.full_name} ({len(profile.positions)} positions)")

Cookie validation only (no profile scrape)

from linkedinscrape import LinkedIn, CookieExpiredError

try:
    li = LinkedIn()  # validates cookies automatically
    print("Cookies are valid")
    li.close()
except CookieExpiredError as e:
    print(f"Expired: {e}")

Export round-trip test

from linkedinscrape import LinkedIn, Exporter

# Parse saved → export → verify file exists
profile = LinkedIn.parse_local("output/zaidkx37.json")

exporter = Exporter("test_output")
path = exporter.to_json(profile)
print(f"Exported to {path}")

csv_path = exporter.to_csv([profile])
print(f"CSV at {csv_path}")

Project Structure

src/linkedinscrape/
├── __init__.py          # Public API exports
├── client.py            # LinkedIn class (main entry point)
├── models.py            # All data models (dataclasses)
├── exceptions.py        # Exception hierarchy
├── exporter.py          # JSON / CSV / console export
├── _http.py             # HTTP client with retry, proxy, rate limiting
├── _endpoints.py        # API URLs, headers, stealth config
├── _parsers.py          # Response parsing logic
└── cli/
    ├── __init__.py
    └── app.py           # Optional CLI (argparse)

Disclaimer & Limitations

This project is for educational and research purposes only.

This SDK uses LinkedIn's undocumented internal Voyager API — the same endpoints the LinkedIn web app uses in your browser. There is no official API support, no guarantee of stability, and no endorsement from LinkedIn.

What you should know

No automated login. You must manually log in to LinkedIn in your browser and copy two cookies (li_at and JSESSIONID). The SDK cannot and will not automate the login process.
Requires a real LinkedIn account. There is no way to use this SDK without an authenticated session from a real account.
Cookies expire. LinkedIn session cookies have a limited lifespan. When they expire, you'll need to copy fresh values from your browser. There is no refresh mechanism.
Your account is at risk. Excessive or aggressive scraping can trigger LinkedIn's anti-abuse systems, leading to temporary restrictions, CAPTCHAs, or permanent account bans. The stealth measures in this SDK reduce — but do not eliminate — this risk.
API can break at any time. LinkedIn can change their internal API structure, query IDs, or decoration versions without notice. This will cause the SDK to fail until updated.
Data accuracy is not guaranteed. The API may return incomplete data, especially for profiles with privacy restrictions or profiles you're not connected to.
Rate limits apply. LinkedIn enforces rate limits. Even with proxy rotation and jittered delays, scraping too many profiles too quickly will get you throttled (HTTP 429) or blocked.
No official support. This is a community project. LinkedIn does not provide documentation or support for the Voyager API.

Use responsibly

Don't scrape profiles in bulk without a legitimate reason
Respect people's privacy and LinkedIn's Terms of Service
Use reasonable delays between requests (the default 1.5s is a minimum)
Consider using proxies for any non-trivial workload
Never share your li_at cookie — it grants full access to your LinkedIn account

License

MIT — see LICENSE for details.

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
assets		assets
src/linkedinscrape		src/linkedinscrape
tests		tests
.env.example		.env.example
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

linkedinscrape

Features

Installation

Setup

Quick Start

Even Simpler

SDK Usage

Single Profile

Batch Scraping

Explicit Credentials

Parse Saved JSON

Exporting

Serialization

Proxy Support

Stealth

CLI

Data Models

Error Handling

Testing

Safe testing with saved JSON (no network, no risk)

Quick smoke test (1 live request)

Cookie validation only (no profile scrape)

Export round-trip test

Project Structure

Disclaimer & Limitations

What you should know

Use responsibly

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

linkedinscrape

Features

Installation

Setup

Quick Start

Even Simpler

SDK Usage

Single Profile

Batch Scraping

Explicit Credentials

Parse Saved JSON

Exporting

Serialization

Proxy Support

Stealth

CLI

Data Models

Error Handling

Testing

Safe testing with saved JSON (no network, no risk)

Quick smoke test (1 live request)

Cookie validation only (no profile scrape)

Export round-trip test

Project Structure

Disclaimer & Limitations

What you should know

Use responsibly

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages