🎲 Detect whether a GitHub repo's code was likely written by an LLM. Zero dependencies. Scores repos 0-100 using commit velocity, session analysis, burst detection, message patterns, and project-scale plausibility.

🎲 Likelihoodlum


Detect whether a GitHub repository's code was likely written by an LLM.

Likelihoodlum analyzes a repository's commit history and uses timing-based heuristics to estimate the likelihood that the code was generated by a large language model rather than written by a human.

The core idea is simple: humans type slowly, LLMs don't. If someone is pushing hundreds of lines of polished code every few minutes — or shipping an entire app in a week — something's up.


πŸ† The Wall of Truth

Real results from real repos. Every score below was generated by Likelihoodlum with default settings (--max-commits 200).

🤖 Caught Red-Handed

| Repository | ⭐ Stars | Score | Verdict | Daily Output | Velocity | Authors |
|---|---|---|---|---|---|---|
| anthropics/claudes-c-compiler | 2,399 | 81/100 | 🤖 Very likely LLM-generated | 60,638 lines in 1 day | 7.6 l/min median | claude (198 commits) |

Anthropic literally named the author claude. 198 of 200 commits. 60K lines in a single day. The tool didn't even break a sweat on this one.

🤔 Suspicious — You Decide

| Repository | ⭐ Stars | Score | Verdict | Key Signals |
|---|---|---|---|---|
| openai/codex | — | 35/100 | 🤔 Possibly LLM-assisted | 9,562 lines/active day, 17% LLM message patterns |
| jlowin/fastmcp | — | 34/100 | 🤔 Possibly LLM-assisted | 3,046 lines/active day, extreme session productivity |
| rust-lang/rust | — | 32/100 | 🤔 Possibly LLM-assisted | 7,032 lines/active day (merge commits inflate this) |
| Significant-Gravitas/AutoGPT | — | 29/100 | 👀 Likely human-written | 74% LLM message patterns, but 14 authors + low velocity |
| microsoft/vscode | — | 28/100 | 👀 Likely human-written | Large merge commits inflate daily output |
| twitter/the-algorithm | — | 26/100 | 👀 Likely human-written | 13,379 lines/active day, bulk repo dump |

👀 Certified Human

| Repository | ⭐ Stars | Score | Verdict | Median Velocity | Authors | LLM Messages |
|---|---|---|---|---|---|---|
| vuejs/core | — | 20/100 | 👀 Likely human-written | 0.0 l/min | 52 | 66%* |
| pallets/flask | — | 17/100 | 👀 Likely human-written | — | 23 | 0% |
| meshtastic/meshtastic-apple | — | 16/100 | 👀 Likely human-written | 0.3 l/min | 8 | 4% |
| denoland/deno | — | 15/100 | 👀 Likely human-written | 0.1 l/min | 41 | 56%* |
| langchain-ai/langchain | — | 15/100 | 👀 Likely human-written | 0.1 l/min | 44 | 57%* |
| vercel/next.js | — | 14/100 | 👀 Almost certainly human-written | 0.2 l/min | 29 | — |
| facebook/react | — | 10/100 | 👀 Almost certainly human-written | 0.1 l/min | 29 | — |
| pydantic/pydantic | — | 10/100 | 👀 Almost certainly human-written | 0.1 l/min | 47 | 18% |
| stackblitz/bolt.new | — | 2/100 | 👀 Almost certainly human-written | 0.1 l/min | 17 | 4% |
| godotengine/godot | — | 2/100 | 👀 Almost certainly human-written | 0.1 l/min | 51 | — |
| golang/go | — | 0/100 | 👀 Almost certainly human-written | 0.0 l/min | 85 | 0% |
| django/django | — | 0/100 | 👀 Almost certainly human-written | 0.0 l/min | 73 | 0% |
| bitcoin/bitcoin | — | 0/100 | 👀 Almost certainly human-written | 0.1 l/min | 23 | 0% |
| sveltejs/svelte | — | 0/100 | 👀 Almost certainly human-written | 0.1 l/min | 36 | 1% |
| tinygrad/tinygrad | — | 0/100 | 👀 Almost certainly human-written | 0.1 l/min | 15 | — |
| nixos/nixpkgs | — | 0/100 | 👀 Almost certainly human-written | — | 41 | 0% |
| torvalds/linux | — | 41/100† | 🤔 Possibly LLM-assisted | 1.6 l/min | 35 | 0% |

* High message pattern % comes from conventional commit style (feat():, fix():), not actual LLM usage.

† The Linux kernel scores higher than expected because Torvalds merges massive subsystem PRs — each merge looks like thousands of lines appearing instantly. The multi-author discount (−10) keeps it in check.

🔬 The Anthropic Spotlight

Because who better to test an LLM detector on than the company making the LLMs?

| Repository | Score | Verdict | Notes |
|---|---|---|---|
| anthropics/claude-code | 0/100 | 👀 Almost certainly human-written | 20 authors, human velocity, CV=7.62. Real team, real software. |
| anthropics/anthropic-sdk-python | 0/100 | 👀 Almost certainly human-written | 159/200 commits were bots (filtered out). The 41 human commits? Glacial pace. |
| anthropics/claude-code-action | 0/100 | 👀 Almost certainly human-written | 35 authors, every negative signal fired. |
| anthropics/skills | 3/100 | 👀 Almost certainly human-written | 10K lines/active day triggered +20, but −25 in negatives crushed it. |
| modelcontextprotocol/servers | 0/100 | 👀 Almost certainly human-written | 39 authors. Community-driven. |
| anthropics/claudes-c-compiler | 81/100 | 🤖 Very likely LLM-generated | The one that proves the tool works. Author is literally claude. |

The irony: the company building the most capable coding LLM in the world writes their own code by hand. Except when they let Claude write a C compiler for fun — and the tool caught it instantly.


How It Works

Likelihoodlum fetches commit history and repository metadata via the GitHub API, then scores the repo on a 0–100 scale across twelve heuristic signals. Generated and vendored files (lockfiles, protobufs, Xcode project files, build artifacts, etc.) are automatically filtered out so they don't inflate velocity measurements. Bot accounts (e.g. dependabot[bot]) are excluded from author counts and velocity calculations.

Commit details are fetched concurrently (up to 10 parallel requests) for significantly faster analysis, and bot commits are skipped entirely to save API calls.
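A minimal sketch of that concurrent fetch step, assuming a `fetch_one` callable standing in for the GitHub commit-detail request and a simplified bot check (neither is the tool's real interface):

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_details(commits, fetch_one, max_workers=10):
    """Fetch per-commit stats in parallel, skipping bot-authored commits.

    `commits` is a list of {"sha", "author"} dicts; `fetch_one` is any
    callable taking a sha (e.g. an HTTP request to the GitHub API).
    """
    humans = [c for c in commits if not c["author"].endswith("[bot]")]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with `humans`
        return list(pool.map(lambda c: fetch_one(c["sha"]), humans))
```

Skipping bot commits before fanning out is what saves API calls: they never hit the network at all.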

Scoring Signals

| # | Signal | Points | What It Measures |
|---|---|---|---|
| 1 | Code Velocity | −10 to +35 | Lines changed per minute between consecutive commits by the same author. When the trimmed mean is significantly higher than the median (heavy tail of fast intervals), the score is boosted further. |
| 2 | Session Productivity | −5 to +20 | Groups commits into coding sessions (>2hr gap = new session) and measures aggregate lines/min. Human-pace sessions actively reduce the score. |
| 3 | Commit Size Uniformity | −5 to +15 | LLM dumps tend to be uniformly large. Human commits vary in size (small fixes, big features, etc.). High variation is rewarded with negative points. |
| 4 | Commit Message Patterns | 0 to +15 | Catches generic messages like "Implement X", "Add Y functionality", "Fix issue with Z", conventional commits with verbose or multiple scopes (feat(a, b):), and other LLM-typical phrasings. If messages look clean but velocity is very high, a small cross-signal bonus is applied. |
| 5 | Burst Detection | 0 to +15 | Flags sessions where >300 authored lines appeared in under 30 minutes (rapid bursts), plus longer sessions with sustained extreme throughput (≥10 lines/min). |
| 6 | Multi-Author Discount | −10 to +5 | Real projects tend to have multiple contributors (score penalty). Solo-author repos get a small bump. Bot accounts are excluded from the count. |
| 7 | Extreme Per-Commit Velocity | 0 to +10 | Counts commit intervals exceeding 50 lines/min (~3,000 lines/hr). Even a small percentage of these is a strong signal. |
| 8 | Commit Time-of-Day | 0 to +5 | Flags repos where >30% of commits happen between midnight and 6am and velocity is suspicious. Humans have circadian rhythms; LLMs don't sleep. |
| 9 | Comment Density | −3 to +5 | LLMs over-explain: they add verbose comments, docstrings, and inline explanations at a much higher rate than most humans. Very low comment density (typical human laziness) earns a negative signal. |
| 10 | Diff Entropy | −3 to +5 | Measures Shannon entropy of diff content. LLM-generated diffs tend to be more repetitive/formulaic (lower entropy). Human diffs are messier and more varied (higher entropy). |
| 11 | Project-Scale Plausibility | −5 to +20 | The big-picture sanity check. Compares total authored output against the repo's true creation date (fetched from GitHub metadata) and active coding days. A senior engineer produces ~200–500 lines of production code per day; 10,000+ lines/day sustained over weeks is implausible without LLM assistance. |
| 12 | Generated File Ratio | Informational | Reports what percentage of line changes are in generated/vendor files (excluded from all calculations above). |

Note: The score uses both positive signals (suspicious patterns push the score up) and negative signals (clearly human patterns actively pull it down). The final score is clamped to 0–100. Patch content from commit diffs is analyzed for comment density and entropy calculations.
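The session grouping behind signal 2 can be sketched in a few lines. This is a simplified illustration (the function name and the epoch-seconds input are assumptions, not the tool's real API):

```python
def split_sessions(timestamps, gap_minutes=120):
    """Group commit timestamps (epoch seconds) into coding sessions.

    Any gap longer than `gap_minutes` opens a new session, mirroring the
    ">2hr gap = new session" rule described for signal 2 above.
    """
    sessions = []
    for t in sorted(timestamps):
        if sessions and t - sessions[-1][-1] <= gap_minutes * 60:
            sessions[-1].append(t)  # close enough: continue current session
        else:
            sessions.append([t])    # gap exceeded: start a new session
    return sessions
```

Per-session productivity then falls out naturally: sum the authored lines in each session and divide by the session's duration in minutes.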

Velocity Thresholds

| Threshold | Lines/Min | Lines/Hr | Interpretation |
|---|---|---|---|
| Clearly Human | < 0.5 | < 30 | Normal productive human |
| Human Upper | < 1.5 | < 90 | Fast human, maybe some copy-paste |
| Suspicious | ≥ 4.0 | ≥ 240 | Quite fast, could be assisted |
| Very Suspicious | ≥ 10.0 | ≥ 600 | Almost certainly not hand-typed |

Daily Output Thresholds

| Lines/Active Day | Interpretation | Points |
|---|---|---|
| < 300 | Normal human pace | −5 (if span ≥ 14 days) |
| 300–799 | Fast but plausible | 0 |
| 800–1,999 | Above average | +5 |
| 2,000–4,999 | Very high, likely assisted | +12 |
| ≥ 5,000 | Implausible for a human | +20 |
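Read as code, the table above is a small step function. A sketch (the function name is illustrative; the real implementation may differ in details):

```python
def daily_output_points(lines_per_active_day, span_days):
    """Score points for daily output, following the thresholds above."""
    if lines_per_active_day >= 5000:
        return 20   # implausible for a human
    if lines_per_active_day >= 2000:
        return 12   # very high, likely assisted
    if lines_per_active_day >= 800:
        return 5    # above average
    if lines_per_active_day >= 300:
        return 0    # fast but plausible
    # normal human pace only counts against the score on longer projects
    return -5 if span_days >= 14 else 0
```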

Comment Density Thresholds

| Comment Ratio | Interpretation | Points |
|---|---|---|
| < 5% | Human laziness | −3 |
| 5–24% | Normal range | 0 |
| 25–34% | Above average | +3 |
| ≥ 35% | LLM over-commenting | +5 |

Diff Entropy Thresholds

| Entropy (bits/char) | Interpretation | Points |
|---|---|---|
| > 5.5 | Varied, chaotic (human) | −3 |
| 4.3–5.5 | Normal range | 0 |
| 4.0–4.3 | Below average | +3 |
| < 4.0 | Repetitive/formulaic (LLM) | +5 |
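Entropy here is plain Shannon entropy over the characters of the diff text. As a self-contained illustration:

```python
import math
from collections import Counter

def shannon_entropy(text):
    """Shannon entropy of a string, in bits per character."""
    if not text:
        return 0.0
    n = len(text)
    # H = -sum(p * log2(p)) over the character distribution
    return -sum((c / n) * math.log2(c / n) for c in Counter(text).values())
```

Repetitive, formulaic diffs concentrate the character distribution and land lower on this scale; messier human edits spread it out.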

Verdicts

| Score | Verdict |
|---|---|
| 75–100 | 🤖 Very likely LLM-generated |
| 50–74 | 🤖 Likely LLM-assisted |
| 30–49 | 🤔 Possibly LLM-assisted |
| 15–29 | 👀 Likely human-written |
| 0–14 | 👀 Almost certainly human-written |
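The bands translate directly into a lookup. A sketch (emoji prefixes omitted; the function name is illustrative):

```python
def verdict(score):
    """Map a likelihood score (clamped to 0-100) to its verdict band."""
    score = max(0, min(100, score))
    if score >= 75:
        return "Very likely LLM-generated"
    if score >= 50:
        return "Likely LLM-assisted"
    if score >= 30:
        return "Possibly LLM-assisted"
    if score >= 15:
        return "Likely human-written"
    return "Almost certainly human-written"
```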

Generated File Filtering

The following file types are automatically excluded from velocity, size, and daily output calculations:

  • Lock files: package-lock.json, yarn.lock, Podfile.lock, Cargo.lock, go.sum, etc.
  • Xcode / Apple: .pbxproj, .xcworkspacedata, .xcscheme
  • Protobuf / codegen: .pb.go, .pb.swift, _pb2.py, .g.dart, .freezed.dart, .generated.*
  • Build artifacts: .min.js, .min.css, .map, dist/, build/, vendor/, node_modules/
  • Data / assets: .json, .svg, .png, .jpg, .ico, fonts

Bot Author Filtering

Authors matching known bot patterns (e.g. dependabot[bot], renovate-bot) are automatically:

  • Excluded from velocity and session calculations
  • Excluded from author counts (so a solo dev + dependabot is correctly identified as a solo author)
  • Skipped during commit detail fetching (saves API calls)

Installation

Zero dependencies — runs on Python 3.10+ with only the standard library.

Option A: pip install (recommended)

pip install git+https://github.com/gotnull/likelihoodlum.git

Then run it from anywhere:

likelihoodlum owner/repo

Option B: Clone and run

git clone https://github.com/gotnull/likelihoodlum.git
cd likelihoodlum
python3 llm_detector.py owner/repo

Optional: .env file support

Install python-dotenv for .env file support (a built-in fallback parser is used if it isn't installed):

pip install python-dotenv

Setup

GitHub Token (Recommended)

Without a token you're limited to 60 API requests/hour. With one, you get 5,000/hr.

Option A: .env file (recommended)

cp .env.example .env
# Edit .env and add your token
GITHUB_TOKEN=ghp_your_token_here

The .env file is gitignored by default — your token stays safe.

Option B: Environment variable

export GITHUB_TOKEN="ghp_your_token_here"

Option C: CLI flag

python3 llm_detector.py owner/repo --token ghp_your_token_here

Generating a Token

  1. Go to GitHub → Settings → Developer settings → Personal access tokens
  2. Generate a new token (classic) with public_repo scope (or repo for private repos)
  3. Copy it — you won't see it again

Usage

# Basic — analyze a public repo
python3 llm_detector.py owner/repo

# Full GitHub URL works too
python3 llm_detector.py https://github.com/owner/repo

# Analyze more commits (default is 200)
python3 llm_detector.py owner/repo --max-commits 1000

# Target a specific branch
python3 llm_detector.py owner/repo --branch develop

# Machine-readable JSON output
python3 llm_detector.py owner/repo --json

# Go big
python3 llm_detector.py owner/repo --max-commits 5000 --json > report.json

CLI Reference

| Flag | Default | Description |
|---|---|---|
| repo (positional) | — | GitHub repo as owner/repo or full URL |
| --token | $GITHUB_TOKEN | GitHub personal access token |
| --branch | repo default | Branch to analyze |
| --max-commits | 200 | Maximum number of commits to fetch |
| --json | off | Output results as JSON |

Example Output

============================================================
  LLM Code Detector Report
  Repository: anthropics/claudes-c-compiler
============================================================

📊 Commits analyzed: 200
📅 Time span: 0 days
👥 Authors: 2
   • claude: 198 commits
   • carlini: 2 commits

πŸ“ Line changes breakdown:
   Total:     60,638
   Authored:  60,638 (used for analysis)
   Generated: 0 (filtered out)

📈 Project-scale output:
   Repo created:         2026-02-04
   Active days:          1 (of 1 calendar days)
   Lines/active day:     60,638
   Lines/calendar day:   60,638
    ⚠️  (>5,000 — implausible for a human)

⚡ Velocity (authored lines/min between commits):
   Median:        7.61  (≈ 457 lines/hr)
   Trimmed mean:  12.14  (≈ 728 lines/hr)
   Max:           630.41
   Intervals above suspicious threshold: 69/101

🔥 Fastest commit intervals:
   12d83516→dc196034  767 lines in 1.2 min = 630.41 l/min ⚠️
   f2ac8159→8fe4994a  163 lines in 1.0 min = 160.33 l/min ⚠️
   734d5fab→e836df40  1029 lines in 7.2 min = 143.25 l/min ⚠️
   ...

πŸ• Coding sessions (gap > 120 min): 2

💬 Commit messages matching LLM patterns: 4/200 (2.0%)
   • "Update x86 assembler README: fix stale line counts"
   • "Add missing ARM64 system registers to assembler"

────────────────────────────────────────────────────────────
  🎯 LLM Likelihood Score: [████████████████████████······] 81/100
  🤖 Very likely LLM-generated
────────────────────────────────────────────────────────────

πŸ“ Reasoning:
   β€’ Median velocity is suspiciously high (7.6 lines/min β‰ˆ 457 lines/hr)
   β€’ Trimmed mean (12.1 l/min) is 1.6Γ— the median β€” heavy tail of fast intervals
   β€’ 44% of intervals show very high velocity
   β€’ Median session productivity is extreme (78.5 lines/min)
   β€’ Commit sizes vary widely β€” typical of human work (CV=8.97) [-5]
   β€’ Commit messages look clean but velocity is high β€” possible curated LLM workflow
   β€’ 12 commit intervals (12%) show extreme velocity (>50 lines/min) [+4]
   β€’ Project-scale output is implausible: 60,638 authored lines over 1 days
     (1 active) = 60,638 lines/active day [+20]

⚠  Disclaimer: This is a heuristic analysis and NOT definitive proof.
   Fast coding can also indicate copy-paste, boilerplate generators,
   IDE scaffolding, or simply an experienced developer.

JSON Output

When using --json, the output includes:

{
  "repository": "owner/repo",
  "commits_analyzed": 200,
  "score": 81,
  "verdict": "πŸ€– Very likely LLM-generated",
  "reasons": ["..."],
  "velocity_stats": {
    "median_lpm": 7.61,
    "trimmed_mean_lpm": 12.14,
    "intervals": 101
  },
  "line_changes": {
    "total": 60638,
    "authored": 60638,
    "generated": 0
  },
  "project_scale": {
    "repo_created_at": "2026-02-04T00:00:00+00:00",
    "calendar_days": 1,
    "active_days": 1,
    "lines_per_active_day": 60638
  },
  "message_analysis": {
    "total": 200,
    "pattern_hits": 4,
    "ratio": 0.02,
    "sample_flagged": ["..."]
  },
  "sessions": 2,
  "authors": 2
}

API Usage

The tool fetches repo metadata (1 call), commit listings (1–N calls depending on page count), and detailed stats per non-bot commit. Bot commits are skipped to save calls. Detail requests are made concurrently (up to 10 in parallel) for faster analysis.

| Auth | Rate Limit | Comfortable Max Commits |
|---|---|---|
| No token | 60/hr | ~50 |
| With token | 5,000/hr | ~4,000 |

Limitations & Disclaimer

This tool uses heuristics, not magic. A high score doesn't prove LLM usage, and a low score doesn't disprove it.

False positives can come from:

  • Copy-pasting code from other projects
  • IDE/framework scaffolding and boilerplate generators
  • Squashed/rebased commits that compress work
  • Merge commits (maintainers merging large PRs)
  • An experienced developer who plans before coding
  • Generated code (protobufs, OpenAPI, etc.)

False negatives can come from:

  • LLM-generated code committed slowly or in small chunks
  • Human-edited LLM output committed as normal work
  • Commits with manual timing that mimics human patterns
  • Repos where --max-commits doesn't capture the full picture

Use responsibly. This is a curiosity tool, not a courtroom exhibit.

License

MIT — do whatever you want with it.

Contributing

Found a new heuristic? PRs welcome. Ideas:

  • File-type breakdown (LLMs love generating configs)
  • Code style consistency metrics
  • Cross-file similarity detection (LLMs repeat patterns)
  • Cross-referencing with known LLM output patterns
  • Language-specific signal tuning
  • Timezone inference from commit patterns

Built with vibes and a healthy suspicion of anyone committing 10,000 lines a day. 🎲
