Detect whether a GitHub repository's code was likely written by an LLM.
Likelihoodlum analyzes a repository's commit history and uses timing-based heuristics to estimate the likelihood that the code was generated by a large language model rather than written by a human.
The core idea is simple: humans type slowly, LLMs don't. If someone is pushing hundreds of lines of polished code every few minutes, or shipping an entire app in a week, something's up.
Real results from real repos. Every score below was generated by Likelihoodlum with default settings (`--max-commits 200`).
| Repository | ⭐ Stars | Score | Verdict | Daily Output | Velocity | Authors |
|---|---|---|---|---|---|---|
| anthropics/claudes-c-compiler | 2,399 | 81/100 | 🤖 Very likely LLM-generated | 60,638 lines in 1 day | 7.6 l/min median | claude (198 commits) |
Anthropic literally named the author `claude`. 198 of 200 commits. 60K lines in a single day. The tool didn't even break a sweat on this one.
| Repository | ⭐ Stars | Score | Verdict | Key Signals |
|---|---|---|---|---|
| openai/codex | – | 35/100 | 🤖 Possibly LLM-assisted | 9,562 lines/active day, 17% LLM message patterns |
| jlowin/fastmcp | – | 34/100 | 🤖 Possibly LLM-assisted | 3,046 lines/active day, extreme session productivity |
| rust-lang/rust | – | 32/100 | 🤖 Possibly LLM-assisted | 7,032 lines/active day (merge commits inflate this) |
| Significant-Gravitas/AutoGPT | – | 29/100 | 👤 Likely human-written | 74% LLM message patterns, but 14 authors + low velocity |
| microsoft/vscode | – | 28/100 | 👤 Likely human-written | Large merge commits inflate daily output |
| twitter/the-algorithm | – | 26/100 | 👤 Likely human-written | 13,379 lines/active day (bulk repo dump) |
| Repository | ⭐ Stars | Score | Verdict | Median Velocity | Authors | LLM Messages |
|---|---|---|---|---|---|---|
| vuejs/core | – | 20/100 | 👤 Likely human-written | 0.0 l/min | 52 | 66%* |
| pallets/flask | – | 17/100 | 👤 Likely human-written | – | 23 | 0% |
| meshtastic/meshtastic-apple | – | 16/100 | 👤 Likely human-written | 0.3 l/min | 8 | 4% |
| denoland/deno | – | 15/100 | 👤 Likely human-written | 0.1 l/min | 41 | 56%* |
| langchain-ai/langchain | – | 15/100 | 👤 Likely human-written | 0.1 l/min | 44 | 57%* |
| vercel/next.js | – | 14/100 | 👤 Almost certainly human-written | 0.2 l/min | 29 | – |
| facebook/react | – | 10/100 | 👤 Almost certainly human-written | 0.1 l/min | 29 | – |
| pydantic/pydantic | – | 10/100 | 👤 Almost certainly human-written | 0.1 l/min | 47 | 18% |
| stackblitz/bolt.new | – | 2/100 | 👤 Almost certainly human-written | 0.1 l/min | 17 | 4% |
| godotengine/godot | – | 2/100 | 👤 Almost certainly human-written | 0.1 l/min | 51 | – |
| golang/go | – | 0/100 | 👤 Almost certainly human-written | 0.0 l/min | 85 | 0% |
| django/django | – | 0/100 | 👤 Almost certainly human-written | 0.0 l/min | 73 | 0% |
| bitcoin/bitcoin | – | 0/100 | 👤 Almost certainly human-written | 0.1 l/min | 23 | 0% |
| sveltejs/svelte | – | 0/100 | 👤 Almost certainly human-written | 0.1 l/min | 36 | 1% |
| tinygrad/tinygrad | – | 0/100 | 👤 Almost certainly human-written | 0.1 l/min | 15 | – |
| nixos/nixpkgs | – | 0/100 | 👤 Almost certainly human-written | – | 41 | 0% |
| torvalds/linux | – | 41/100† | 🤖 Possibly LLM-assisted | 1.6 l/min | 35 | 0% |
\* High message-pattern % comes from conventional commit style (`feat():`, `fix():`), not actual LLM usage.

† The Linux kernel scores higher than expected because Torvalds merges massive subsystem PRs: each merge looks like thousands of lines appearing instantly. The multi-author discount (−10) keeps it in check.
Because who better to test an LLM detector on than the company making the LLMs?
| Repository | Score | Verdict | Notes |
|---|---|---|---|
| anthropics/claude-code | 0/100 | 👤 Almost certainly human-written | 20 authors, human velocity, CV=7.62. Real team, real software. |
| anthropics/anthropic-sdk-python | 0/100 | 👤 Almost certainly human-written | 159/200 commits were bots (filtered out). The 41 human commits? Glacial pace. |
| anthropics/claude-code-action | 0/100 | 👤 Almost certainly human-written | 35 authors, every negative signal fired. |
| anthropics/skills | 3/100 | 👤 Almost certainly human-written | 10K lines/active day triggered +20, but −25 in negatives crushed it. |
| modelcontextprotocol/servers | 0/100 | 👤 Almost certainly human-written | 39 authors. Community-driven. |
| anthropics/claudes-c-compiler | 81/100 | 🤖 Very likely LLM-generated | The one that proves the tool works. Author is literally `claude`. |
The irony: the company building the most capable coding LLM in the world writes its own code by hand. Except when they let Claude write a C compiler for fun, and the tool caught it instantly.
Likelihoodlum fetches commit history and repository metadata via the GitHub API, then scores the repo on a 0–100 scale across twelve heuristic signals. Generated and vendored files (lockfiles, protobufs, Xcode project files, build artifacts, etc.) are automatically filtered out so they don't inflate velocity measurements. Bot accounts (e.g. `dependabot[bot]`) are excluded from author counts and velocity calculations.
Commit details are fetched concurrently (up to 10 parallel requests) for significantly faster analysis, and bot commits are skipped entirely to save API calls.
| # | Signal | Points | What It Measures |
|---|---|---|---|
| 1 | Code Velocity | −10 to +35 | Lines changed per minute between consecutive commits by the same author. When the trimmed mean is significantly higher than the median (heavy tail of fast intervals), the score is boosted further. |
| 2 | Session Productivity | −5 to +20 | Groups commits into coding sessions (> 2 hr gap = new session) and measures aggregate lines/min. Human-pace sessions actively reduce the score. |
| 3 | Commit Size Uniformity | −5 to +15 | LLM dumps tend to be uniformly large. Human commits vary in size (small fixes, big features, etc.). High variation is rewarded with negative points. |
| 4 | Commit Message Patterns | 0 to +15 | Catches generic messages like "Implement X", "Add Y functionality", "Fix issue with Z", conventional commits with verbose scopes or multi-scopes (`feat(a, b):`), and other LLM-typical phrasings. If messages look clean but velocity is very high, a small cross-signal bonus is applied. |
| 5 | Burst Detection | 0 to +15 | Flags sessions where >300 authored lines appeared in under 30 minutes (rapid bursts), plus longer sessions with sustained extreme throughput (≥10 lines/min). |
| 6 | Multi-Author Discount | −10 to +5 | Real projects tend to have multiple contributors (score penalty). Solo-author repos get a small bump. Bot accounts are excluded from the count. |
| 7 | Extreme Per-Commit Velocity | 0 to +10 | Counts commit intervals exceeding 50 lines/min (~3,000 lines/hr). Even a small percentage of these is a strong signal. |
| 8 | Commit Time-of-Day | 0 to +5 | Flags repos where >30% of commits happen between midnight and 6 a.m. and velocity is suspicious. Humans have circadian rhythms; LLMs don't sleep. |
| 9 | Comment Density | −3 to +5 | LLMs over-explain: they add verbose comments, docstrings, and inline explanations at a much higher rate than most humans. Very low comment density (typical human laziness) earns a negative signal. |
| 10 | Diff Entropy | −3 to +5 | Measures the Shannon entropy of diff content. LLM-generated diffs tend to be more repetitive and formulaic (lower entropy). Human diffs are messier and more varied (higher entropy). |
| 11 | Project-Scale Plausibility | −5 to +20 | The big-picture sanity check. Compares total authored output against the repo's true creation date (fetched from GitHub metadata) and active coding days. A senior engineer produces roughly 200–500 lines of production code per day; 10,000+ lines/day sustained over weeks is implausible without LLM assistance. |
| 12 | Generated File Ratio | Informational | Reports what percentage of line changes are in generated/vendor files (excluded from all calculations above). |
Note: The score uses both positive signals (suspicious patterns push the score up) and negative signals (clearly human patterns actively pull it down). The final score is clamped to 0–100. Patch content from commit diffs is analyzed for the comment density and entropy calculations.
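Signal 2's session grouping can be sketched in a few lines. This is a simplified illustration based only on the description above, not the tool's actual implementation; the function name and input format are assumptions.

```python
def split_sessions(timestamps, gap_min=120):
    """Group commit timestamps (epoch seconds) into coding sessions:
    a gap longer than `gap_min` minutes starts a new session."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= gap_min * 60:
            sessions[-1].append(ts)  # continue the current session
        else:
            sessions.append([ts])    # gap too large: start a new one
    return sessions
```

With a 120-minute default, three commits at hour 0, hour 1, and hour 5 form two sessions: the first two land together, the third starts fresh.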
| Threshold | Lines/Min | Lines/Hr | Interpretation |
|---|---|---|---|
| Clearly Human | < 0.5 | < 30 | Normal productive human |
| Human Upper | < 1.5 | < 90 | Fast human, maybe some copy-paste |
| Suspicious | ≥ 4.0 | ≥ 240 | Quite fast, could be assisted |
| Very Suspicious | ≥ 10.0 | ≥ 600 | Almost certainly not hand-typed |
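The velocity statistics the report prints (median and trimmed-mean lines/min between consecutive same-author commits) can be sketched as follows. The function name, tuple format, and 10% trim fraction are illustrative assumptions, not the tool's actual code.

```python
from statistics import median

def velocity_stats(commits, trim=0.1):
    """Median and trimmed-mean lines/min between consecutive commits
    by the same author. `commits` is a list of
    (author, timestamp_epoch_seconds, lines_changed) tuples sorted
    oldest-first -- a simplified stand-in for real commit objects."""
    by_author = {}
    for author, ts, lines in commits:
        by_author.setdefault(author, []).append((ts, lines))
    rates = []
    for history in by_author.values():
        for (t0, _), (t1, lines) in zip(history, history[1:]):
            minutes = (t1 - t0) / 60
            if minutes > 0:
                # attribute the newer commit's lines to the interval
                rates.append(lines / minutes)
    if not rates:
        return None
    rates.sort()
    k = int(len(rates) * trim)  # drop the fastest/slowest `trim` share
    trimmed = rates[k:len(rates) - k] or rates
    return {"median_lpm": median(rates),
            "trimmed_mean_lpm": sum(trimmed) / len(trimmed)}
```

A trimmed mean well above the median is the "heavy tail of fast intervals" signal: most intervals look human, but a minority are implausibly fast.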
| Lines/Active Day | Interpretation | Points |
|---|---|---|
| < 300 | Normal human pace | −5 (if span ≥ 14 days) |
| 300–799 | Fast but plausible | 0 |
| 800–1,999 | Above average | +5 |
| 2,000–4,999 | Very high, likely assisted | +12 |
| ≥ 5,000 | Implausible for a human | +20 |
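The table above translates directly into a small scoring function. A minimal sketch; the real tool may weight edge cases differently, and the function name is an assumption:

```python
def daily_output_points(lines_per_active_day, span_days):
    """Score authored lines per active day per the thresholds above.
    The `span_days >= 14` guard mirrors the '(if span >= 14 days)'
    condition on the -5 reward."""
    if lines_per_active_day >= 5000:
        return 20   # implausible for a human
    if lines_per_active_day >= 2000:
        return 12   # very high, likely assisted
    if lines_per_active_day >= 800:
        return 5    # above average
    if lines_per_active_day >= 300:
        return 0    # fast but plausible
    return -5 if span_days >= 14 else 0  # normal human pace
```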
| Comment Ratio | Interpretation | Points |
|---|---|---|
| < 5% | Human laziness | −3 |
| 5–24% | Normal range | 0 |
| 25–34% | Above average | +3 |
| ≥ 35% | LLM over-commenting | +5 |
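One naive way to estimate comment density on a diff's added lines. The prefix list is a deliberate simplification of language-aware parsing, and the function is illustrative, not the tool's code:

```python
def comment_ratio(added_lines,
                  prefixes=("#", "//", "/*", "*", '"""', "'''")):
    """Fraction of non-blank added lines that start like a comment.
    `added_lines` are diff '+' lines with the marker stripped."""
    stripped = [line.strip() for line in added_lines if line.strip()]
    if not stripped:
        return 0.0
    commented = sum(1 for line in stripped if line.startswith(prefixes))
    return commented / len(stripped)
```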
| Entropy (bits/char) | Interpretation | Points |
|---|---|---|
| > 5.5 | Varied, chaotic (human) | −3 |
| 4.3–5.5 | Normal range | 0 |
| 4.0–4.3 | Below average | +3 |
| < 4.0 | Repetitive/formulaic (LLM) | +5 |
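Shannon entropy in bits per character, as used for signal 10, is a standard calculation:

```python
import math
from collections import Counter

def shannon_entropy(text):
    """Shannon entropy of `text` in bits per character: repetitive,
    formulaic content scores low; varied content scores high."""
    if not text:
        return 0.0
    n = len(text)
    return -sum((c / n) * math.log2(c / n)
                for c in Counter(text).values())
```

A string of one repeated character scores 0 bits/char, while text drawing evenly on many distinct characters approaches the log of the alphabet size.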
| Score | Verdict |
|---|---|
| 75–100 | 🤖 Very likely LLM-generated |
| 50–74 | 🤖 Likely LLM-assisted |
| 30–49 | 🤖 Possibly LLM-assisted |
| 15–29 | 👤 Likely human-written |
| 0–14 | 👤 Almost certainly human-written |
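The verdict bands map onto a simple threshold chain; a sketch assuming the clamping described earlier:

```python
def verdict(score):
    """Map a score to its verdict band, clamping to 0-100 first."""
    score = max(0, min(100, score))
    if score >= 75:
        return "Very likely LLM-generated"
    if score >= 50:
        return "Likely LLM-assisted"
    if score >= 30:
        return "Possibly LLM-assisted"
    if score >= 15:
        return "Likely human-written"
    return "Almost certainly human-written"
```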
The following file types are automatically excluded from velocity, size, and daily output calculations:

- Lock files: `package-lock.json`, `yarn.lock`, `Podfile.lock`, `Cargo.lock`, `go.sum`, etc.
- Xcode / Apple: `.pbxproj`, `.xcworkspacedata`, `.xcscheme`
- Protobuf / codegen: `.pb.go`, `.pb.swift`, `_pb2.py`, `.g.dart`, `.freezed.dart`, `.generated.*`
- Build artifacts: `.min.js`, `.min.css`, `.map`, `dist/`, `build/`, `vendor/`, `node_modules/`
- Data / assets: `.json`, `.svg`, `.png`, `.jpg`, `.ico`, fonts
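Filtering like this is commonly done with glob patterns. The sketch below uses a small, illustrative subset of the list above; the tool's real pattern set is larger and may be structured differently:

```python
from fnmatch import fnmatch

# Illustrative subset of the exclusion list above.
GENERATED_PATTERNS = [
    "*package-lock.json", "*yarn.lock", "*Cargo.lock", "*go.sum",
    "*.pbxproj", "*.pb.go", "*.pb.swift", "*_pb2.py",
    "*.min.js", "*.min.css", "*.map",
    "dist/*", "build/*", "vendor/*", "node_modules/*",
    "*.json", "*.svg", "*.png", "*.jpg", "*.ico",
]

def is_generated(path):
    """True if `path` matches any generated/vendored pattern."""
    return any(fnmatch(path, pattern) for pattern in GENERATED_PATTERNS)
```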
Authors matching known bot patterns (e.g. `dependabot[bot]`, `renovate-bot`) are automatically:
- Excluded from velocity and session calculations
- Excluded from author counts (so a solo dev + dependabot is correctly identified as a solo author)
- Skipped during commit detail fetching (saves API calls)
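Bot matching can be sketched with a regex. This pattern is a guess at the kind of rules described above, not the tool's actual list:

```python
import re

# Hypothetical pattern; the tool's real bot list may differ.
BOT_PATTERN = re.compile(
    r"\[bot\]$"                        # GitHub's canonical bot suffix
    r"|(^|[-_])bot($|[-_])"            # 'bot' as a separate name segment
    r"|^(dependabot|renovate|github-actions)",
    re.IGNORECASE,
)

def is_bot(author):
    """True if the author name looks like a known bot account."""
    return bool(BOT_PATTERN.search(author))
```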
Zero dependencies: runs on Python 3.10+ with only the standard library.

```shell
pip install git+https://github.com/gotnull/likelihoodlum.git
```

Then run it from anywhere:

```shell
likelihoodlum owner/repo
```

Or clone and run directly:

```shell
git clone https://github.com/gotnull/likelihoodlum.git
cd likelihoodlum
python3 llm_detector.py owner/repo
```

Install python-dotenv for `.env` file support (a built-in fallback parser is included if you don't):

```shell
pip install python-dotenv
```

Without a token you're limited to 60 API requests/hour. With one, you get 5,000/hr.
Option A: `.env` file (recommended)

```shell
cp .env.example .env
# Edit .env and add your token
```

Your `.env` should contain:

```shell
GITHUB_TOKEN=ghp_your_token_here
```

The `.env` file is gitignored by default, so your token stays safe.

Option B: Environment variable

```shell
export GITHUB_TOKEN="ghp_your_token_here"
```

Option C: CLI flag

```shell
python3 llm_detector.py owner/repo --token ghp_your_token_here
```

To create a token:

- Go to GitHub → Settings → Developer settings → Personal access tokens
- Generate a new token (classic) with the `public_repo` scope (or `repo` for private repos)
- Copy it right away; you won't see it again
```shell
# Basic: analyze a public repo
python3 llm_detector.py owner/repo

# Full GitHub URL works too
python3 llm_detector.py https://github.com/owner/repo

# Analyze more commits (default is 200)
python3 llm_detector.py owner/repo --max-commits 1000

# Target a specific branch
python3 llm_detector.py owner/repo --branch develop

# Machine-readable JSON output
python3 llm_detector.py owner/repo --json

# Go big
python3 llm_detector.py owner/repo --max-commits 5000 --json > report.json
```

| Flag | Default | Description |
|---|---|---|
| `repo` (positional) | – | GitHub repo as `owner/repo` or a full URL |
| `--token` | `$GITHUB_TOKEN` | GitHub personal access token |
| `--branch` | repo default | Branch to analyze |
| `--max-commits` | `200` | Maximum number of commits to fetch |
| `--json` | off | Output results as JSON |
```
============================================================
LLM Code Detector Report
Repository: anthropics/claudes-c-compiler
============================================================

📊 Commits analyzed: 200
📅 Time span: 0 days
👥 Authors: 2
   • claude: 198 commits
   • carlini: 2 commits

📈 Line changes breakdown:
   Total:     60,638
   Authored:  60,638 (used for analysis)
   Generated: 0 (filtered out)

📏 Project-scale output:
   Repo created: 2026-02-04
   Active days: 1 (of 1 calendar days)
   Lines/active day:   60,638
   Lines/calendar day: 60,638
   ⚠️  (>5,000 → implausible for a human)

⚡ Velocity (authored lines/min between commits):
   Median:       7.61 (≈ 457 lines/hr)
   Trimmed mean: 12.14 (≈ 728 lines/hr)
   Max:          630.41
   Intervals above suspicious threshold: 69/101

🔥 Fastest commit intervals:
   12d83516→dc196034   767 lines in 1.2 min = 630.41 l/min ⚠️
   f2ac8159→8fe4994a   163 lines in 1.0 min = 160.33 l/min ⚠️
   734d5fab→e836df40  1029 lines in 7.2 min = 143.25 l/min ⚠️
   ...

🕐 Coding sessions (gap > 120 min): 2

💬 Commit messages matching LLM patterns: 4/200 (2.0%)
   • "Update x86 assembler README: fix stale line counts"
   • "Add missing ARM64 system registers to assembler"

────────────────────────────────────────────────────────────
🎯 LLM Likelihood Score: [████████████████████████······] 81/100
   🤖 Very likely LLM-generated
────────────────────────────────────────────────────────────

📝 Reasoning:
   • Median velocity is suspiciously high (7.6 lines/min ≈ 457 lines/hr)
   • Trimmed mean (12.1 l/min) is 1.6× the median → heavy tail of fast intervals
   • 44% of intervals show very high velocity
   • Median session productivity is extreme (78.5 lines/min)
   • Commit sizes vary widely → typical of human work (CV=8.97) [-5]
   • Commit messages look clean but velocity is high → possible curated LLM workflow
   • 12 commit intervals (12%) show extreme velocity (>50 lines/min) [+4]
   • Project-scale output is implausible: 60,638 authored lines over 1 days
     (1 active) = 60,638 lines/active day [+20]

⚠️  Disclaimer: This is a heuristic analysis and NOT definitive proof.
    Fast coding can also indicate copy-paste, boilerplate generators,
    IDE scaffolding, or simply an experienced developer.
```
When using `--json`, the output includes:

```json
{
  "repository": "owner/repo",
  "commits_analyzed": 200,
  "score": 81,
  "verdict": "🤖 Very likely LLM-generated",
  "reasons": ["..."],
  "velocity_stats": {
    "median_lpm": 7.61,
    "trimmed_mean_lpm": 12.14,
    "intervals": 101
  },
  "line_changes": {
    "total": 60638,
    "authored": 60638,
    "generated": 0
  },
  "project_scale": {
    "repo_created_at": "2026-02-04T00:00:00+00:00",
    "calendar_days": 1,
    "active_days": 1,
    "lines_per_active_day": 60638
  },
  "message_analysis": {
    "total": 200,
    "pattern_hits": 4,
    "ratio": 0.02,
    "sample_flagged": ["..."]
  },
  "sessions": 2,
  "authors": 2
}
```

The tool fetches repo metadata (1 call), commit listings (1–N calls depending on page count), and detailed stats per non-bot commit. Bot commits are skipped to save calls. Detail requests are made concurrently (up to 10 in parallel) for faster analysis.
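Concurrent detail fetching of this kind maps naturally onto a thread pool. A sketch, not the tool's actual code: `fetch_one` stands in for whatever callable performs the GitHub API request for one commit SHA.

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_details(shas, fetch_one, max_workers=10):
    """Fetch per-commit detail concurrently, mirroring the
    'up to 10 parallel requests' behaviour described above."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with `shas`
        return list(pool.map(fetch_one, shas))
```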
| Auth | Rate Limit | Max Commits Comfortable |
|---|---|---|
| No token | 60/hr | ~50 |
| With token | 5,000/hr | ~4,000 |
This tool uses heuristics, not magic. A high score doesn't prove LLM usage, and a low score doesn't disprove it.
False positives can come from:
- Copy-pasting code from other projects
- IDE/framework scaffolding and boilerplate generators
- Squashed/rebased commits that compress work
- Merge commits (maintainers merging large PRs)
- An experienced developer who plans before coding
- Generated code (protobufs, OpenAPI, etc.)
False negatives can come from:
- LLM-generated code committed slowly or in small chunks
- Human-edited LLM output committed as normal work
- Commits with manual timing that mimics human patterns
- Repos where `--max-commits` doesn't capture the full picture
Use responsibly. This is a curiosity tool, not a courtroom exhibit.
MIT: do whatever you want with it.
Found a new heuristic? PRs welcome. Ideas:
- File-type breakdown (LLMs love generating configs)
- Code style consistency metrics
- Cross-file similarity detection (LLMs repeat patterns)
- Cross-referencing with known LLM output patterns
- Language-specific signal tuning
- Timezone inference from commit patterns
Built with vibes and a healthy suspicion of anyone committing 10,000 lines a day.