Extract any GitHub repository into a single, organized text file.
extract_repo.sh clones a GitHub repository (public or private), extracts all text-based files, and writes a single output file containing:
- A hierarchical directory structure
- The contents of every code file, each under a clear header
Binary files, build artifacts, and dependencies are filtered out automatically.
Perfect for code reviews, documentation, AI analysis, or creating shareable snapshots of your codebase.
Features:
- Branch-specific extraction - Clone and extract from any branch
- Private repository support - Works with GitHub CLI authentication or SSH
- Smart filtering - Automatically excludes binary files, images, and common artifacts
- Hierarchical structure - Tree view of the entire repository at the top
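The clone step behind branch selection and private-repository support might look roughly like the sketch below. It assumes the script prefers the GitHub CLI when it is installed and logged in, and otherwise falls back to plain git (where private repositories need an SSH URL and a loaded key). The function name clone_repo and the shallow --depth 1 clone are illustrative assumptions, not taken from extract_repo.sh.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: clone one branch of a repository into a target directory.
# clone_repo is an illustrative name; extract_repo.sh may do this differently.
set -euo pipefail

clone_repo() {
  local url="$1" branch="$2" dest="$3"
  if command -v gh >/dev/null 2>&1 && gh auth status >/dev/null 2>&1; then
    # gh forwards everything after "--" to git clone, so branch/depth still apply
    gh repo clone "$url" "$dest" -- --branch "$branch" --depth 1
  else
    # Plain git fallback: private repos need an SSH URL
    # (git@github.com:owner/repo.git) and a key available to ssh-agent
    git clone --branch "$branch" --depth 1 "$url" "$dest"
  fi
}
```

A shallow clone is enough here because only the files at the tip of the branch are extracted, not the history.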
Requirements:
- git - For cloning repositories
- gh (optional) - GitHub CLI for private repository access
Usage:
./extract_repo.sh <repository_url> [branch]
The script generates a file named <repo-name>_<branch>_code.txt containing:
===========================================================
Repository: my-project
Branch: main
===========================================================
DIRECTORY STRUCTURE:
-----------------------------------------------------------
.
├── src/
│ ├── components/
│ │ └── Header.jsx
│ └── index.js
├── package.json
└── README.md
===========================================================
===========================================================
FILE: ./README.md
===========================================================
# My Project
...
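The layout above can be reproduced with a short sketch like the following. It assumes the repository name is the last path component of the URL minus any .git suffix, and that the directory tree comes from tree(1) when available, with a sorted find listing otherwise. The names repo_name_from_url, write_output and SEP are hypothetical, and only .git/ is skipped here; the fuller exclusion rules listed below are left out for brevity.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of assembling <repo-name>_<branch>_code.txt.
# Function names and the tree/find fallback are assumptions, not the script's code.
set -euo pipefail

SEP='==========================================================='

# "https://github.com/owner/my-project.git" or "git@github.com:owner/my-project.git" -> "my-project"
repo_name_from_url() {
  local name="${1%/}"
  name="${name##*/}"            # keep the last path component
  printf '%s\n' "${name%.git}"  # drop a trailing .git, if any
}

write_output() {
  local repo_dir="$1" repo="$2" branch="$3" out="$4"
  {
    echo "$SEP"
    echo "Repository: $repo"
    echo "Branch: $branch"
    echo "$SEP"
    echo "DIRECTORY STRUCTURE:"
    echo "-----------------------------------------------------------"
    # tree(1) is not always installed; fall back to a sorted path listing
    if command -v tree >/dev/null 2>&1; then
      (cd "$repo_dir" && tree -a -I '.git' --noreport .)
    else
      (cd "$repo_dir" && find . -path ./.git -prune -o -print | sort)
    fi
    echo "$SEP"
    # One header block per file, followed by the file's contents
    (cd "$repo_dir" && find . -path ./.git -prune -o -type f -print | sort) |
      while IFS= read -r f; do
        echo "$SEP"
        echo "FILE: $f"
        echo "$SEP"
        cat "$repo_dir/$f"
        echo
      done
  } > "$out"
}
```

For example, write_output ./clone my-project main "$(repo_name_from_url "$url")_main_code.txt" would produce a file named my-project_main_code.txt for a URL ending in my-project.git.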
The script automatically excludes:
Directories:
- .git/ - Git metadata
- node_modules/ - Node.js dependencies
- venv/ - Python virtual environments
- __pycache__/ - Python cache
Files:
- Binary files (.so, .dll, .dylib)
- Compiled Python (.pyc)
- Images (.png, .jpg, .jpeg, .gif, .ico, .svg)
- Documents (.pdf)
- Archives (.zip, .tar, .gz)
- System files (.DS_Store)
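The exclusion list above maps naturally onto find predicates. The sketch below assumes the script expresses each rule as a -not -path or -not -name test, as its customization lines suggest. The function name list_included_files is hypothetical, and the real command's structure may differ.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: print the files that survive the default exclusions.
# list_included_files is an illustrative name, not part of extract_repo.sh.
set -euo pipefail

list_included_files() {
  local root="$1"
  (
    cd "$root"
    find . -type f \
      -not -path '*/.git/*' \
      -not -path '*/node_modules/*' \
      -not -path '*/venv/*' \
      -not -path '*/__pycache__/*' \
      -not -name '*.so' -not -name '*.dll' -not -name '*.dylib' \
      -not -name '*.pyc' \
      -not -name '*.png' -not -name '*.jpg' -not -name '*.jpeg' \
      -not -name '*.gif' -not -name '*.ico' -not -name '*.svg' \
      -not -name '*.pdf' \
      -not -name '*.zip' -not -name '*.tar' -not -name '*.gz' \
      -not -name '.DS_Store' \
      | sort
  )
}
```

Each rule is an independent -not test, so extra predicates such as -not -path '*/dist/*' can be added to the chain without touching the others.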
Edit extract_repo.sh to customize filtering by extending its find exclusion predicates:
Add more excluded directories:
-not -path '*/dist/*' \
-not -path '*/build/*' \
Add more file extensions to skip:
-not -name '*.mp4' \
-not -name '*.wav' \
Ali Siahkoohi (alisk@ucf.edu)