luqigroup/git-extract

repo-to-txt

Extract any GitHub repository into a single, organized text file.

Overview

extract_repo.sh clones a GitHub repository (public or private), extracts all text-based files, and writes a single output file containing:

  • A hierarchical directory structure
  • All code files, each under a clear header

Binary files, build artifacts, and dependencies are filtered out automatically.

Useful for code reviews, documentation, AI analysis, or sharing a snapshot of a codebase.

Features

  • Branch-specific extraction - Clone and extract from any branch
  • Private repository support - Works with GitHub CLI authentication or SSH
  • Smart filtering - Automatically excludes binary files, images, and common artifacts
  • Hierarchical structure - Tree view of the entire repository at the top

Prerequisites

  • git - For cloning repositories
  • gh (optional) - GitHub CLI for private repository access

Usage

./extract_repo.sh <repository_url> [branch]
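
As a rough illustration of how the two arguments might be interpreted, the sketch below derives a repository name from the URL and resolves the optional branch. The helper names repo_name_from_url and resolve_branch are hypothetical, and the fallback to "main" when no branch is given is an assumption; the actual script may behave differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive a repository name from a clone URL.
# Handles HTTPS (https://host/owner/repo.git) and SSH (git@host:owner/repo.git) forms.
repo_name_from_url() {
  local url="${1%/}"            # drop one trailing slash, if any
  url="${url%.git}"             # drop a trailing .git, if any
  url="${url##*/}"              # keep what follows the last slash
  printf '%s\n' "${url##*:}"    # covers SSH URLs with no slash after the colon
}

# Hypothetical helper: resolve the optional [branch] argument.
# Assumption: fall back to "main" when the argument is omitted.
resolve_branch() {
  printf '%s\n' "${1:-main}"
}
```

Deriving the name with parameter expansion keeps the sketch free of external tools, so it behaves the same for both URL styles.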

Output

The script generates a file named <repo-name>_<branch>_code.txt containing:

===========================================================
Repository: my-project
Branch: main
===========================================================

DIRECTORY STRUCTURE:
-----------------------------------------------------------
.
├── src/
│   ├── components/
│   │   └── Header.jsx
│   └── index.js
├── package.json
└── README.md

===========================================================

===========================================================
FILE: ./README.md
===========================================================
# My Project
...
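
The layout above could be produced by a few small helpers along these lines. This is only a sketch of the format shown: the function names (output_name, print_repo_banner, print_file_section) and the RULE variable are hypothetical and are not taken from extract_repo.sh.

```shell
#!/usr/bin/env bash
# Separator line copied from the sample output above.
RULE='==========================================================='

# Hypothetical helper: output name in the <repo-name>_<branch>_code.txt form.
output_name() {
  printf '%s_%s_code.txt\n' "$1" "$2"
}

# Hypothetical helper: print the banner that opens the output file.
print_repo_banner() {
  printf '%s\nRepository: %s\nBranch: %s\n%s\n' "$RULE" "$1" "$2" "$RULE"
}

# Hypothetical helper: print one file section (header, then the file contents).
print_file_section() {
  printf '%s\nFILE: %s\n%s\n' "$RULE" "$1" "$RULE"
  cat -- "$1"
  printf '\n'
}
```

Redirecting the combined output of these functions into the name returned by output_name would yield a file in the format shown.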

What Gets Filtered Out

The script automatically excludes:

Directories:

  • .git/ - Git metadata
  • node_modules/ - Node.js dependencies
  • venv/ - Python virtual environments
  • __pycache__/ - Python cache

Files:

  • Binary files (.so, .dll, .dylib)
  • Compiled Python (.pyc)
  • Images (.png, .jpg, .jpeg, .gif, .ico, .svg)
  • Documents (.pdf)
  • Archives (.zip, .tar, .gz)
  • System files (.DS_Store)
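
The exclusions listed above map naturally onto a single find invocation. The sketch below is one way to express them, using the same -not -path and -not -name style as the Customization snippets; the function name list_text_files is hypothetical, and the exact command in extract_repo.sh may differ.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list files under a directory, skipping the
# directories and extensions named in the lists above.
list_text_files() {
  find "${1:-.}" -type f \
    -not -path '*/.git/*' \
    -not -path '*/node_modules/*' \
    -not -path '*/venv/*' \
    -not -path '*/__pycache__/*' \
    -not -name '*.so' -not -name '*.dll' -not -name '*.dylib' \
    -not -name '*.pyc' \
    -not -name '*.png' -not -name '*.jpg' -not -name '*.jpeg' \
    -not -name '*.gif' -not -name '*.ico' -not -name '*.svg' \
    -not -name '*.pdf' \
    -not -name '*.zip' -not -name '*.tar' -not -name '*.gz' \
    -not -name '.DS_Store' \
    | LC_ALL=C sort   # stable ordering regardless of locale
}
```

Note that this filters by path and name only; it does not inspect file contents, so a binary file with an unlisted extension would still be included.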

Customization

Edit extract_repo.sh to customize filtering:

Add more excluded directories:

-not -path '*/dist/*' \
-not -path '*/build/*' \

Add more file extensions to skip:

-not -name '*.mp4' \
-not -name '*.wav' \
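
To show where such lines would sit, here is a deliberately shortened filter with the four extra clauses spliced in. It assumes the script's filter is a single find command continued across lines; the function name list_files_custom is hypothetical, and the real list in extract_repo.sh is longer.

```shell
#!/usr/bin/env bash
# Hypothetical, shortened filter showing where the new clauses would go.
list_files_custom() {
  find "${1:-.}" -type f \
    -not -path '*/.git/*' \
    -not -path '*/dist/*' \
    -not -path '*/build/*' \
    -not -name '*.mp4' \
    -not -name '*.wav' \
    -not -name '*.png'
}
```

Every clause except the last ends with a backslash so the shell treats the lines as one command; a missing backslash in the middle would cut the command short and run the remaining lines separately.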

Authors

Ali Siahkoohi (alisk@ucf.edu)
