A framework for designing sound, reproducible, and scalable repository mining studies on GitHub.
- 🧪 Reproducibility-first: declarative configuration and deterministic execution to enable repeatable experiments.
- 📈 Scalable: designed for large-scale repository mining studies on GitHub.
- 🧱 Soundness-focused: encourages transparent, bias-aware, and methodologically explicit study design.
- ⚙️ Modular: independent, reusable modules that can be composed into custom data-processing pipelines.
Prebuilt binaries for macOS, Linux, and Windows are available on the project's GitHub Releases page, along with installer scripts.
Scyros is available through several package managers.
Scyros is published on crates.io and can be installed with Cargo:
cargo install scyros
If you use Nix with flakes enabled, you can install Scyros directly from GitHub:
nix profile install github:fxpl/scyros
To build from source, install Rust (version 1.94 or newer) by following the instructions on the official website.
Then clone the repository and build:
git clone git@github.com:fxpl/scyros.git
cd scyros
cargo build --release
The binary is produced at target/release/scyros. You can optionally move it to a directory in your PATH for easier access.
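Moving the binary onto your PATH can be sketched as follows for a POSIX shell. The helper names `install_binary` and `on_path` and the destination `$HOME/.local/bin` are illustrative assumptions, not part of Scyros; on Windows the steps differ.

```shell
#!/bin/sh
# Hypothetical helper: copy a built binary into a directory and mark it executable.
# Both arguments are placeholders; adjust them to your setup.
install_binary() {
    src="$1"
    dest_dir="$2"
    if [ ! -f "$src" ]; then
        echo "Binary not found: $src (run 'cargo build --release' first)" >&2
        return 1
    fi
    mkdir -p "$dest_dir"
    cp "$src" "$dest_dir/"
    chmod 755 "$dest_dir/$(basename "$src")"
}

# Hypothetical helper: succeed if the given directory is listed in PATH.
on_path() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

With these defined, `install_binary target/release/scyros "$HOME/.local/bin"` copies the binary, and `on_path "$HOME/.local/bin"` tells you whether that directory still needs to be added to PATH in your shell profile.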
If you'd like to see how to use Scyros in practice, check out the interactive tutorial!
To discover available commands and modules:
scyros --help
Each module provides its own usage documentation. For example, to inspect the module used to sample random repositories from GitHub:
scyros ids --help
Some modules interact with the GitHub API and require personal access tokens (PATs). Tokens can be created by following GitHub’s documentation: https://docs.github.com/en/github/authenticating-to-github/creating-a-personal-access-token.
Tokens must be provided as a CSV file passed via a command-line argument. The file must contain a single column named token, with one token per line:
token
fa56454....
hj73647....
GitHub enforces API rate limits. Using multiple tokens from the same account does not increase these limits. Users are expected to comply with GitHub’s API terms and rate-limit policies.
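The token file format described above can be created and sanity-checked with a few shell commands. This is a minimal sketch: the file name `tokens.csv` and the placeholder values are assumptions, and the name of the command-line argument that receives the file is listed in each module's `--help` output.

```shell
#!/bin/sh
# Assumed file name; only the "token" column header comes from the format above.
TOKEN_FILE="tokens.csv"

# Write the header followed by one token per line.
# The values below are placeholders, not real tokens.
cat > "$TOKEN_FILE" <<'EOF'
token
<your-first-token>
<your-second-token>
EOF

# Restrict permissions, since the file holds credentials.
chmod 600 "$TOKEN_FILE"

# Sanity check: the first line must be exactly "token",
# and at least one non-empty token line must follow.
header=$(head -n 1 "$TOKEN_FILE")
count=$(tail -n +2 "$TOKEN_FILE" | grep -c .)
if [ "$header" = "token" ] && [ "$count" -gt 0 ]; then
    echo "OK: $count token(s) in $TOKEN_FILE"
else
    echo "Invalid token file: $TOKEN_FILE" >&2
    exit 1
fi
```

Keeping the file readable only by your own user, and out of version control, limits the risk of leaking tokens.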
Scyros is introduced and described in the following large-scale empirical study. If you use Scyros in academic work, please cite:
@misc{gilot2026largescalestudyfloatingpointusage,
title={Floating-Point Usage on GitHub: A Large-Scale Study of Statically Typed Languages},
author={Andrea Gilot and Tobias Wrigstad and Eva Darulova},
year={2026},
eprint={2509.04936},
archivePrefix={arXiv},
primaryClass={cs.PL},
url={https://arxiv.org/abs/2509.04936},
}
Gilot, A., Wrigstad, T., & Darulova, E. (2026). Floating-Point Usage on GitHub: A Large-Scale Study of Statically Typed Languages. arXiv. https://arxiv.org/abs/2509.04936
This project is licensed under the Apache License 2.0. See LICENSE for details.
See CHANGELOG.md for a detailed list of changes and updates.