Welcome! This repository is the central hub for the TechLauncher JoeyLLM Team. We use this space to organize the project, manage all documentation, and track our progress.
JoeyLLM is a hands-on project focused on building language-model workflows from end to end, with a strong emphasis on Australian and other domain-specific language use.
The team’s work moves through a practical, multi-step pipeline:
- Clean & Filter: Processing large web datasets (like the 60TB FineWeb corpus).
- Classify: Building classifiers that identify useful properties in the text, such as region or domain.
- Fine-Tune: Using those curated, high-quality datasets to fine-tune specialized language models.
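As a rough illustration of how the first two stages feed the third, here is a minimal Python sketch. It is not the team's actual pipeline: the function names (`clean`, `keep`, `classify_region`, `curate`), the keyword list, and the word-count threshold are all hypothetical, and a real classifier would be a trained model rather than a keyword match.

```python
import re

# Hypothetical marker words; a real region classifier would be a trained model.
AU_MARKERS = {"australia", "australian", "canberra", "sydney", "melbourne"}


def clean(text: str) -> str:
    """Collapse runs of whitespace and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def keep(text: str, min_words: int = 5) -> bool:
    """Toy quality filter: drop very short documents (threshold is an assumption)."""
    return len(text.split()) >= min_words


def classify_region(text: str) -> str:
    """Toy region classifier: label a document 'AU' if it contains a marker word."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return "AU" if words & AU_MARKERS else "other"


def curate(docs):
    """Clean, filter, and classify documents; return the records kept for fine-tuning."""
    curated = []
    for raw in docs:
        text = clean(raw)
        if keep(text) and classify_region(text) == "AU":
            curated.append({"text": text, "region": "AU"})
    return curated
```

The list returned by `curate` stands in for the curated dataset that the Fine-Tune step would consume; the fine-tuning itself is out of scope for this sketch.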
The goal is not just to produce models, but to deeply understand how data quality, infrastructure, and training choices shape the behavior of modern LLM systems.
Keep all project work inside GitHub. GitHub already provides everything we need for documentation, planning, and task tracking.
⛔ Important: You may receive suggestions from tutors, lecturers, or others to use different third-party tools (like Notion, Jira, or Trello) for planning or documentation. Those are only suggestions. For this project, keep everything inside this repository and the GitHub platform so our work stays organized, consolidated, and easy to review.
If you are a new team member, please read these documents first to get oriented:
- `Project/ProjectGoal.md`
- `ComputeInfrastructure/Wireguard.md`
- `Team/Introductions.md`
- `Weekly/README.md`
This repository is strictly for documentation and project management (no code!). Here is how our knowledge base is organized:
| Folder | Purpose |
|---|---|
| `ComputeInfrastructure/` | Access setup, VPN instructions, and GPU/system documentation. |
| `Data/` | Data sources, dataset rules, and cleaning/preprocessing documentation. |
| `LearningResources/` | Shared notes, references, and useful links for the team. |
| `Models/` | Documentation regarding our model architectures and training plans. |
| `Project/` | Project goals, planning docs, and semester overviews. |
| `Team/` | Team member info, roles, and contact details. |
| `Weekly/` | Meeting notes, weekly reports, and reporting templates. |
| `Repos/` | Rules and links for our code repositories, platforms, and related setups. |
📦 What This Repository Is For
- Project documentation and architecture rules
- Planning and task tracking
- Combined weekly meeting notes and progress reports
- Team organization
📋 The GitHub Project Board
We use a GitHub Project Kanban board called JoeyLLM Team to manage tasks and track progress. All tasks, issues, and progress updates should be recorded on this board so the work remains visible and organized for everyone.
All documentation here should be written so it is easy to read directly in GitHub's web interface. When adding or editing files, please follow these rules:
✍️ Format
- Use standard Markdown (`.md`) files.
- Write documents so they are clear and highly scannable.
- Use headers, bullet points, and lists to break up walls of text.
- Keep files structured and logically organized within their folders.
✂️ Keep It Short
- Avoid letting documentation pages grow too long.
- Keep documents short and concise.
- If a document becomes too long, split it into a new, separate file.
- Each document should ideally focus on one main topic.
This makes the documentation easier to read, maintain, and navigate for the whole team! 🙂