joeyllm/JoeyLLM-Team

🦘 JoeyLLM – TechLauncher Team

Welcome! This repository is the central hub for the TechLauncher JoeyLLM Team. We use this space to organize the project, manage all documentation, and track our progress.


🎯 Project Vision

JoeyLLM is a hands-on project focused on building language-model workflows from end to end, with a strong emphasis on Australian and other domain-specific language use.

The team’s work moves through a practical, multi-step pipeline:

  • Clean & Filter: Processing large web datasets (like the 60TB FineWeb corpus).
  • Classify: Building classifiers that identify useful properties in the text, such as region or domain.
  • Fine-Tune: Using those curated, high-quality datasets to fine-tune specialized language models.

The goal is not just to produce models, but to deeply understand how data quality, infrastructure, and training choices shape the behavior of modern LLM systems.
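
As a rough illustration only (this repository itself holds documentation, and real code belongs in the team's code repositories), the first two pipeline stages and the hand-off to fine-tuning could be sketched as below. Everything here is a hypothetical stand-in: the `Document` class, the `AU_MARKERS` keyword set, the `min_words` threshold, and the function names are assumptions for the example, not the team's actual implementation. A real classifier would be trained rather than keyword-based.

```python
from dataclasses import dataclass

# Hypothetical marker words, used only to keep the sketch self-contained.
AU_MARKERS = {"australia", "australian", "canberra", "sydney", "melbourne"}


@dataclass
class Document:
    text: str
    region: str = "unknown"


def clean_and_filter(raw_texts, min_words=5):
    """Clean & Filter: normalise whitespace, drop short or duplicate texts."""
    seen = set()
    kept = []
    for raw in raw_texts:
        text = " ".join(raw.split())
        if len(text.split()) < min_words or text in seen:
            continue
        seen.add(text)
        kept.append(Document(text))
    return kept


def classify_region(doc):
    """Classify: tag a document with a region label (toy keyword rule)."""
    words = {w.strip(".,!?").lower() for w in doc.text.split()}
    doc.region = "AU" if words & AU_MARKERS else "other"
    return doc


def build_finetune_set(raw_texts, region="AU"):
    """Fine-Tune input: keep only the texts classified into the target region."""
    docs = [classify_region(d) for d in clean_and_filter(raw_texts)]
    return [d.text for d in docs if d.region == region]
```

The point of the sketch is the ordering: filtering runs before classification so the classifier only sees cleaned text, and the fine-tuning set is whatever survives both steps.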


🛑 The "Stay in GitHub" Rule

Keep all project work inside GitHub. GitHub already provides everything we need for documentation, planning, and task tracking.

⛔ Important: You may receive suggestions from tutors, lecturers, or others to use different third-party tools (like Notion, Jira, or Trello) for planning or documentation. Those are only suggestions. For this project, keep everything inside this repository and the GitHub platform so our work stays organized, consolidated, and easy to review.


🚀 Where to Start

If you are a new team member, please read these documents first to get oriented:

  1. Project/ProjectGoal.md
  2. ComputeInfrastructure/Wireguard.md
  3. Team/Introductions.md
  4. Weekly/README.md

🗂️ Repository Structure

This repository is strictly for documentation and project management (no code!). Here is how our knowledge base is organized:

| Folder | Purpose |
| --- | --- |
| ComputeInfrastructure/ | Access setup, VPN instructions, and GPU/system documentation. |
| Data/ | Data sources, dataset rules, and cleaning/preprocessing documentation. |
| LearningResources/ | Shared notes, references, and useful links for the team. |
| Models/ | Documentation regarding our model architectures and training plans. |
| Project/ | Project goals, planning docs, and semester overviews. |
| Team/ | Team member info, roles, and contact details. |
| Weekly/ | Meeting notes, weekly reports, and reporting templates. |
| Repos/ | Rules and links for our code repositories, platforms, and related setups. |

📊 Project Tracking & Usage

📦 What This Repository Is For

  • Project documentation and architecture rules
  • Planning and task tracking
  • Combined weekly meeting notes and progress reports
  • Team organization

📋 The GitHub Project Board

We use a GitHub Project Kanban board called JoeyLLM Team to manage tasks and track progress. All tasks, issues, and progress updates should be recorded on this board so the work remains visible and organized for everyone.


📚 Documentation Guidelines

All documentation here should be written so it is easy to read directly in GitHub's web interface. When adding or editing files, please follow these rules:

✍️ Format

  • Use standard Markdown (.md) files.
  • Write documents so they are clear and highly scannable.
  • Use headers, bullet points, and lists to break up walls of text.
  • Keep files structured and logically organized within their folders.

✂️ Keep It Short

  • Do not let documentation pages grow too long.
  • Keep documents short and concise.
  • If a document becomes too long, move part of it into a new, separate file.
  • Each document should ideally focus on one main topic.

This makes the documentation easier to read, maintain, and navigate for the whole team! 🙂

About

TechLauncher Team Page 2026
