# LLaMA-3-8B on AAEC: Reproduction

Reproduction of the AAEC essay-level results for LLaMA-3-8B from
"Unified Argument Mining Pipeline as Text Generation with LLMs".
## Target results (macro-F1, AAEC essay level)
| ACC | ARI | ARC | ARIC |
|---|---|---|---|
| 90.7 | 84.9 | 50.3 | 72.3 |
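All four scores are macro-averaged F1 over each task's label set. As a reference point for what `evaluation/metrics.py` computes, here is a minimal pure-Python sketch of macro-F1 (the repo itself may simply delegate to a library such as scikit-learn's `f1_score(average="macro")`):

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores.

    Classes with no true or predicted instances contribute F1 = 0,
    which is what makes macro-F1 sensitive to rare labels.
    """
    f1_scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)
```

For ACC, for example, the labels would be the AAEC component types (MajorClaim, Claim, Premise).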
## Setup

```bash
# Install uv (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install dependencies
uv sync --extra dev

# Log in to HuggingFace (required for LLaMA-3 access)
uv run huggingface-cli login
```

## Data

The AAEC dataset is available at: https://tudatalib.ulb.tu-darmstadt.de/handle/tudatalib/2422
```bash
# Option A: provide a direct download URL
AAEC_URL="your-direct-download-url" uv run python scripts/prepare_data.py

# Option B: manual download
# 1. Download the zip from the URL above
# 2. Place it at data/raw/AAEC.zip
uv run python scripts/prepare_data.py
```

## Training

```bash
# Debug run (local, MPS/CPU, 10 samples, 1 epoch; no model download needed for a structure check)
uv run python scripts/train.py --config configs/experiments/llama3-8b-aaec-debug.yaml

# Full training on JeanZay (H100)
sbatch slurm/train_llama3_8b.slurm
```

## Evaluation

```bash
uv run python scripts/evaluate.py --config configs/experiments/llama3-8b-aaec-essay.yaml
```
```bash
# With explicit model path
uv run python scripts/evaluate.py \
    --config configs/experiments/llama3-8b-aaec-essay.yaml \
    --model-path outputs/llama3-8b-aaec-essay
```

## Tests

```bash
uv run pytest tests/ -v
```

## Repository layout

```
src/argmining/
├── data/
│   ├── download.py       # Download AAEC from TU Darmstadt
│   ├── parse_aaec.py     # Parse .ann/.txt → structured dicts
│   └── dataset.py        # Prompt formatting + HuggingFace Dataset
├── training/
│   └── train.py          # QLoRA fine-tuning (PEFT + TRL SFTTrainer)
├── evaluation/
│   ├── predict.py        # Inference: generate outputs for test set
│   ├── postprocess.py    # Parse JSON outputs, compute metric labels
│   └── metrics.py        # ACC / ARI / ARC / ARIC macro-F1
└── utils/
    ├── config.py         # YAML config loader
    ├── device.py         # CUDA → MPS → CPU detection
    └── logging.py        # Logging setup
scripts/
├── prepare_data.py       # Download + parse + split AAEC
├── train.py              # Fine-tuning entry point
└── evaluate.py           # Evaluation entry point
configs/
├── base.yaml             # Shared hyperparameter defaults
└── experiments/
    ├── llama3-8b-aaec-essay.yaml   # Full training config
    └── llama3-8b-aaec-debug.yaml   # Local debug config
slurm/
├── train_llama3_8b.slurm   # JeanZay training job
└── evaluate.slurm          # JeanZay evaluation job
```
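The AAEC annotations handled by `data/parse_aaec.py` are BRAT standoff files: component spans on `T` lines and support/attack relations on `R` lines. A hedged sketch of that parse, assuming the standard BRAT field layout (the repo's actual dict schema may differ):

```python
def parse_ann(ann_text: str) -> dict:
    """Parse BRAT standoff annotation lines into plain dicts.

    T lines (components):  T1<TAB>MajorClaim 503 575<TAB>span text
    R lines (relations):   R1<TAB>supports Arg1:T3 Arg2:T4
    (AAEC also carries 'A' stance-attribute lines, ignored here for brevity.)
    """
    components, relations = {}, []
    for line in ann_text.splitlines():
        if line.startswith("T"):
            cid, info, text = line.split("\t")
            ctype, start, end = info.split()
            components[cid] = {"type": ctype, "start": int(start),
                               "end": int(end), "text": text}
        elif line.startswith("R"):
            rid, info = line.split("\t")[:2]
            rtype, arg1, arg2 = info.split()
            relations.append({"type": rtype,
                              "src": arg1.split(":", 1)[1],
                              "dst": arg2.split(":", 1)[1]})
    return {"components": components, "relations": relations}
```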
## Hyperparameters

| Parameter | Value |
|---|---|
| Model | meta-llama/Meta-Llama-3-8B-Instruct |
| Quantization | 4-bit NF4 (BitsAndBytes), CUDA only |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| LoRA target modules | all linear layers |
| Learning rate | 5e-5 |
| Epochs | 5 |
| Batch size | 2 per device, gradient accumulation 4 |
| LR scheduler | cosine |
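The quantization and LoRA rows of this table map roughly onto the following PEFT / bitsandbytes configuration. This is a sketch of the mapping, not the repo's exact code; see `configs/base.yaml` and `src/argmining/training/train.py` for the real values:

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization (bitsandbytes is CUDA-only, hence the table's caveat)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # assumed; the repo may use fp16
)

# LoRA adapter: rank 16, alpha 32, applied to all linear layers
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules="all-linear",  # PEFT shorthand for every linear layer
    task_type="CAUSAL_LM",
)
```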