
Model-Stitching Challenge

Sapienza University of Rome

Advanced Machine Learning Course


Overview

This repository contains our solution for the Kaggle competition of the Advanced Machine Learning course in the Master's degree in Data Science.


Team Members

Student             ID
Luca De Ruggiero    2174783
Elena Di Grigoli    2011814
Fabrizio Ferrara    2207087
Flavio Mangione     2201201

Task

Our challenge was to solve an image–text retrieval task, where the goal is to generate caption embeddings that maximize the Mean Reciprocal Rank (MRR) when matched against ground-truth image embeddings, while also keeping the model as lightweight and efficient as possible.

Dataset Structure

For this challenge we work with two datasets:

  • test_clean.npz
  • train.npz

The train file consists of 125k captions associated with 25k unique images, while the test_clean file contains 1,500 captions used for inference and for scoring in the Kaggle competition.
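The .npz archives can be inspected with NumPy before training. The snippet below fabricates a tiny stand-in archive to show the access pattern; the key names ("captions", "images") and the demo filename are our assumptions, not necessarily the keys used in the real files, so check `data.files` first:

```python
import numpy as np

# Fabricate a small stand-in archive with hypothetical key names.
rng = np.random.default_rng(0)
np.savez("train_demo.npz",
         captions=rng.normal(size=(10, 1024)).astype(np.float32),
         images=rng.normal(size=(2, 1536)).astype(np.float32))

# Loading pattern for the real Data/train.npz is identical.
data = np.load("train_demo.npz")
print(data.files)              # lists the arrays stored in the archive
print(data["captions"].shape)  # per-caption text embeddings
```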


Evaluation

Model performance is measured using Mean Reciprocal Rank (MRR). For each test caption, its predicted embedding is compared against the gallery image embeddings (processed in batches of 100), and the reciprocal of the rank of the correct image is averaged over all captions.
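A minimal sketch of the metric, assuming cosine similarity and batching over the query captions (the function and argument names are ours, not those of the repository's metrics.py):

```python
import numpy as np

def mean_reciprocal_rank(pred, gallery, targets, batch_size=100):
    """For each predicted caption embedding, rank every gallery image
    embedding by cosine similarity; the score is the mean of 1/rank of
    the ground-truth image (whose gallery index is given by `targets`)."""
    pred = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    gallery = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    reciprocal_ranks = []
    for start in range(0, len(pred), batch_size):  # process queries in batches
        sims = pred[start:start + batch_size] @ gallery.T
        for row, target in zip(sims, targets[start:start + batch_size]):
            rank = 1 + int(np.sum(row > row[target]))  # 1-based rank of the correct image
            reciprocal_ranks.append(1.0 / rank)
    return float(np.mean(reciprocal_ranks))
```

With a perfect translator (predictions identical to the target image embeddings) every rank is 1 and the MRR is exactly 1.0.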


Repository Structure

├── Data/                            # Dataset folder
│   ├── test_clean.npz
│   └── train.npz
├── Utils and Functions/
│   ├── metrics.py                   # metric definitions
│   └── eval_2.py                    # evaluation function for the validation
├── Challenge_Notebook.ipynb         # Notebook for the submission
└── README.md

Model

The model used is a translator that maps 1024-dimensional text embeddings into the 1536-dimensional DINOv2 image-embedding space. It uses a two-block encoder with LayerNorm, GELU, and dropout for regularization, followed by a decoder that reconstructs the target embedding dimension. A learnable temperature parameter (logit_scale) is included for contrastive alignment. The total number of trainable parameters is kept small, in line with the task's efficiency requirement.
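A sketch of this architecture in PyTorch is shown below. Only the 1024 input and 1536 output dimensions come from the description above; the hidden width, dropout rate, and the CLIP-style log(1/0.07) initialization of logit_scale are our illustrative assumptions:

```python
import math
import torch
import torch.nn as nn

class Translator(nn.Module):
    """Text-to-image embedding translator: two-block encoder with
    LayerNorm/GELU/dropout, linear decoder to the DINOv2 dimension.
    Hidden width and dropout are assumptions, not the repo's values."""
    def __init__(self, in_dim=1024, hidden=2048, out_dim=1536, p=0.1):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.LayerNorm(hidden), nn.GELU(), nn.Dropout(p),
            nn.Linear(hidden, hidden), nn.LayerNorm(hidden), nn.GELU(), nn.Dropout(p),
        )
        self.decoder = nn.Linear(hidden, out_dim)
        # Learnable temperature for the contrastive loss (CLIP-style init).
        self.logit_scale = nn.Parameter(torch.tensor(math.log(1 / 0.07)))

    def forward(self, x):
        return self.decoder(self.encoder(x))
```

At inference time only the forward pass is needed; logit_scale is used during training to scale the caption-image similarity logits in the contrastive objective.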
