Description
Hi, thanks for releasing the DAV2 SDT weights.
I am trying to load the following checkpoints from HuggingFace:
dav2_sdt_vitb.pth
dav2_sdt_vitl.pth
However, the structure of the SDTHead and encoder in this repository does not match the architecture expected by these weights.
What I observed
The checkpoints contain layers such as:
depth_head.detail_enhancer.dwconv.weight (shape [128, 1, 3, 3])
depth_head.upsample_1.dysample1.0.init_pos
depth_head.upsample_1.dysample1.1.weight
depth_head.output_conv.0.weight (shape [64, 128, 3, 3])
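A listing like the one above can be reproduced by dumping the checkpoint's parameter names and shapes. A minimal sketch (the local checkpoint path and the possible nesting of weights under a "model" key are assumptions about the file layout):

```python
def head_key_shapes(state_dict, prefix="depth_head."):
    """Return sorted (name, shape) pairs for parameters under `prefix`.

    `state_dict` maps parameter names to tensors; anything with a
    `.shape` attribute (or a plain shape tuple, for testing) works.
    """
    out = []
    for name, tensor in sorted(state_dict.items()):
        if name.startswith(prefix):
            out.append((name, tuple(getattr(tensor, "shape", tensor))))
    return out

# Usage against a real checkpoint (path is an assumption):
#   import torch
#   ckpt = torch.load("dav2_sdt_vitb.pth", map_location="cpu")
#   state = ckpt.get("model", ckpt)  # some releases nest weights under "model"
#   for name, shape in head_key_shapes(state):
#       print(name, shape)
```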
These modules do not exist in the current SDTHead implementation in this repository.
The current SDTHead uses a different upsampling structure (DySampleUpsamplerWrapper) and different fusion logic, which does not match the checkpoint.
When I manually adapt the code to load the weights, the model runs, but the output depth map is extremely blurry and incorrect (it looks like a very low‑resolution depth map upsampled to full size).
This strongly suggests that the published SDTHead is not the architecture used to train the DAV2 SDT weights.
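One way to quantify the mismatch is to diff the checkpoint's key set against the repository model's `state_dict()` keys. A minimal sketch (the model construction in the usage comment is a placeholder for whatever class this repository exposes):

```python
def state_dict_diff(ckpt_keys, model_keys):
    """Compare checkpoint parameter names against a model's parameter names.

    Returns names present only in the checkpoint (modules the current code
    lacks) and names present only in the model (modules the checkpoint lacks).
    """
    ckpt_keys, model_keys = set(ckpt_keys), set(model_keys)
    return {
        "checkpoint_only": sorted(ckpt_keys - model_keys),
        "model_only": sorted(model_keys - ckpt_keys),
    }

# Usage (checkpoint path and model construction are assumptions):
#   import torch
#   state = torch.load("dav2_sdt_vitb.pth", map_location="cpu")
#   state = state.get("model", state)
#   diff = state_dict_diff(state.keys(), model.state_dict().keys())
#   print(len(diff["checkpoint_only"]), "checkpoint keys have no matching module")
```

`model.load_state_dict(state, strict=False)` reports the same missing/unexpected keys, but the explicit diff makes it easy to group them by module prefix.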
Additional evidence
According to the official AnyDepth documentation (https://aigeeksgroup.github.io/AnyDepth/), the SDTHead includes:
Detail Enhancer
Multi‑stage DySample upsampling
Output Conv
These modules are present in the HuggingFace DAV2 SDT checkpoints, but are missing from the current SDTHead code in this repository.
This indicates that the HuggingFace weights correspond to a different SDTHead architecture than the one included in the current codebase.
Request
Could you please provide:
the exact SDTHead implementation used to train dav2_sdt_vitb.pth / dav2_sdt_vitl.pth
the corresponding encoder wrapper (feature extraction + intermediate layers)
or a minimal working example that loads the weights and produces correct depth maps?
This would help the community reproduce the DAV2 SDT results correctly.
Thank you very much!