Possible typo: Paper states 60 MP3D scenes, but annotation has 61 unique scenes #82

@followingcode

Hi authors,
Section 3.4 of your paper states:
"Vision-Language Action Data. We collect navigation-specific training data using the Habitat simulator across multiple public VLN datasets. Specifically, we collect 450K samples (video clips) from 60 Matterport3D [25] (MP3D) environments, sourced from R2R [7], R2R-EnvDrop [26] and RxR [8]."
However, when I extract and count the unique scene IDs in the official annotation file annotations_v1-3.json (available at https://huggingface.co/datasets/cywan/StreamVLN-Trajectory-Data), I get 61 unique scenes instead of 60.
For your reference, the script I used for counting and its output are below:

import json
from collections import defaultdict

# 1. Load the JSON file
json_path = "annotations_v1-3.json"
with open(json_path, "r", encoding="utf-8") as f:
    data = json.load(f)

# 2. Extract the scene ID and count the episodes per scene
scene_to_episode = defaultdict(int)

for item in data:
    video_path = item.get("video", "")
    if video_path.startswith("images/"):
        rest = video_path[len("images/"):]
        # The scene ID is the part of the path before the "_r2r_" marker
        scene_id = rest.split("_r2r_")[0]
        scene_to_episode[scene_id] += 1

# 3. Print all results, sorted by scene ID
print(f"✅ Total episodes: {len(data)}")
print(f"✅ Unique scenes: {len(scene_to_episode)}")
print("\n📊 Episodes per scene:")

# Scene IDs are listed in lexicographic order
for scene_id in sorted(scene_to_episode.keys()):
    print(f" - {scene_id}: {scene_to_episode[scene_id]} episodes")

# 4. Optional: export to a file for easier inspection
with open("scene_episode_count.txt", "w", encoding="utf-8") as f:
    f.write(f"Total episodes: {len(data)}\n")
    f.write(f"Unique scenes: {len(scene_to_episode)}\n\n")
    for scene_id in sorted(scene_to_episode.keys()):
        f.write(f"{scene_id}: {scene_to_episode[scene_id]} episodes\n")
print("\n✅ Exported the full statistics to scene_episode_count.txt")

Total episodes: 10819
Unique scenes: 61

17DRP5sb8fy: 75 episodes
1LXtFkjw3qL: 162 episodes
1pXnuDYAj8r: 256 episodes
29hnd4uzFmX: 192 episodes
2n8kARJN3HM: 268 episodes
5LpN3gDmAk7: 243 episodes
5q7pvUzZiYa: 150 episodes
759xd9YjKW5: 210 episodes
7y3sRwLe3Va: 196 episodes
82sE5b5pLXE: 195 episodes
8WUmhLawc2A: 279 episodes
B6ByNegPMKs: 252 episodes
D7G3Y4RVNrH: 34 episodes
D7N2EKCX4Sj: 201 episodes
E9uDoFAP3SH: 265 episodes
EDJbREhghzL: 249 episodes
GdvgFV5R1Z5: 21 episodes
HxpKQynjfin: 15 episodes
JF19kD82Mey: 63 episodes
JeFG25nYj2p: 279 episodes
JmbYfDe2QKZ: 210 episodes
PX4nDJXEHrG: 240 episodes
Pm6F8kyY3z2: 24 episodes
PuKPg4mmafe: 57 episodes
S9hNv5qa7GM: 246 episodes
SN83YJsR3w2: 232 episodes
ULsKaCPVFJR: 234 episodes
Uxmj2M2itWa: 228 episodes
V2XKFyX4ASd: 198 episodes
VFuaQ6m2Qom: 150 episodes
VLzqgDo317F: 162 episodes
VVfe2KiqLaN: 105 episodes
Vvot9Ly1tCj: 276 episodes
VzqfbhrpDEA: 237 episodes
XcA2TqTSSAj: 18 episodes
YmJkqBEsHnH: 9 episodes
ZMojNkEp431: 225 episodes
aayBHfsNo7d: 145 episodes
ac26ZMwG7aT: 273 episodes
b8cTxDM8gDG: 144 episodes
cV4RVeZvu5T: 147 episodes
dhjEzFoUFzH: 33 episodes
e9zR4mvMWw7: 141 episodes
gTV8FGcVJC9: 202 episodes
gZ6f7yhEvPG: 3 episodes
i5noydFURQK: 177 episodes
jh4fc5c5qoQ: 147 episodes
kEZ7cmS4wCh: 196 episodes
mJXqzFtmKg4: 267 episodes
p5wJjkQkbXX: 202 episodes
pRbA3pwrgk9: 135 episodes
qoiz87JEwZ2: 174 episodes
r1Q1Z4BcV1o: 222 episodes
r47D5H71a5s: 279 episodes
rPc6DW4iMge: 231 episodes
s8pcmisQ38h: 84 episodes
sKLMLpTHeUy: 237 episodes
sT4fr6TAbpF: 252 episodes
uNb9QFRL6hY: 159 episodes
ur6pFq6Qu1A: 279 episodes
vyrNrziPKCB: 234 episodes
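
To narrow down where the extra scene comes from, two quick checks could help: confirming that no extracted ID is a parsing artifact, and diffing the extracted set against whatever 60-scene list the paper is based on. The sketch below is only an illustration. It assumes that MP3D scene IDs are 11-character alphanumeric strings (true of every ID in the listing above), and the `reference` list is hypothetical, since I do not have the authors' list.

```python
import re

# Assumption: an MP3D scene ID is an 11-character alphanumeric string,
# as is the case for every ID in the listing above.
SCENE_ID_PATTERN = re.compile(r"[A-Za-z0-9]{11}")


def find_malformed_ids(scene_ids):
    """Return, sorted, the IDs that do not look like an MP3D scene ID."""
    return sorted(s for s in scene_ids if not SCENE_ID_PATTERN.fullmatch(s))


def diff_against_reference(scene_ids, reference_ids):
    """Return (extra, missing) scene IDs relative to a reference list."""
    found, reference = set(scene_ids), set(reference_ids)
    return sorted(found - reference), sorted(reference - found)


if __name__ == "__main__":
    # Toy data for illustration only. In practice `found` would be
    # scene_to_episode.keys() and `reference` the 60 scenes the paper
    # refers to (hypothetical: that list is not part of this issue).
    found = ["17DRP5sb8fy", "1LXtFkjw3qL", "gZ6f7yhEvPG", "images"]
    reference = ["17DRP5sb8fy", "1LXtFkjw3qL", "XXXXXXXXXXX"]

    print("malformed:", find_malformed_ids(found))
    extra, missing = diff_against_reference(found, reference)
    print("extra:", extra)
    print("missing:", missing)
```

If `find_malformed_ids` returns an empty list for the real data and `extra` contains exactly one ID, that ID would be the scene not accounted for by the paper's count.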

Could you please clarify if this is a typo in the paper?
Thank you very much for your excellent work and generous sharing!
