Hi authors,
Section 3.4 of your paper states:
"Vision-Language Action Data. We collect navigation-specific training data using the Habitat simulator across multiple public VLN datasets. Specifically, we collect 450K samples (video clips) from 60 Matterport3D [25] (MP3D) environments, sourced from R2R [7], R2R-EnvDrop [26] and RxR [8]."
However, when I extract and count unique scene IDs from the official annotation file annotations_v1-3.json (available at https://huggingface.co/datasets/cywan/StreamVLN-Trajectory-Data), I get 61 unique scenes instead of 60.
I've attached the code I used for counting below for your reference:
import json
from collections import defaultdict

# 1. Load the JSON file
json_path = "annotations_v1-3.json"
with open(json_path, "r", encoding="utf-8") as f:
    data = json.load(f)

# 2. Extract the scene ID and count the episodes per scene
scene_to_episode = defaultdict(int)
for item in data:
    video_path = item.get("video", "")
    if video_path.startswith("images/"):
        rest = video_path[len("images/"):]
        # The scene ID is the part of the name before the "_r2r_" marker
        scene_id = rest.split("_r2r_")[0]
        scene_to_episode[scene_id] += 1

# 3. Print all results, sorted lexicographically by scene ID
print(f"✅ Total episodes: {len(data)}")
print(f"✅ Unique scenes: {len(scene_to_episode)}")
print("\n📊 Episodes per scene:")
for scene_id in sorted(scene_to_episode.keys()):
    print(f" - {scene_id}: {scene_to_episode[scene_id]} episodes")

# 4. Optional: export to a file for easier inspection
with open("scene_episode_count.txt", "w", encoding="utf-8") as f:
    f.write(f"Total episodes: {len(data)}\n")
    f.write(f"Unique scenes: {len(scene_to_episode)}\n\n")
    for scene_id in sorted(scene_to_episode.keys()):
        f.write(f"{scene_id}: {scene_to_episode[scene_id]} episodes\n")
print("\n✅ Full statistics exported to scene_episode_count.txt")
Total episodes: 10819
Unique scenes: 61
17DRP5sb8fy: 75 episodes
1LXtFkjw3qL: 162 episodes
1pXnuDYAj8r: 256 episodes
29hnd4uzFmX: 192 episodes
2n8kARJN3HM: 268 episodes
5LpN3gDmAk7: 243 episodes
5q7pvUzZiYa: 150 episodes
759xd9YjKW5: 210 episodes
7y3sRwLe3Va: 196 episodes
82sE5b5pLXE: 195 episodes
8WUmhLawc2A: 279 episodes
B6ByNegPMKs: 252 episodes
D7G3Y4RVNrH: 34 episodes
D7N2EKCX4Sj: 201 episodes
E9uDoFAP3SH: 265 episodes
EDJbREhghzL: 249 episodes
GdvgFV5R1Z5: 21 episodes
HxpKQynjfin: 15 episodes
JF19kD82Mey: 63 episodes
JeFG25nYj2p: 279 episodes
JmbYfDe2QKZ: 210 episodes
PX4nDJXEHrG: 240 episodes
Pm6F8kyY3z2: 24 episodes
PuKPg4mmafe: 57 episodes
S9hNv5qa7GM: 246 episodes
SN83YJsR3w2: 232 episodes
ULsKaCPVFJR: 234 episodes
Uxmj2M2itWa: 228 episodes
V2XKFyX4ASd: 198 episodes
VFuaQ6m2Qom: 150 episodes
VLzqgDo317F: 162 episodes
VVfe2KiqLaN: 105 episodes
Vvot9Ly1tCj: 276 episodes
VzqfbhrpDEA: 237 episodes
XcA2TqTSSAj: 18 episodes
YmJkqBEsHnH: 9 episodes
ZMojNkEp431: 225 episodes
aayBHfsNo7d: 145 episodes
ac26ZMwG7aT: 273 episodes
b8cTxDM8gDG: 144 episodes
cV4RVeZvu5T: 147 episodes
dhjEzFoUFzH: 33 episodes
e9zR4mvMWw7: 141 episodes
gTV8FGcVJC9: 202 episodes
gZ6f7yhEvPG: 3 episodes
i5noydFURQK: 177 episodes
jh4fc5c5qoQ: 147 episodes
kEZ7cmS4wCh: 196 episodes
mJXqzFtmKg4: 267 episodes
p5wJjkQkbXX: 202 episodes
pRbA3pwrgk9: 135 episodes
qoiz87JEwZ2: 174 episodes
r1Q1Z4BcV1o: 222 episodes
r47D5H71a5s: 279 episodes
rPc6DW4iMge: 231 episodes
s8pcmisQ38h: 84 episodes
sKLMLpTHeUy: 237 episodes
sT4fr6TAbpF: 252 episodes
uNb9QFRL6hY: 159 episodes
ur6pFq6Qu1A: 279 episodes
vyrNrziPKCB: 234 episodes
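The count above can be cross-checked with a shorter sketch that collects scene IDs into a set. It assumes every `video` value has the form `images/<scene_id>_r2r_<suffix>`, which may not hold for all entries. The helper name `count_scenes`, the regular expression, and the sample paths are hypothetical and only for illustration.

```python
import re

# Assumed path layout: images/<scene_id>_r2r_<suffix>; adjust if the data differs.
SCENE_PATTERN = re.compile(r"^images/([A-Za-z0-9]+)_r2r_")


def count_scenes(items):
    """Return the set of scene IDs found in the items' "video" fields."""
    scenes = set()
    for item in items:
        match = SCENE_PATTERN.match(item.get("video", ""))
        if match:
            scenes.add(match.group(1))
    return scenes


# Hypothetical sample entries, not taken from the real annotation file.
sample = [
    {"video": "images/17DRP5sb8fy_r2r_000001"},
    {"video": "images/17DRP5sb8fy_r2r_000002"},
    {"video": "images/1LXtFkjw3qL_r2r_000003"},
    {"video": "other/ignored"},
]
print(len(count_scenes(sample)))
```

Passing the list loaded from annotations_v1-3.json in place of `sample` should give the number of unique scenes under that path assumption; entries that do not match the pattern are skipped rather than counted.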
Could you please clarify if this is a typo in the paper?
Thank you very much for your excellent work and generous sharing!