Question about calculating the total dataset size from Figure 2

Thank you very much for your excellent work!

Regarding Figure 2 (Co-Training Data Recipe of StreamVLN), we would like to clarify the total dataset size and the corresponding data amount for each split.
Assuming the DAgger data size is exactly 240K and accounts for exactly 16% of the total data, we calculate the total dataset size (X) as follows:

16% × X = 240K
X = 240K / 0.16 = 1500K

Based on this total size of 1500K, we further compute the data amounts for other splits:
- MP3D (31%): 1500K × 31% = 465K
- HM3D (20%): 1500K × 20% = 300K
- VQA (17%): 1500K × 17% = 255K
- MMC4 (16%): 1500K × 16% = 240K
- VLA (67%): 1500K × 67% = 1005K
- General Multi-modal (33%): 1500K × 33% = 495K
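
For anyone who wants to reproduce the arithmetic above, here is a minimal sketch. It assumes, as stated earlier, that DAgger is exactly 240K samples and exactly 16% of the total; the variable names are ours and are not taken from the StreamVLN code.

```python
# Assumption: DAgger is exactly 240K samples and exactly 16% of the total.
DAGGER_K = 240
DAGGER_PCT = 16

# Solve 16% * X = 240K for X (in thousands of samples).
total_k = DAGGER_K * 100 / DAGGER_PCT

# Percentages as read from the pie chart in Figure 2 (rounded values).
percentages = {"MP3D": 31, "HM3D": 20, "DAgger": 16, "VQA": 17, "MMC4": 16}

# Estimated size of each split under the assumption above.
estimated_k = {name: total_k * pct / 100 for name, pct in percentages.items()}

# Group totals: VLA = MP3D + HM3D + DAgger, General = VQA + MMC4.
vla_k = estimated_k["MP3D"] + estimated_k["HM3D"] + estimated_k["DAgger"]
general_k = estimated_k["VQA"] + estimated_k["MMC4"]

print(f"Total: {total_k:.0f}K")
for name, size in estimated_k.items():
    print(f"{name}: {size:.0f}K")
print(f"VLA: {vla_k:.0f}K, General Multi-modal: {general_k:.0f}K")
```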

Could you please confirm whether these calculations align with the actual dataset configuration?

For reference, we compared the calculated values with the amounts stated in the text:

1. Vision-Language Action (VLA) Data
- MP3D: Text states 450K, but calculation (1500K × 31%) gives 465K (discrepancy)
- HM3D: Text states 300K, calculation (1500K × 20%) gives 300K (consistent ✅)
- DAgger: Text states 240K, calculation (1500K × 16%) gives 240K (consistent ✅)
- VLA Total: Text sum is 450K + 300K + 240K = 990K, but calculation (1500K × 67%) gives 1005K (discrepancy)

2. General Multi-modal Data
- VQA: Text states 248K, but calculation (1500K × 17%) gives 255K (discrepancy)
- MMC4: Text states 230K, but calculation (1500K × 16%) gives 240K (discrepancy)
- General Total: Text sum is 248K + 230K = 478K, but calculation (1500K × 33%) gives 495K (discrepancy)

We suspect these discrepancies come from the rounded percentage values in the pie chart (e.g., 31%, 17%, 16%) not being the exact true ratios. Could you please confirm the precise total dataset size and the exact ratios for each split? Thank you so much for your time and clarification!
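
As a rough consistency check on this suspicion, the sketch below takes the amounts quoted from the text in the comparison above, sums them, and recomputes each split's share. It assumes those quoted amounts are the true sizes, which is exactly what we are asking you to confirm; the variable names are ours.

```python
# Amounts (in thousands of samples) as quoted from the text in the comparison above.
# Assumption: these are the true split sizes.
stated_k = {"MP3D": 450, "HM3D": 300, "DAgger": 240, "VQA": 248, "MMC4": 230}

# Rounded percentages as read from the pie chart in Figure 2.
chart_pct = {"MP3D": 31, "HM3D": 20, "DAgger": 16, "VQA": 17, "MMC4": 16}

# Total implied by the stated amounts, rather than by 240K / 0.16.
stated_total_k = sum(stated_k.values())

# Share of each split relative to that total, before rounding.
implied_pct = {name: 100 * size / stated_total_k for name, size in stated_k.items()}

# Check whether each unrounded share rounds to the chart's percentage.
matches_chart = {
    name: round(implied_pct[name]) == chart_pct[name] for name in stated_k
}

print(f"Total from stated amounts: {stated_total_k}K")
for name in stated_k:
    print(f"{name}: {implied_pct[name]:.2f}% (chart: {chart_pct[name]}%)")
```

Under this assumption the total would be 1468K rather than 1500K, and every implied share rounds to the percentage shown in the chart. This is only a consistency check, not a confirmation of the actual configuration.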
