
Question about calculating the total dataset size from Figure 2 #83

@followingcode


Thank you very much for your excellent work!

In Figure 2: Co-Training Data Recipe of StreamVLN, we would like to clarify the total dataset size and the corresponding data amounts for each split.
Assuming the DAgger data size is exactly 240K and accounts for 16% of the total data, we calculate the total dataset size (X) as follows:

16% × X = 240K
X = 240K / 0.16 = 1500K

Based on this total size of 1500K, we further compute the data amounts for other splits:

  • MP3D (31%): 1500K × 31% = 465K
  • HM3D (20%): 1500K × 20% = 300K
  • VQA (17%): 1500K × 17% = 255K
  • MMC4 (16%): 1500K × 16% = 240K
  • VLA (67%): 1500K × 67% = 1005K
  • General Multi-modal (33%): 1500K × 33% = 495K
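The back-calculation above can be sketched in a few lines, together with a rough check of how sensitive it is to rounding. This assumes the 16% label in Figure 2 is rounded to the nearest whole percent, which the figure does not state; the function names are our own and purely illustrative.

```python
# Back-calculate the total dataset size from one split and bound it,
# ASSUMING the figure's percentage is rounded to the nearest integer.

DAGGER_K = 240    # DAgger size in thousands, as stated in the text
DAGGER_PCT = 16   # DAgger share shown in Figure 2


def implied_total(count_k, pct):
    """Total size (in K) if `pct` were the exact share of `count_k`."""
    return count_k * 100 / pct


def total_bounds(count_k, pct, half_step=0.5):
    """Range of totals (in K) consistent with a share that rounds to `pct`."""
    low = count_k * 100 / (pct + half_step)
    high = count_k * 100 / (pct - half_step)
    return low, high


point = implied_total(DAGGER_K, DAGGER_PCT)
low, high = total_bounds(DAGGER_K, DAGGER_PCT)
print(f"point estimate: {point:.0f}K")
print(f"range consistent with a rounded 16%: {low:.1f}K to {high:.1f}K")
```

Under that rounding assumption, 1500K is only one value in a range of roughly 1454.5K to 1548.4K, so the per-split amounts derived from it are approximate as well.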

Could you please confirm whether these calculations match the actual dataset configuration? Comparing them with the sizes stated in the text, we find the following:

  1. Vision-Language Action (VLA) Data
  • MP3D: Text states 450K, but calculation (1500K × 31%) gives 465K (discrepancy)
  • HM3D: Text states 300K, calculation (1500K × 20%) gives 300K (consistent ✅)
  • DAgger: Text states 240K, calculation (1500K × 16%) gives 240K (consistent ✅)
  • VLA Total: Text sum is 450K + 300K + 240K = 990K, but calculation (1500K × 67%) gives 1005K (discrepancy)
  2. General Multi-modal Data
  • VQA: Text states 248K, but calculation (1500K × 17%) gives 255K (discrepancy)
  • MMC4: Text states 230K, but calculation (1500K × 16%) gives 240K (discrepancy)
  • General Total: Text sum is 248K + 230K = 478K, but calculation (1500K × 33%) gives 495K (discrepancy)

We suspect these discrepancies arise because the percentages in the pie chart (e.g., 31%, 17%, 16%) are rounded values rather than the exact ratios. Could you please confirm the precise total dataset size and the exact ratio for each split? Thank you so much for your time and clarification!
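As a quick check of the rounding hypothesis, the sketch below takes the sizes stated in the text as exact, sums them, and compares each rounded share with the percentage in Figure 2. Treating the text values as exact and the figure as rounded to the nearest whole percent are both our assumptions, not something confirmed by the authors.

```python
# Check whether the sizes stated in the text reproduce the pie-chart
# percentages once each share is rounded to a whole percent (an assumption).

stated_k = {"MP3D": 450, "HM3D": 300, "DAgger": 240, "VQA": 248, "MMC4": 230}
figure_pct = {"MP3D": 31, "HM3D": 20, "DAgger": 16, "VQA": 17, "MMC4": 16}

total_k = sum(stated_k.values())
shares = {name: 100 * k / total_k for name, k in stated_k.items()}
rounded = {name: round(share) for name, share in shares.items()}

print(f"total from text: {total_k}K")
for name in stated_k:
    status = "match" if rounded[name] == figure_pct[name] else "mismatch"
    print(f"{name}: {shares[name]:.2f}% -> {rounded[name]}% "
          f"(figure: {figure_pct[name]}%, {status})")
```

With these inputs the total is 1468K rather than 1500K, and every rounded share matches the figure, which is consistent with the percentages being rounded from the sizes in the text.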
