1. Optionally specify `y` and `z` workgroup counts.
2. Distribute 1D workgroup sizes into 2D/3D if they exceed limits.
3. Interpret a non-numeric size token as the name of an indirect argument and map `name` -> `offset`. Add the indirect argument to the relevant memory barriers.
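
As a rough illustration of items 2 and 3, the sketch below splits a 1D workgroup count across up to three dimensions and resolves a size token to either a literal count or an indirect offset. Everything here is an assumption for illustration, not the actual implementation: the function names `distribute_workgroups` and `resolve_size`, the `indirect_offsets` dictionary, and the per-dimension limit of 65535 are all hypothetical. Because the split rounds up, the dispatched grid can cover more workgroups than requested, so a kernel using such a scheme would need to bounds-check its flattened index.

```python
# Hypothetical per-dimension limit; the real value depends on the target API/device.
ASSUMED_MAX_PER_DIM = 65535


def distribute_workgroups(total, limit=ASSUMED_MAX_PER_DIM):
    """Split a 1D workgroup count into (x, y, z) with each component <= limit.

    The product x * y * z is >= total, so some workgroups may be surplus.
    """
    if total < 0:
        raise ValueError("workgroup count must be non-negative")
    if total <= limit:
        return (total, 1, 1)

    x = limit
    y = -(-total // x)  # integer ceiling division
    if y <= limit:
        return (x, y, 1)

    z = -(-y // limit)
    y = limit
    if z > limit:
        raise ValueError("workgroup count does not fit in three dimensions")
    return (x, y, z)


def resolve_size(token, indirect_offsets):
    """Classify a size token as a literal count or a named indirect argument.

    `indirect_offsets` is a hypothetical name -> offset mapping.
    """
    if token.isdecimal():
        return ("direct", int(token))
    if token not in indirect_offsets:
        raise KeyError(f"unknown indirect argument name: {token!r}")
    return ("indirect", indirect_offsets[token])
```

For example, `distribute_workgroups(65536)` yields a 2D grid because the count exceeds the assumed limit by one, and `resolve_size("counts", {"counts": 16})` treats `counts` as an indirect argument at offset 16. Barrier handling is left out of this sketch.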