
Generics usage by decoders increases code size and compilation time #2472

@fintelia

Description


This program takes 17 seconds to compile and produces a 2.4 MB binary with 1.2 MB of that being instructions in the .text segment:

use std::{hint::black_box, io::{BufRead, Cursor, Seek}};
use image::ImageReader;

fn decode<R: BufRead + Seek>(slice: R) {
    black_box(ImageReader::new(slice).decode()).unwrap();
}

fn main() {
    let mut f = Cursor::new(black_box(&[]));
    decode(&mut f);
}

Adding more instantiations of ImageReader with different generic arguments drastically increases those numbers. Adding ten more copies increases compile time to 22 seconds, and the resulting 4.8 MB binary contains a 3 MB .text section:

// ...

fn main() {
    let mut f = Cursor::new(black_box(&[]));
    decode(&mut f);
    decode(&mut &mut f);
    decode(&mut &mut &mut f);
    decode(&mut &mut &mut &mut f);
    decode(&mut &mut &mut &mut &mut f);
    decode(&mut &mut &mut &mut &mut &mut f);
    decode(&mut &mut &mut &mut &mut &mut &mut f);
    decode(&mut &mut &mut &mut &mut &mut &mut &mut f);
    decode(&mut &mut &mut &mut &mut &mut &mut &mut &mut f);
    decode(&mut &mut &mut &mut &mut &mut &mut &mut &mut &mut f);
    decode(&mut &mut &mut &mut &mut &mut &mut &mut &mut &mut &mut f);
}

What's going on here: because ImageReader and each of the underlying decoders are parameterized by the reader type, instantiating them with 11 different reader types causes rustc to compile 11 copies of our PNG decoder, 11 copies of our JPEG decoder, 11 copies of the TIFF and WebP decoders, and so on. All of those copies are then fed into LLVM for codegen and optimization, and the resulting compile times and output sizes speak for themselves.
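One common mitigation (not what image does today, just a sketch) is to erase the reader type behind a trait object at the API boundary, so the decode machinery is monomorphized exactly once regardless of how many concrete reader types callers use. The BufReadSeek helper trait and the function names below are hypothetical; a combined trait is needed because Rust doesn't allow `dyn BufRead + Seek` directly:

```rust
use std::io::{BufRead, Seek};

// Combined trait so both capabilities fit behind a single trait object.
trait BufReadSeek: BufRead + Seek {}
impl<T: BufRead + Seek> BufReadSeek for T {}

// Compiled exactly once: all the heavy decoding logic would live here.
fn decode_erased(reader: &mut dyn BufReadSeek) -> std::io::Result<u64> {
    // ...decoding logic...
    reader.stream_position()
}

// Thin generic shim: the only per-reader-type code is this trivial wrapper.
fn decode<R: BufRead + Seek>(mut reader: R) -> std::io::Result<u64> {
    decode_erased(&mut reader)
}
```

The trade-off is dynamic dispatch on every read call, which is exactly the runtime-regression concern mentioned below.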

What should we do?

I think this starts to call into question our strategy of parameterizing decoders by std::io::Read (and sometimes Seek or BufRead). Any methods that aren't generic get compiled only once, and in parallel as the specific format crate is being compiled. But generic code can get compiled many times.
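One way to keep the public API generic while limiting replication is the standard "non-generic inner function" idiom: the generic shim is tiny and the real work lives in a non-generic helper that is compiled once. The function names here are illustrative, not actual image APIs:

```rust
use std::io::{self, BufRead};

// Hypothetical example: a generic entry point whose body is a thin
// forwarding call into non-generic code.
fn parse_header<R: BufRead>(mut reader: R) -> io::Result<Vec<u8>> {
    // Non-generic inner function: compiled once no matter how many
    // reader types instantiate parse_header.
    fn inner(reader: &mut dyn BufRead) -> io::Result<Vec<u8>> {
        let mut magic = vec![0u8; 8];
        reader.read_exact(&mut magic)?;
        Ok(magic)
    }
    inner(&mut reader)
}
```

Only the outer shim is monomorphized per reader type, so the per-instantiation cost is a few instructions rather than an entire decoder.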

There's been some talk about trying to move towards a "sans io" style where an encoder or decoder would operate on byte slices rather than directly pulling new input data from a reader or pushing output data into a writer. I think that's a promising point to explore. Even if we didn't do a full switchover, just moving some parts of each decoder to that style could significantly cut down on the amount of code that gets replicated.
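Even a rough sketch shows the shape this could take. Everything below is hypothetical (invented names, toy "format" that just wants 8 header bytes), but the point is structural: the decoder core never touches a reader, so only the small driver loop is generic over R:

```rust
use std::io::Read;

// Sans-io decoder core: consumes byte slices the caller has already
// fetched, reports how much it consumed and whether it needs more.
#[derive(Debug, PartialEq)]
enum Status {
    NeedMoreData,
    Done,
}

struct Decoder {
    buffered: Vec<u8>,
    needed: usize,
}

impl Decoder {
    fn new() -> Self {
        // Toy example: pretend the format needs an 8-byte header.
        Decoder { buffered: Vec::new(), needed: 8 }
    }

    /// Feed a chunk; returns (bytes consumed, status). Non-generic,
    /// so this is compiled exactly once.
    fn feed(&mut self, input: &[u8]) -> (usize, Status) {
        let take = input.len().min(self.needed - self.buffered.len());
        self.buffered.extend_from_slice(&input[..take]);
        if self.buffered.len() == self.needed {
            (take, Status::Done)
        } else {
            (take, Status::NeedMoreData)
        }
    }
}

// The only code monomorphized per reader type is this small driver.
fn drive<R: Read>(mut reader: R) -> std::io::Result<Vec<u8>> {
    let mut dec = Decoder::new();
    let mut chunk = [0u8; 4096];
    loop {
        let n = reader.read(&mut chunk)?;
        if n == 0 {
            break;
        }
        let (_, status) = dec.feed(&chunk[..n]);
        if status == Status::Done {
            break;
        }
    }
    Ok(dec.buffered)
}
```

A real decoder would be a much larger state machine, but the replication argument is the same: the state machine is compiled once, and only the driver loop multiplies with reader types.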

I'd also be open to other thoughts / ideas or if there's some aspect I'm overlooking. We will have to be mindful that compile time improvements don't lead to runtime regressions if we no longer specialize decoders to the specific reader type they're working with.

PRs #2468 and #2470 stop encoders and decoders, respectively, from being fully compiled when no downstream code uses them. That's a start, but it doesn't address the underlying concern.
