
Generics usage by decoders increases code size and compilation time #2472

@fintelia

Description


This program takes 17 seconds to compile and produces a 2.4 MB binary with 1.2 MB of that being instructions in the .text segment:

use std::{hint::black_box, io::{BufRead, Cursor, Seek}};
use image::ImageReader;

fn decode<R: BufRead + Seek>(slice: R) {
    black_box(ImageReader::new(slice).decode()).unwrap();
}

fn main() {
    let mut f = Cursor::new(black_box(&[]));
    decode(&mut f);
}

Adding more instantiations of ImageReader with different generic arguments drastically increases those numbers. Adding ten more copies increases compile time to 22 seconds, and the resulting 4.8 MB binary contains a 3 MB .text section:

// ...

fn main() {
    let mut f = Cursor::new(black_box(&[]));
    decode(&mut f);
    decode(&mut &mut f);
    decode(&mut &mut &mut f);
    decode(&mut &mut &mut &mut f);
    decode(&mut &mut &mut &mut &mut f);
    decode(&mut &mut &mut &mut &mut &mut f);
    decode(&mut &mut &mut &mut &mut &mut &mut f);
    decode(&mut &mut &mut &mut &mut &mut &mut &mut f);
    decode(&mut &mut &mut &mut &mut &mut &mut &mut &mut f);
    decode(&mut &mut &mut &mut &mut &mut &mut &mut &mut &mut f);
    decode(&mut &mut &mut &mut &mut &mut &mut &mut &mut &mut &mut f);
}

What's going on here: because ImageReader and each of the underlying decoders are parameterized by the reader type, instantiating them with 11 different reader types causes rustc to compile 11 copies of our PNG decoder, 11 copies of our JPEG decoder, 11 copies of the TIFF and WebP decoders, and so on. All of those copies are then fed into LLVM for codegen and optimization, and the resulting compile times and output sizes speak for themselves.
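One common mitigation (not what image does today, just a sketch) is to erase the reader type behind a trait object at the API boundary, so the decode machinery is monomorphized exactly once regardless of how many concrete reader types callers use. The BufReadSeek helper trait and the function names below are hypothetical; a combined trait is needed because Rust doesn't allow `dyn BufRead + Seek` directly:

```rust
use std::io::{BufRead, Seek};

// Combined trait so both capabilities fit behind a single trait object.
trait BufReadSeek: BufRead + Seek {}
impl<T: BufRead + Seek> BufReadSeek for T {}

// Compiled exactly once: all the heavy decoding logic would live here.
fn decode_erased(reader: &mut dyn BufReadSeek) -> std::io::Result<u64> {
    // ...decoding logic...
    reader.stream_position()
}

// Thin generic shim: the only per-reader-type code is this trivial wrapper.
fn decode<R: BufRead + Seek>(mut reader: R) -> std::io::Result<u64> {
    decode_erased(&mut reader)
}
```

The trade-off is dynamic dispatch on every read call, which is exactly the runtime-regression concern mentioned below.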

What should we do?

I think this starts to call into question our strategy of parameterizing decoders by std::io::Read (and sometimes Seek or BufRead). Any methods that aren't generic get compiled only once, and in parallel as the specific format crate is being compiled. But generic code can get compiled many times.
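One way to keep the public API generic while limiting replication is the standard "non-generic inner function" idiom: the generic shim is tiny and the real work lives in a non-generic helper that is compiled once. The function names here are illustrative, not actual image APIs:

```rust
use std::io::{self, BufRead};

// Hypothetical example: a generic entry point whose body is a thin
// forwarding call into non-generic code.
fn parse_header<R: BufRead>(mut reader: R) -> io::Result<Vec<u8>> {
    // Non-generic inner function: compiled once no matter how many
    // reader types instantiate parse_header.
    fn inner(reader: &mut dyn BufRead) -> io::Result<Vec<u8>> {
        let mut magic = vec![0u8; 8];
        reader.read_exact(&mut magic)?;
        Ok(magic)
    }
    inner(&mut reader)
}
```

Only the outer shim is monomorphized per reader type, so the per-instantiation cost is a few instructions rather than an entire decoder.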

There's been some talk about trying to move towards a "sans io" style where an encoder or decoder would operate on byte slices rather than directly pulling new input data from a reader or pushing output data into a writer. I think that's a promising point to explore. Even if we didn't do a full switchover, just moving some parts of each decoder to that style could significantly cut down on the amount of code that gets replicated.
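Even a rough sketch shows the shape this could take. Everything below is hypothetical (invented names, toy "format" that just wants 8 header bytes), but the point is structural: the decoder core never touches a reader, so only the small driver loop is generic over R:

```rust
use std::io::Read;

// Sans-io decoder core: consumes byte slices the caller has already
// fetched, reports how much it consumed and whether it needs more.
#[derive(Debug, PartialEq)]
enum Status {
    NeedMoreData,
    Done,
}

struct Decoder {
    buffered: Vec<u8>,
    needed: usize,
}

impl Decoder {
    fn new() -> Self {
        // Toy example: pretend the format needs an 8-byte header.
        Decoder { buffered: Vec::new(), needed: 8 }
    }

    /// Feed a chunk; returns (bytes consumed, status). Non-generic,
    /// so this is compiled exactly once.
    fn feed(&mut self, input: &[u8]) -> (usize, Status) {
        let take = input.len().min(self.needed - self.buffered.len());
        self.buffered.extend_from_slice(&input[..take]);
        if self.buffered.len() == self.needed {
            (take, Status::Done)
        } else {
            (take, Status::NeedMoreData)
        }
    }
}

// The only code monomorphized per reader type is this small driver.
fn drive<R: Read>(mut reader: R) -> std::io::Result<Vec<u8>> {
    let mut dec = Decoder::new();
    let mut chunk = [0u8; 4096];
    loop {
        let n = reader.read(&mut chunk)?;
        if n == 0 {
            break;
        }
        let (_, status) = dec.feed(&chunk[..n]);
        if status == Status::Done {
            break;
        }
    }
    Ok(dec.buffered)
}
```

A real decoder would be a much larger state machine, but the replication argument is the same: the state machine is compiled once, and only the driver loop multiplies with reader types.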

I'd also be open to other thoughts / ideas or if there's some aspect I'm overlooking. We will have to be mindful that compile time improvements don't lead to runtime regressions if we no longer specialize decoders to the specific reader type they're working with.

PRs #2468 and #2470 stop encoders and decoders, respectively, from being fully compiled when no downstream code uses them. That's a start, but it doesn't address the underlying concern.
