Skip to content

Support for non-blocking and best-effort zero-copy io::copy #202

@the8472

Description

@the8472

Proposal

Problem statement

It would be useful to have a version of io::copy that can use:

  • splice or sendfile, some of the existing optimizations have to be rolled back in rust#108283
  • nonblocking IO

Motivation, use-cases

Solution sketches

An API specific to file descriptors

pub fn os::unix::os_copy(buffer: &mut BorrowedBuf, source: &impl AsFd, sink: &impl AsFd) -> io::Result<u64>

It is less generic than io::copy but makes it explicit it only operates on file-like types and may need an intermediate buffer to hold data across multiple invocations when doing non-blocking IO.

Unclear: Whether it should return an error when offload isn't possible or silently fallback to io::copy.

Downsides:

  • covering windows would be more complicated since it distinguishes between Handle and Socket
  • requires cfg()s
  • can't be used by code generic over Read/Write types, e.g. tar-rs

Lean on specialization

pub fn io::zero_copy(source: &mut impl Read, sink: &mut impl Write)  -> io::Result<u64>

This is essentially the same as today's io::copy does but with altered guarantees

  • if the caller passes a BufWriter then any read-but-unwritten bytes will be held in the bufwriter when a WouldBlock occurs. Otherwise the bytes will be dropped
  • changes made to source after zero_copy returns may become visible in sink, as is the case when using sendfile or splice

Downsides:

  • API guarantees strongly rely on specialization
  • the non-blocking case might be a footgun if someone tries to pass a BufWriter as &dyn Write where the specialization won't be able to see it and thus end up dropping bytes

Hybrid of the above

Make the buffer an explicit argument for non-blocking IO but use best-effort specialization for the offloading aspects.

Encapsulate the copy operation in a struct/builder

Rough sketch:

struct Copier<'a, R, W> {
   // a bunch of enums
}

impl<'a, R, W> Copier<'a, R, W> where R: Read, W: Write {
   /// On errors an internal buffer will be allocated if none is provided
   fn buffer(&mut self, buf: &'a mut BorrowedBuf) {}

   fn source(&mut self, src: R) {}
   fn sink(&mut self, sink: W) {}

   /// Runs until first error, can be resumed with a later call
   /// Does not ignore wouldblock or interrupted errors
   fn copy_chunk() -> Result<u64> { todo!() }

   fn total() -> u64 { todo!()  }
}


#[cfg(unix)] // impl for brevity, should be an extension trait. 
impl<'a, R, W> Copier<'a, R, W> where R: Read + AsFd {
   fn fd_source(&mut self, src: &'a R) {}
}

#[cfg(unix)]
impl<'a, R, W> Copier<'a, R, W> where W: Write + AsFd {
   fn fd_sink(&mut self, sink: &'a W) {}
}

Under the hood it could still try to use specialization if the platform-specific APIs aren't used.

Any of the above, but N pairs instead of 1

When copying many small files and the like it can be beneficial to run them in batches. It's not a full-fledged async runtime that could add work incrementally as other work items complete but still more efficient than doing one-at-a-time.

Under the hood we could use polling or io_uring where appropriate.

Links and related work

What happens now?

This issue is part of the libs-api team API change proposal process. Once this issue is filed the libs-api team will review open proposals in its weekly meeting. You should receive feedback within a week or two.

Metadata

Metadata

Assignees

No one assigned

    Labels

    T-libs-apiapi-change-proposalA proposal to add or alter unstable APIs in the standard libraries

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions