diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md
index 4aaaf22..0b236ff 100644
--- a/.github/copilot-instructions.md
+++ b/.github/copilot-instructions.md
@@ -8,9 +8,8 @@ OneIO is a Rust library providing unified IO operations for reading and writing
 ### Before Committing - Always Run:
 1. **Format code**: `cargo fmt`
 2. **Run linter**: `cargo clippy --all-features` and fix all warnings
-3. **Update README if lib.rs docs changed**: `cargo readme > README.md`
-4. **Update CHANGELOG.md**: Add entries under `[Unreleased]` section
-5. **Run tests**: `cargo test --all-features`
+3. **Update CHANGELOG.md**: Add entries under `[Unreleased]` section
+4. **Run tests**: `cargo test --all-features`
 
 ### Formatting Rules
 - Always run `cargo fmt` before completing any task
@@ -19,10 +18,8 @@ OneIO is a Rust library providing unified IO operations for reading and writing
 ### Documentation Requirements
-- When modifying `src/lib.rs` documentation, always regenerate README:
-  ```bash
-  cargo readme > README.md
-  ```
+- Keep lib.rs documentation concise and API-focused (for docs.rs)
+- README.md is maintained separately with more detailed examples
 - Keep documentation examples up-to-date
 - Add doc comments for all public APIs
 - Include usage examples in module-level documentation
@@ -135,7 +132,7 @@ cargo test --features http,gz,bz
 1. Add feature flag to `Cargo.toml`
 2. Implement feature-gated code with `#[cfg(feature = "...")]`
 3. Add tests for the feature
-4. Document in lib.rs and regenerate README
+4. Document in lib.rs (API docs) and README.md (user guide)
 5. Update CHANGELOG.md
 6. Add example if applicable
@@ -179,4 +176,4 @@ cargo test --features http,gz,bz
 - **Always run cargo fmt and cargo clippy before committing**
 - **Always update CHANGELOG.md with changes**
 - **No emojis in commits or PRs**
-- **Regenerate README.md when lib.rs docs change**
+- **Maintain both lib.rs (API docs) and README.md (user guide) separately**
diff --git a/.github/workflows/release.yml b/.github/workflows/release.yml
index fea6df1..4d541b7 100644
--- a/.github/workflows/release.yml
+++ b/.github/workflows/release.yml
@@ -17,9 +17,6 @@ jobs:
       - name: Run format check
         run: cargo fmt --check
 
-      - name: Run cargo-readme check
-        run: cargo install cargo-readme && cargo readme > TMP_README.md && diff -b TMP_README.md README.md
-
   create-release:
     needs: release-format-check
     runs-on: ubuntu-latest
diff --git a/CHANGELOG.md b/CHANGELOG.md
index c52d370..e643a6c 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -4,16 +4,58 @@ All notable changes to this project will be documented in this file.
 
 ## [Unreleased]
 
+### Breaking changes
+- `OneIoError` is now `#[non_exhaustive]`; `match` expressions without a wildcard `_` arm will fail to compile
+- `OneIoBuilder::header()` now accepts typed `HeaderName`/`HeaderValue` (infallible) instead of `(K, V) -> Result`
+- `OneIoBuilder::user_agent()` now accepts a typed `HeaderValue` (infallible) instead of `V -> Result`
+- `oneio::download()` no longer accepts an `Option` parameter
+- `oneio::remote` module is now `pub(crate)`; `create_client_with_headers` is deprecated (use `OneIo::builder().header_str()`)
+- `ProgressReader` and `ProgressCallback` are no longer part of the public API
+
+### Changed
+- Flattened module layout: `src/oneio/` sub-directory removed; all modules are now at `src/` level
+- `OneIo` and `OneIoBuilder` are now the primary API surface; free-standing functions delegate to a shared default client
+- Compression detection strips URL query parameters and fragments before reading the file extension
+- `download_with_retry()` uses exponential backoff between retry attempts (100ms × 2^attempt, capped at 6400ms)
 - Stream cache writes to disk via `std::io::copy` instead of buffering the full payload in memory
 - `download_async()` now preserves raw bytes, matching `download()`
 - Default blocking HTTP clients are reused across reads and content-length probes
+- Stateless read and download helpers now delegate to a reusable `OneIo` client internally
 - S3 status failures now use structured errors instead of string parsing
 - S3 readers now stream data through a bounded channel instead of materializing the full object in memory
 
 ### Added
+- `OneIoBuilder::header_str(name, value)` — string convenience for adding headers (panics on invalid input, matching reqwest convention)
+- `OneIoBuilder::configure_http(f)` — escape hatch for setting any `reqwest::blocking::ClientBuilder` option
+- `OneIoBuilder::timeout()`, `connect_timeout()` — request and connect timeouts
+- `OneIoBuilder::proxy()`, `no_proxy()` — proxy configuration
+- `OneIoBuilder::redirect()` — redirect policy
+- `OneIoBuilder::add_root_certificate_pem()`, `add_root_certificate_der()` — load CA certs from raw bytes
+- `OneIo::get_reader_with_type(path, compression)` — explicit compression override, useful for URLs with query parameters
+- `OneIo::from_client(client)` — construct a `OneIo` from an existing `reqwest::blocking::Client`
+- `OneIoError::NetworkWithContext` — network errors now carry the URL that failed
+- `OneIoError::InvalidHeader`, `OneIoError::InvalidCertificate` — specific error variants for header and certificate construction failures
+- `ONEIO_CA_BUNDLE` environment variable — path to a PEM file added to the HTTP trust store on startup
+- `get_cache_reader()` free-standing shortcut kept at crate root for convenience
 - Added `bzip2_decompress` benchmark coverage
 - Added a benchmark helper script for comparing gzip backend feature flags and bz2 decompression
+- Added reusable `OneIo` and `OneIoBuilder` APIs for sharing headers and TLS certificate configuration across requests
+
+### CLI
+- Added `-H`/`--header` flag for custom HTTP headers (`"Name: Value"` or `"Name:Value"`), can be repeated
+- Added `--compression` flag to override compression detection (gz, bz2, lz4, xz, zst); no effect with `--download`
+- Added progress bar for `--download` and `s3 download`, shown when stderr is a terminal; uses spinner when file size is unknown
+- Added `s3 download <BUCKET> <PATH> [--outfile]` subcommand
+- Fixed S3 upload syntax: `oneio s3 upload <LOCAL_FILE> <BUCKET> <PATH>` (local file is now the first positional arg under the subcommand)
+- Terminal detection uses `std::io::IsTerminal` from the standard library; no extra dependency needed
+- Added `indicatif` to the `cli` feature for progress bars
+
+### Bug fixes
+- LZ4 compressed writes were silently truncated: `lz4::Encoder` has no `Drop` impl and requires an explicit `finish()` call to write the end-of-stream marker. Fixed with a `Lz4Writer` wrapper that calls `finish()` on drop.
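The finish-on-drop fix described in the bug-fix entry above follows a general pattern worth illustrating. This is a hedged sketch, not oneio's actual `Lz4Writer`: `MockEncoder` is a stand-in for an encoder like `lz4::Encoder` that only emits its end-of-stream marker when `finish()` is called explicitly, and the wrapper guarantees that call happens even if the user just drops the writer.

```rust
use std::io::{self, Write};

/// Stand-in for an encoder like `lz4::Encoder`: without an explicit
/// `finish()` call, the end-of-stream marker is never written.
struct MockEncoder<W: Write> {
    inner: W,
}

impl<W: Write> MockEncoder<W> {
    fn new(inner: W) -> Self {
        MockEncoder { inner }
    }

    fn write(&mut self, buf: &[u8]) -> io::Result<()> {
        self.inner.write_all(buf)
    }

    /// Writes the end-of-stream marker; skipping this truncates the output.
    fn finish(mut self) -> io::Result<W> {
        self.inner.write_all(b"<EOS>")?;
        Ok(self.inner)
    }
}

/// Wrapper that guarantees `finish()` runs when the writer is dropped,
/// mirroring the shape of the fix described in the changelog.
struct FinishOnDrop<W: Write> {
    encoder: Option<MockEncoder<W>>,
}

impl<W: Write> FinishOnDrop<W> {
    fn new(encoder: MockEncoder<W>) -> Self {
        FinishOnDrop { encoder: Some(encoder) }
    }

    fn write(&mut self, buf: &[u8]) -> io::Result<()> {
        self.encoder.as_mut().unwrap().write(buf)
    }
}

impl<W: Write> Drop for FinishOnDrop<W> {
    fn drop(&mut self) {
        // Take ownership so `finish(self)` can consume the encoder.
        if let Some(encoder) = self.encoder.take() {
            let _ = encoder.finish();
        }
    }
}
```

The `Option` dance is the idiomatic way to consume a field from inside `Drop`, since `finish()` takes the encoder by value.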
+
+### Documentation
+- `lib.rs` docstring documents the `native-tls` feature as the fix for Cloudflare WARP and corporate proxy environments
+- `ONEIO_ACCEPT_INVALID_CERTS` and `ONEIO_CA_BUNDLE` environment variables documented at crate root
 
 ## v0.20.1 -- 2025-12-18
diff --git a/Cargo.toml b/Cargo.toml
index 7ee60e5..2594ce0 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -53,6 +53,7 @@ rust-s3 = { version = "0.37", optional = true, default-features = false, feature
 # feature: cli
 clap = { version = "4.4", features = ["derive"], optional = true }
 tracing = { version = "0.1", optional = true }
+indicatif = { version = "0.18", optional = true }
 
 # feature: async (Phase 3)
 tokio = { version = "1.0", features = ["rt", "rt-multi-thread", "io-util", "fs"], optional = true }
@@ -94,7 +95,7 @@ json = ["serde", "serde_json"]
 digest = ["ring", "hex"]
 
 # CLI tool (includes common features)
-cli = ["clap", "tracing", "gz", "bz", "lz", "xz", "http", "s3", "digest"]
+cli = ["clap", "tracing", "indicatif", "gz", "bz", "lz", "xz", "http", "s3", "digest"]
 
 # Advanced: TLS selection (only if explicitly needed)
 native-tls = [
diff --git a/README.md b/README.md
index 10d94da..cb694cd 100644
--- a/README.md
+++ b/README.md
@@ -8,26 +8,21 @@
 OneIO is a Rust library providing unified IO operations for reading and writing
 compressed files from local and remote sources with both synchronous and asynchronous support.
 
-### Quick Start
+## Quick Start
 
 ```toml
 oneio = "0.20"  # Default: gz, bz, https
 ```
 
-### Feature Selection Guide
+## Feature Selection Guide
 
-#### Common Use Cases
+### Common Use Cases
 
 **Local files only:**
 ```toml
 oneio = { version = "0.20", default-features = false, features = ["gz", "bz"] }
 ```
 
-**HTTP only (no HTTPS)**:
-```toml
-oneio = { version = "0.20", default-features = false, features = ["http", "gz"] }
-```
-
 **HTTPS with default rustls**:
 ```toml
 oneio = { version = "0.20", default-features = false, features = ["https", "gz"] }
@@ -38,7 +33,7 @@ oneio = { version = "0.20", default-features = false, features = ["https", "gz"]
 # With rustls
 oneio = { version = "0.20", default-features = false, features = ["http", "rustls", "gz"] }
 
-# With native-tls
+# With native-tls (recommended for corporate proxies/VPNs)
 oneio = { version = "0.20", default-features = false, features = ["http", "native-tls", "gz"] }
 ```
 
@@ -52,7 +47,7 @@ oneio = { version = "0.20", default-features = false, features = ["s3", "https",
 oneio = { version = "0.20", features = ["async"] }
 ```
 
-#### Available Features
+### Available Features
 
 **Compression** (choose only what you need):
 - `gz` - Gzip via flate2
@@ -68,8 +63,8 @@ oneio = { version = "0.20", features = ["async"] }
 - `s3` - S3-compatible storage
 
 **TLS Backends** (for HTTPS - mutually exclusive):
-- `rustls` - Pure Rust TLS (use with `http`). Uses both system certificates and bundled Mozilla certificates for maximum compatibility with corporate VPNs and minimal environments.
-- `native-tls` - Platform native TLS (use with `http`)
+- `rustls` - Pure Rust TLS (use with `http`). Uses both system certificates and bundled Mozilla certificates for maximum compatibility.
+- `native-tls` - Platform native TLS (use with `http`). **Recommended for corporate proxies and VPNs** (Cloudflare WARP, etc.) as it uses the OS trust store.
 
 **Additional**:
 - `async` - Async support (limited to gz, bz, zstd for compression)
@@ -77,27 +72,47 @@ oneio = { version = "0.20", features = ["async"] }
 - `digest` - SHA256 digest calculation
 - `cli` - Command-line tool
 
-Environment: Set `ONEIO_ACCEPT_INVALID_CERTS=true` to accept invalid certificates.
+### Working with Corporate Proxies (Cloudflare WARP, etc.)
 
-**Crypto Provider Initialization**: When using rustls features (`https`, `s3`, `ftp`), oneio
-automatically initializes the crypto provider (AWS-LC or ring) on first use. You can also
-initialize it explicitly at startup using [`crypto::ensure_default_provider()`] for better
-control over error handling.
+If you're behind a corporate proxy or VPN like Cloudflare WARP that uses custom TLS certificates:
 
-### Usages
+```toml
+[dependencies]
+oneio = { version = "0.20", default-features = false, features = ["http", "native-tls", "gz"] }
+```
 
-#### Reading Files
+The `native-tls` feature uses your operating system's TLS stack with its trust store, which includes custom corporate certificates. This works for both HTTP/HTTPS and S3 operations.
 
-Read all content into a string:
+Alternatively, you can add custom CA certificates:
 
 ```rust
-use oneio;
+use oneio::OneIo;
+
+let oneio = OneIo::builder()
+    .add_root_certificate_pem(&std::fs::read("company-ca.pem")?)?
+    .build()?;
+```
+
+Or set the `ONEIO_CA_BUNDLE` environment variable:
+```bash
+export ONEIO_CA_BUNDLE=/path/to/company-ca.pem
+```
+
+**Environment Variables:**
+- `ONEIO_ACCEPT_INVALID_CERTS=true` - Accept invalid TLS certificates (insecure, for development only)
+- `ONEIO_CA_BUNDLE=/path/to/ca.pem` - Add custom CA certificate to trust store
+
+## Library Usage
+
+### Basic Reading and Writing
 
-const TEST_TEXT: &str = "OneIO test file.\nThis is a test.";
+Read all content into a string (works with compression and remote files automatically):
+
+```rust
+use oneio;
 
-// Works with compression and remote files automatically
 let content = oneio::read_to_string("https://spaces.bgpkit.org/oneio/test_data.txt.gz")?;
-assert_eq!(content.trim(), TEST_TEXT);
+println!("{}", content);
 ```
 
 Read line by line:
@@ -109,9 +124,9 @@ let lines = oneio::read_lines("https://spaces.bgpkit.org/oneio/test_data.txt.gz"
     .map(|line| line.unwrap())
     .collect::<Vec<String>>();
 
-assert_eq!(lines.len(), 2);
-assert_eq!(lines[0], "OneIO test file.");
-assert_eq!(lines[1], "This is a test.");
+for line in lines {
+    println!("{}", line);
+}
 ```
 
 Get a reader for streaming:
@@ -125,8 +140,6 @@ let mut buffer = Vec::new();
 reader.read_to_end(&mut buffer)?;
 ```
 
-#### Writing Files
-
 Write with automatic compression:
 
 ```rust
@@ -139,45 +152,85 @@ drop(writer); // Important: close the writer
 
 // Read it back
 let content = oneio::read_to_string("output.txt.gz")?;
-assert_eq!(content, "Hello, compressed world!");
 ```
 
-#### Remote Files with Custom Headers
+### Reusable OneIo Clients
+
+The `OneIo` client allows you to configure headers, TLS certificates, timeouts, and other options once, then reuse the configuration across multiple operations:
 
 ```rust
-use oneio;
+use oneio::OneIo;
+use reqwest::header::{HeaderName, HeaderValue};
+
+// Build a reusable client with custom headers and certificates
+let oneio = OneIo::builder()
+    .header_str("Authorization", "Bearer TOKEN")
+    .add_root_certificate_pem(&std::fs::read("company-ca.pem")?)?
+    .timeout(std::time::Duration::from_secs(30))
+    .connect_timeout(std::time::Duration::from_secs(10))
+    .build()?;
+
+// Reuse the same configuration for multiple requests
+let content1 = oneio.read_to_string("https://api.example.com/data1.json")?;
+let content2 = oneio.read_to_string("https://api.example.com/data2.json")?;
+```
 
-let client = oneio::create_client_with_headers([("Authorization", "Bearer TOKEN")])?;
-let mut reader = oneio::get_http_reader(
-    "https://api.example.com/protected/data.json.gz",
-    Some(client)
-)?;
+**Builder Methods:**
+- `.header(name, value)` - Add a typed header (infallible, uses `HeaderName` and `HeaderValue`)
+- `.header_str(name, value)` - Add a string header (panics on invalid input)
+- `.user_agent(value)` - Set User-Agent header
+- `.add_root_certificate_pem(pem)` - Add custom CA certificate (PEM format)
+- `.add_root_certificate_der(der)` - Add custom CA certificate (DER format)
+- `.danger_accept_invalid_certs(true)` - Accept invalid certificates
+- `.timeout(duration)` - Set request timeout
+- `.connect_timeout(duration)` - Set connection timeout
+- `.proxy(proxy)` - Set HTTP proxy
+- `.no_proxy()` - Disable system proxy
+- `.redirect(policy)` - Set redirect policy
+- `.configure_http(f)` - Escape hatch for direct reqwest configuration
 
-let content = std::io::read_to_string(&mut reader)?;
-println!("{}", content);
+### Compression Override
+
+For URLs with query parameters or non-standard extensions, use explicit compression type:
+
+```rust
+use oneio::OneIo;
+
+let oneio = OneIo::new()?;
+
+// URL has query params, so we specify compression explicitly
+let reader = oneio.get_reader_with_type(
+    "https://api.example.com/data?format=gzip",
+    "gz"
+)?;
 ```
 
-#### Progress Tracking
+### Progress Tracking
+
 Track download/read progress with callbacks:
 
 ```rust
-use oneio;
+use oneio::OneIo;
+
+let oneio = OneIo::new()?;
 
-let (mut reader, total_size) = oneio::get_reader_with_progress(
+// Callback receives (bytes_read, total_bytes).
+// total_bytes is 0 when the server does not provide a Content-Length.
+// The returned Option is Some(total) when the size was known upfront.
+let (mut reader, total_size) = oneio.get_reader_with_progress(
    "https://example.com/largefile.gz",
     |bytes_read, total_bytes| {
-        match total_bytes {
-            Some(total) => {
-                let percent = (bytes_read as f64 / total as f64) * 100.0;
-                println!("Progress: {:.1}%", percent);
-            }
-            None => println!("Downloaded: {} bytes", bytes_read),
+        if total_bytes > 0 {
+            let percent = (bytes_read as f64 / total_bytes as f64) * 100.0;
+            println!("Progress: {:.1}%", percent);
+        } else {
+            println!("Downloaded: {} bytes", bytes_read);
         }
     }
 )?;
 ```
 
-#### Async Support (Feature: `async`)
+### Async Support (Feature: `async`)
 
 ```rust
 use oneio;
@@ -191,54 +244,12 @@ async fn main() -> Result<(), Box<dyn std::error::Error>> {
         "local_data.csv.gz"
     ).await?;
 
-    // download_async preserves the remote bytes.
-
     Ok(())
 }
 ```
 
 Note: Async compression is limited to gz, bz, zstd. LZ4/XZ return `NotSupported`.
-
-### Supported Formats
-
-#### Compression Detection
-
-OneIO detects compression algorithm by the file extensions:
-
-- **Gzip**: `.gz`, `.gzip`
-- **Bzip2**: `.bz`, `.bz2`
-- **LZ4**: `.lz4`, `.lz`
-- **XZ**: `.xz`, `.xz2`
-- **Zstandard**: `.zst`, `.zstd`
-
-#### Protocol Support
-- **Local files**: `/path/to/file.txt`
-- **HTTP/HTTPS**: `https://example.com/file.txt.gz`
-- **FTP**: `ftp://ftp.example.com/file.txt` (requires `ftp` feature)
-- **S3**: `s3://bucket/path/file.txt` (requires `s3` feature)
-
-### Command Line Tool
-
-Install the CLI tool:
-
-```bash
-cargo install oneio --features cli
-```
-
-Basic usage:
-
-```bash
-# Read and print a remote compressed file
-oneio https://example.com/data.txt.gz
-
-# Download a file
-oneio -d https://example.com/largefile.bz2
-
-# Pipe to other tools
-oneio https://api.example.com/data.json.gz | jq '.results | length'
-```
-
 ### S3 Operations (Feature: `s3`)
 
 ```rust
@@ -248,8 +259,9 @@ use oneio::s3::*;
 
 s3_upload("my-bucket", "path/to/file.txt", "local/file.txt")?;
 s3_download("my-bucket", "path/to/file.txt", "downloaded.txt")?;
 
-// Read S3 directly
-let content = oneio::read_to_string("s3://my-bucket/path/to/file.txt")?;
+// Read S3 directly using OneIO
+let oneio = oneio::OneIo::new()?;
+let content = oneio.read_to_string("s3://my-bucket/path/to/file.txt")?;
 
 // Check existence and get metadata
 if s3_exists("my-bucket", "path/to/file.txt")? {
@@ -261,11 +273,37 @@
 let objects = s3_list("my-bucket", "path/", Some("/".to_string()), false)?;
 ```
 
+Required environment variables for S3:
+- `AWS_ACCESS_KEY_ID`
+- `AWS_SECRET_ACCESS_KEY`
+- `AWS_REGION` (use "auto" for Cloudflare R2)
+- `AWS_ENDPOINT`
+
+### Error Handling
+
+OneIO uses a simplified error enum with `#[non_exhaustive]` for forward compatibility:
+
+```rust
+use oneio::OneIoError;
+
+match oneio::get_reader("file.txt") {
+    Ok(reader) => { /* use reader */ },
+    Err(OneIoError::Io(e)) => { /* filesystem error */ },
+    Err(OneIoError::Network(e)) => { /* network error */ },
+    Err(OneIoError::NetworkWithContext { source, url }) => {
+        // Network error with URL context for debugging
+        eprintln!("Failed to fetch {}: {}", url, source);
+    }
+    Err(OneIoError::Status { service, code }) => { /* remote status error */ },
+    Err(OneIoError::InvalidCertificate(msg)) => { /* TLS cert error */ },
+    Err(OneIoError::NotSupported(msg)) => { /* feature not compiled */ },
+    _ => { /* handle future error variants */ }
+}
+```
+
 ### Crypto Provider Initialization (Rustls)
 
-When using HTTPS, S3, or FTP features with rustls, oneio automatically initializes
-a crypto provider (AWS-LC or ring) on first use. For more control, you can initialize
-it explicitly at startup:
+When using HTTPS, S3, or FTP features with rustls, oneio automatically initializes a crypto provider (AWS-LC or ring) on first use. For more control, initialize it explicitly:
 
 ```rust
 use oneio;
@@ -281,27 +319,155 @@ fn main() -> Result<(), Box<dyn std::error::Error>> {
 }
 ```
 
-This is particularly useful in libraries or applications that want to:
-- Handle initialization errors early
-- Control when the provider is set up
-- Make the dependency on crypto providers explicit
+## Command Line Tool
 
-### Error Handling
+Install the CLI tool:
 
-Three error types in v0.20:
+```bash
+cargo install oneio --features cli
+```
 
-```rust
-use oneio::OneIoError;
+### Basic Usage
 
-match oneio::get_reader("file.txt") {
-    Ok(reader) => { /* use reader */ },
-    Err(OneIoError::Io(e)) => { /* filesystem error */ },
-    Err(OneIoError::Network(e)) => { /* network error */ },
-    Err(OneIoError::Status { service, code }) => { /* remote status error */ },
-    Err(OneIoError::NotSupported(msg)) => { /* feature not compiled */ },
-}
+**Read and print a remote compressed file:**
+```bash
+$ oneio https://spaces.bgpkit.org/oneio/test_data.txt.gz
+OneIO test file.
+This is a test.
+```
+
+**Read local compressed file:**
+```bash
+$ oneio tests/test_data.txt.gz
+OneIO test file.
+This is a test.
+```
+
+**Get file statistics:**
+```bash
+$ oneio tests/test_data.txt --stats
+lines: 2
+chars: 31
+```
+
+### Download with Progress Bar
+
+Download a file with automatic progress bar (shown when stderr is a terminal):
+```bash
+$ oneio -d https://example.com/largefile.bz2
+downloaded to largefile.bz2
+```
+
+When stderr is piped or redirected, the progress bar is suppressed.
+
+### Custom HTTP Headers
+
+Add custom headers for API authentication:
+```bash
+$ oneio -H "Authorization: Bearer TOKEN" -H "X-Custom-Header: value" https://api.example.com/data.json
+```
+
+### Compression Override
+
+For URLs with query parameters where extension detection fails:
+```bash
+$ oneio --compression gz "https://api.example.com/data?format=gzip"
+```
+
+### Caching
+
+Cache remote files locally for repeated reads:
+```bash
+$ oneio --cache-dir /tmp/cache https://example.com/largefile.gz
+# Second read uses cache
+$ oneio --cache-dir /tmp/cache https://example.com/largefile.gz
+```
+
+Force re-download even if cache exists:
+```bash
+$ oneio --cache-dir /tmp/cache --cache-force https://example.com/largefile.gz
+```
+
+### S3 Operations
+
+**Upload file to S3:**
+```bash
+$ oneio s3 upload local-file.txt my-bucket path/in/s3.txt
+uploaded to s3://my-bucket/path/in/s3.txt
+```
+
+**Download file from S3:**
+```bash
+$ oneio s3 download my-bucket path/in/s3.txt -o local-file.txt
+downloaded s3://my-bucket/path/in/s3.txt to local-file.txt
+```
+
+**List S3 bucket:**
+```bash
+$ oneio s3 list my-bucket path/ --delimiter "/"
+```
+
+**List directories only:**
+```bash
+$ oneio s3 list my-bucket path/ --dirs
+```
+
+### Generate SHA256 Digest
+
+```bash
+$ oneio digest tests/test_data.txt
+a3f5c8e9d2b1...  (64 hex characters)
+```
+
+### CLI Help Output
+
+```
+$ oneio --help
+oneio reads files from local or remote locations with any compression
+
+Usage: oneio [OPTIONS] [FILE] [COMMAND]
+
+Commands:
+  s3      S3-related subcommands
+  digest  Generate SHA256 digest
+  help    Print this message or the help of the given subcommand(s)
+
+Arguments:
+  [FILE]  file to open, remote or local
+
+Options:
+  -d, --download                   download the file to the current directory
+  -o, --outfile <OUTFILE>          output file path
+      --cache-dir <CACHE_DIR>      cache reading to a specified directory
+      --cache-force                force re-caching if a local cache already exists
+      --cache-file <CACHE_FILE>    specify cache file name
+  -s, --stats                      read through the file and only print out stats
+  -H, --header <HEADERS>           Add HTTP header (format: "Name: Value"), can be repeated
+      --compression <COMPRESSION>  Override compression type (gz, bz2, lz4, xz, zst)
+  -h, --help                       Print help
+  -V, --version                    Print version
+```
+
+## Supported Formats
+
+### Compression Detection
+
+OneIO detects the compression algorithm from the file extension:
+
+- **Gzip**: `.gz`, `.gzip`, `.tgz`
+- **Bzip2**: `.bz`, `.bz2`
+- **LZ4**: `.lz4`, `.lz`
+- **XZ**: `.xz`, `.xz2`, `.lzma`
+- **Zstandard**: `.zst`, `.zstd`
+
+For URLs with query parameters, use the `--compression` flag or `get_reader_with_type()`.
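The detection rule described above (look up the extension after stripping any query string or fragment) can be sketched as follows. This is an illustration of the documented behavior, not oneio's actual implementation; the function name and return values are hypothetical:

```rust
/// Guess the compression format of a path or URL from its file extension,
/// ignoring any `?query` or `#fragment` suffix. Illustrative sketch only.
fn detect_compression(path: &str) -> Option<&'static str> {
    // Strip query parameters and fragments before reading the extension.
    let clean = path
        .split(|c: char| c == '?' || c == '#')
        .next()
        .unwrap_or(path);
    // The last dot-separated segment is the candidate extension.
    match clean.rsplit('.').next() {
        Some("gz") | Some("gzip") | Some("tgz") => Some("gzip"),
        Some("bz") | Some("bz2") => Some("bzip2"),
        Some("lz4") | Some("lz") => Some("lz4"),
        Some("xz") | Some("xz2") | Some("lzma") => Some("xz"),
        Some("zst") | Some("zstd") => Some("zstd"),
        _ => None,
    }
}
```

Note that a URL like `https://example.com/data?format=gzip` still yields no match, because the format hint lives in the query string rather than the extension; that is exactly the case where `--compression` or `get_reader_with_type()` is needed.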
+ +### Protocol Support +- **Local files**: `/path/to/file.txt` +- **HTTP/HTTPS**: `https://example.com/file.txt.gz` +- **FTP**: `ftp://ftp.example.com/file.txt` (requires `ftp` feature) +- **S3**: `s3://bucket/path/file.txt` (requires `s3` feature) + ## License MIT diff --git a/README.tpl b/README.tpl deleted file mode 100644 index 5c87995..0000000 --- a/README.tpl +++ /dev/null @@ -1,12 +0,0 @@ -# OneIO - all-in-one IO library for Rust - -[![Rust](https://github.com/bgpkit/oneio/actions/workflows/rust.yml/badge.svg)](https://github.com/bgpkit/oneio/actions/workflows/rust.yml) -[![Crates.io](https://img.shields.io/crates/v/oneio)](https://crates.io/crates/oneio) -[![Docs.rs](https://docs.rs/oneio/badge.svg)](https://docs.rs/oneio) -[![License](https://img.shields.io/crates/l/oneio)](https://raw.githubusercontent.com/bgpkit/oneio/main/LICENSE) - -{{readme}} - -## License - -{{license}} \ No newline at end of file diff --git a/examples/progress_tracking.rs b/examples/progress_tracking.rs index 1dbf58a..1f5ae9b 100644 --- a/examples/progress_tracking.rs +++ b/examples/progress_tracking.rs @@ -40,7 +40,7 @@ fn download_large_file() -> Result<(), Box> { // Get reader with progress tracking let (mut reader, _total_size) = - oneio::get_reader_with_progress(url, move |bytes_read, total_bytes| { + oneio::OneIo::new()?.get_reader_with_progress(url, move |bytes_read, total_bytes| { // Show and set length when we know the total size if total_bytes > 0 { if pb_clone.is_hidden() { diff --git a/src/async_reader.rs b/src/async_reader.rs new file mode 100644 index 0000000..592f043 --- /dev/null +++ b/src/async_reader.rs @@ -0,0 +1,134 @@ +//! Async reader support for OneIO. + +use crate::OneIoError; +#[cfg(feature = "async")] +use futures::StreamExt; +#[cfg(feature = "async")] +use tokio::io::{AsyncRead, AsyncReadExt}; + +/// Gets an async reader for the given file path +/// +/// This is the async version of `get_reader()`. 
It supports all the same protocols +/// and compression formats as the sync version. +#[cfg(feature = "async")] +pub async fn get_reader_async(path: &str) -> Result, OneIoError> { + let raw_reader = get_async_reader_raw(path).await?; + let file_type = crate::file_extension(path); + get_async_compression_reader(raw_reader, file_type) +} + +/// Reads the entire content of a file asynchronously into a string +#[cfg(feature = "async")] +pub async fn read_to_string_async(path: &str) -> Result { + let mut reader = get_reader_async(path).await?; + let mut content = String::new(); + reader.read_to_string(&mut content).await?; + Ok(content) +} + +/// Downloads a file asynchronously from a URL to a local path +#[cfg(feature = "async")] +pub async fn download_async(url: &str, path: &str) -> Result<(), OneIoError> { + use std::path::Path; + use tokio::fs::File; + use tokio::io::{copy, AsyncWriteExt}; + + if let Some(parent) = Path::new(path).parent() { + if !parent.as_os_str().is_empty() { + tokio::fs::create_dir_all(parent).await?; + } + } + + let mut reader = get_async_reader_raw(url).await?; + let mut file = File::create(path).await?; + copy(&mut reader, &mut file).await?; + file.flush().await?; + Ok(()) +} + +/// Gets a raw async reader for the given path (before compression) +#[cfg(feature = "async")] +async fn get_async_reader_raw(path: &str) -> Result, OneIoError> { + let raw_reader: Box = match crate::get_protocol(path) { + #[cfg(feature = "http")] + Some(protocol) if protocol == "http" || protocol == "https" => { + #[cfg(feature = "rustls")] + crate::crypto::ensure_default_provider()?; + + let response = reqwest::get(path).await?; + let stream = response + .bytes_stream() + .map(|result| result.map_err(std::io::Error::other)); + Box::new(tokio_util::io::StreamReader::new(stream)) + } + #[cfg(feature = "ftp")] + Some(protocol) if protocol == "ftp" => { + return Err(OneIoError::NotSupported( + "FTP async not supported - use sync get_reader() instead".to_string(), + )); 
+ } + #[cfg(feature = "s3")] + Some(protocol) if protocol == "s3" || protocol == "r2" => { + return Err(OneIoError::NotSupported( + "S3 async not supported - use sync get_reader() instead".to_string(), + )); + } + Some(_) => { + return Err(OneIoError::NotSupported(format!( + "Async support not available for protocol in path: {path}" + ))); + } + None => { + // Local file + use tokio::fs::File; + let file = File::open(path).await?; + Box::new(file) + } + }; + Ok(raw_reader) +} + +/// Applies async decompression based on file extension +#[cfg(feature = "async")] +fn get_async_compression_reader( + reader: Box, + file_type: &str, +) -> Result, OneIoError> { + match file_type { + #[cfg(all(feature = "async", feature = "any_gz"))] + "gz" | "gzip" => { + use async_compression::tokio::bufread::GzipDecoder; + use tokio::io::BufReader; + let buf_reader = BufReader::new(reader); + let decoder = GzipDecoder::new(buf_reader); + Ok(Box::new(decoder)) + } + #[cfg(all(feature = "async", feature = "bz"))] + "bz" | "bz2" => { + use async_compression::tokio::bufread::BzDecoder; + use tokio::io::BufReader; + let buf_reader = BufReader::new(reader); + let decoder = BzDecoder::new(buf_reader); + Ok(Box::new(decoder)) + } + #[cfg(all(feature = "async", feature = "zstd"))] + "zst" | "zstd" => { + use async_compression::tokio::bufread::ZstdDecoder; + use tokio::io::BufReader; + let buf_reader = BufReader::new(reader); + let decoder = ZstdDecoder::new(buf_reader); + Ok(Box::new(decoder)) + } + #[cfg(all(feature = "async", feature = "lz"))] + "lz4" | "lz" => Err(OneIoError::NotSupported( + "LZ4 async decompression not yet supported - use spawn_blocking with sync version" + .to_string(), + )), + #[cfg(all(feature = "async", feature = "xz"))] + "xz" | "xz2" => Err(OneIoError::NotSupported( + "XZ async decompression not yet supported - use spawn_blocking with sync version" + .to_string(), + )), + _ => Ok(reader), + } +} diff --git a/src/bin/oneio.rs b/src/bin/oneio.rs index 24628b6..8a58461 
100644 --- a/src/bin/oneio.rs +++ b/src/bin/oneio.rs @@ -1,8 +1,21 @@ use clap::{Parser, Subcommand}; -use std::io::Write; -use std::io::{BufRead, BufReader}; +use std::io::{BufRead, BufReader, IsTerminal, Read, Write}; use std::path::PathBuf; use std::process::exit; +use std::time::Duration; + +/// Parse a header string. Accepts "Name: Value" or "Name:Value" (curl-compatible). +fn parse_header(s: &str) -> Result<(String, String), String> { + let (name, value) = s + .split_once(':') + .ok_or_else(|| format!("invalid header format, expected 'Name: Value': {s}"))?; + let name = name.trim().to_string(); + let value = value.trim().to_string(); + if name.is_empty() { + return Err("header name cannot be empty".to_string()); + } + Ok((name, value)) +} #[derive(Parser)] #[clap(author, version)] @@ -10,34 +23,42 @@ use std::process::exit; #[command(arg_required_else_help(true))] /// oneio reads files from local or remote locations with any compression. struct Cli { - /// file to open, remote or local + /// File to open, remote or local #[clap(name = "FILE")] file: Option, - /// download the file to the current directory, similar to run `wget` + /// Download the file to the current directory (similar to wget) #[clap(short, long)] download: bool, - /// output file path + /// Output file path #[clap(short, long)] outfile: Option, - /// cache reading to a specified directory + /// Cache reading to a specified directory #[clap(long)] cache_dir: Option, - /// force re-caching if a local cache already exists + /// Force re-caching if a local cache already exists #[clap(long)] cache_force: bool, - /// specify cache file name + /// Specify cache file name #[clap(long)] cache_file: Option, - /// read through the file and only print out stats + /// Read through the file and only print out stats #[clap(short, long)] stats: bool, + /// Add HTTP header in "Name: Value" format, can be repeated (e.g. 
-H "Authorization: Bearer TOKEN") + #[clap(short = 'H', long = "header", value_parser = clap::builder::ValueParser::new(parse_header))] + headers: Vec<(String, String)>, + + /// Override compression type (gz, bz2, lz4, xz, zst). Ignored when --download is used. + #[clap(long)] + compression: Option, + #[clap(subcommand)] command: Option, } @@ -52,7 +73,7 @@ enum Commands { /// Generate SHA256 digest Digest { - /// file to open, remote or local + /// File to open, remote or local #[clap(name = "FILE")] file: PathBuf, }, @@ -60,156 +81,267 @@ enum Commands { #[derive(Subcommand)] enum S3Commands { - /// Upload file to S3 + /// Upload a local file to S3 Upload { + /// Local file to upload + #[clap(name = "LOCAL_FILE")] + local_file: PathBuf, + + /// S3 bucket name + bucket: String, + + /// S3 key path + path: String, + }, + + /// Download a file from S3 + Download { /// S3 bucket name - #[clap()] bucket: String, - /// S3 file path - #[clap()] + /// S3 key path path: String, + + /// Local output file path (defaults to the filename from the S3 key) + #[clap(short, long)] + outfile: Option, }, - /// List S3 bucket + + /// List objects in an S3 bucket List { /// S3 bucket name - #[clap()] bucket: String, - /// S3 file path + /// Key prefix to filter results #[clap(default_value = "")] prefix: String, - /// delimiter for directory listing + /// Delimiter for directory-style listing #[clap(short, long)] delimiter: Option, - /// showing directories only + /// Show directories only #[clap(short, long)] dirs: bool, }, } +/// Downloads `path` to `out_path` with a progress bar on stderr. +/// +/// `indicatif::ProgressBar` is `Clone + Send + Sync`, so no Arc needed. +/// The bar is shown immediately; if the total size is unknown it shows a spinner. 
+fn download_with_progress( + oneio: &oneio::OneIo, + path: &str, + out_path: &str, + message: &str, +) -> Result<(), Box<dyn std::error::Error>> { + let pb = indicatif::ProgressBar::new(0); + pb.set_draw_target(indicatif::ProgressDrawTarget::stderr()); + pb.set_style( + indicatif::ProgressStyle::default_bar() + .template( + "{spinner:.green} [{elapsed_precise}] [{wide_bar:.cyan/blue}] \ + {bytes}/{total_bytes} ({bytes_per_sec}, {eta})", + )? + .progress_chars("#>-"), + ); + pb.set_message(message.to_string()); + pb.enable_steady_tick(Duration::from_millis(100)); + + let pb_cb = pb.clone(); + let (mut reader, total_size) = + oneio.get_reader_with_progress(path, move |bytes_read, total_bytes| { + if total_bytes > 0 { + pb_cb.set_length(total_bytes); + } + pb_cb.set_position(bytes_read); + })?; + + // Set length upfront if we got it from the content-length probe. + if let Some(size) = total_size { + pb.set_length(size); + } + + let mut writer = std::fs::File::create(out_path)?; + let mut buffer = [0u8; 8192]; + loop { + match reader.read(&mut buffer) { + Ok(0) => break, + Ok(n) => writer.write_all(&buffer[..n])?, + Err(e) => return Err(Box::new(e)), + } + } + pb.finish_with_message(format!("Downloaded to {out_path}")); + Ok(()) +} + +fn build_oneio(headers: &[(String, String)]) -> oneio::OneIo { + let mut builder = oneio::OneIo::builder(); + for (name, value) in headers { + builder = builder.header_str(name, value); + } + builder.build().unwrap_or_else(|e| { + eprintln!("error: failed to create OneIo client: {e}"); + exit(1); + }) +} + +fn s3_credentials_or_exit() { + if let Err(e) = oneio::s3_env_check() { + eprintln!("missing S3 credentials: {e}"); + exit(1); + } +} + fn main() { let cli = Cli::parse(); - let outfile: Option<PathBuf> = cli.outfile; + let outfile = cli.outfile; + let use_progress = std::io::stderr().is_terminal(); + + let oneio = build_oneio(&cli.headers); if let Some(command) = cli.command { match command { Commands::S3 { s3_command } => match s3_command { S3Commands::Upload { -
bucket: s3_bucket, - path: s3_path, + local_file, + bucket, + path, } => { - if let Err(e) = oneio::s3_env_check() { - eprintln!("missing s3 credentials"); - eprintln!("{e}"); - exit(1); - } - let path_string = cli.file.clone().unwrap().to_str().unwrap().to_string(); - match oneio::s3_upload( - s3_bucket.as_str(), - s3_path.as_str(), - path_string.as_str(), - ) { - Ok(_) => { - println!("file successfully uploaded to s3://{s3_bucket}/{s3_path}"); + s3_credentials_or_exit(); + let local = local_file.to_string_lossy(); + match oneio::s3_upload(&bucket, &path, &local) { + Ok(_) => println!("uploaded to s3://{bucket}/{path}"), + Err(e) => { + eprintln!("upload error: {e}"); + exit(1); } + } + } + + S3Commands::Download { + bucket, + path, + outfile: local_outfile, + } => { + s3_credentials_or_exit(); + let local_path = match local_outfile { + Some(p) => p.to_string_lossy().into_owned(), + None => path + .split('/') + .next_back() + .unwrap_or("downloaded_file") + .to_string(), + }; + let s3_url = format!("s3://{bucket}/{path}"); + let result = if use_progress { + download_with_progress( + &oneio, + &s3_url, + &local_path, + &format!("s3://{bucket}/{path}"), + ) + } else { + oneio::s3_download(&bucket, &path, &local_path) + .map_err(|e| Box::new(e) as Box<dyn std::error::Error>) + }; + match result { + Ok(_) => println!("downloaded s3://{bucket}/{path} to {local_path}"), Err(e) => { - eprintln!("file upload error: {e}"); + eprintln!("download error: {e}"); + exit(1); } } - return; } + S3Commands::List { bucket, prefix, delimiter, dirs, } => { - if let Err(e) = oneio::s3_env_check() { - eprintln!("missing s3 credentials"); - eprintln!("{e}"); - exit(1); - } - match oneio::s3_list(bucket.as_str(), prefix.as_str(), delimiter, dirs) { - Ok(paths) => { - paths.iter().for_each(|p| println!("{p}")); - } + s3_credentials_or_exit(); + match oneio::s3_list(&bucket, &prefix, delimiter, dirs) { + Ok(paths) => paths.iter().for_each(|p| println!("{p}")), Err(e) => { - eprintln!("unable to list bucket
content"); - eprintln!("{e}"); + eprintln!("list error: {e}"); exit(1); } } - return; } }, + Commands::Digest { file } => { - let path_string = file.as_path().to_string_lossy().to_string(); - println!( - "{}", - oneio::get_sha256_digest(path_string.as_str()).unwrap() - ); - return; + let path = file.to_string_lossy(); + match oneio::get_sha256_digest(&path) { + Ok(digest) => println!("{digest}"), + Err(e) => { + eprintln!("digest error: {e}"); + exit(1); + } + } } } + return; } - let path_string = cli.file.clone().unwrap().to_str().unwrap().to_string(); + // Default: read FILE + let path_string = cli.file.as_deref().unwrap().to_string_lossy().into_owned(); let path = path_string.as_str(); if cli.download { let out_path = match outfile { + Some(p) => p.to_string_lossy().into_owned(), None => path .split('/') .next_back() .unwrap_or("output.txt") .to_string(), - Some(p) => p.to_str().unwrap().to_string(), }; - - match oneio::download(path, out_path.as_str(), None) { - Ok(_) => { - println!("file successfully downloaded to {}", out_path.as_str()); - } + let result = if use_progress { + download_with_progress(&oneio, path, &out_path, path) + } else { + oneio + .download(path, &out_path) + .map_err(|e| Box::new(e) as Box<dyn std::error::Error>) + }; + match result { + Ok(_) => println!("downloaded to {out_path}"), Err(e) => { - eprintln!("file download error: {e}"); + eprintln!("download error: {e}"); + exit(1); } } - return; } - let reader = Box::new(BufReader::new(match cli.cache_dir { - Some(dir) => { - match oneio::get_cache_reader(path, dir.as_str(), cli.cache_file, cli.cache_force) { - Ok(reader) => reader, - Err(e) => { - eprintln!("Cannot open {path}: {e}"); - return; - } - } + // Reader mode: cache > compression override > auto-detect + let reader_result = if let Some(dir) = cli.cache_dir { + oneio.get_cache_reader(path, &dir, cli.cache_file, cli.cache_force) + } else if let Some(compression) = cli.compression { + oneio.get_reader_with_type(path, &compression) + } else { +
oneio.get_reader(path) + }; + + let reader = Box::new(BufReader::new(match reader_result { + Ok(r) => r, + Err(e) => { + eprintln!("cannot open {path}: {e}"); + exit(1); } - None => match oneio::get_reader(path) { - Ok(reader) => reader, - Err(e) => { - eprintln!("Cannot open {path}: {e}"); - return; - } - }, })); let mut stdout = std::io::stdout(); - - let mut count_lines = 0; - let mut count_chars = 0; + let mut count_lines = 0usize; + let mut count_chars = 0usize; for line in reader.lines() { let line = match line { Ok(l) => l, Err(e) => { - eprintln!("Cannot read line from {path}: {e}"); + eprintln!("read error on {path}: {e}"); exit(1); } }; diff --git a/src/builder.rs b/src/builder.rs new file mode 100644 index 0000000..9d48333 --- /dev/null +++ b/src/builder.rs @@ -0,0 +1,215 @@ +use crate::OneIoError; +#[cfg(feature = "http")] +use reqwest::blocking::Client; +#[cfg(feature = "http")] +use reqwest::header::{HeaderMap, HeaderName, HeaderValue, CONTENT_LENGTH, USER_AGENT}; +#[cfg(all(feature = "http", any(feature = "rustls", feature = "native-tls")))] +use reqwest::Certificate; + +/// Builder for [`OneIo`], modeled after reqwest's client builder API. +pub struct OneIoBuilder { + #[cfg(feature = "http")] + http_client_builder: reqwest::blocking::ClientBuilder, + #[cfg(feature = "http")] + default_headers: HeaderMap, +} + +impl Default for OneIoBuilder { + fn default() -> Self { + Self::new() + } +} + +impl OneIoBuilder { + /// Creates a new [`OneIoBuilder`] with oneio's default HTTP behavior. 
+ pub fn new() -> Self { + #[cfg(feature = "http")] + let mut http_client_builder = Client::builder(); + + #[cfg(all(feature = "http", any(feature = "rustls", feature = "native-tls")))] + { + http_client_builder = + http_client_builder.danger_accept_invalid_certs(accept_invalid_certs_from_env()); + + // Load ONEIO_CA_BUNDLE if set + if let Ok(ca_bundle_path) = std::env::var("ONEIO_CA_BUNDLE") { + if let Ok(pem) = std::fs::read(&ca_bundle_path) { + if let Ok(cert) = Certificate::from_pem(&pem) { + http_client_builder = http_client_builder.add_root_certificate(cert); + } + } + } + } + + Self { + #[cfg(feature = "http")] + http_client_builder, + #[cfg(feature = "http")] + default_headers: default_http_headers(), + } + } + + /// Merges a set of default headers into this builder. + #[cfg(feature = "http")] + pub fn default_headers(mut self, headers: HeaderMap) -> Self { + for (name, value) in headers.iter() { + self.default_headers.insert(name.clone(), value.clone()); + } + self + } + + /// Adds or replaces a single default header for every HTTP request. + #[cfg(feature = "http")] + pub fn header(mut self, name: HeaderName, value: HeaderValue) -> Self { + self.default_headers.insert(name, value); + self + } + + /// Convenience method for string-based headers. + /// Panics on invalid header name or value (same convention as reqwest). + #[cfg(feature = "http")] + pub fn header_str(mut self, name: &str, value: &str) -> Self { + let name = HeaderName::from_bytes(name.as_bytes()).expect("invalid header name"); + let value = HeaderValue::from_str(value).expect("invalid header value"); + self.default_headers.insert(name, value); + self + } + + /// Overrides the default `User-Agent` header. + #[cfg(feature = "http")] + pub fn user_agent(mut self, value: HeaderValue) -> Self { + self.default_headers.insert(USER_AGENT, value); + self + } + + /// Adds an additional trusted root certificate for HTTPS requests. 
+ #[cfg(all(feature = "http", any(feature = "rustls", feature = "native-tls")))] + pub fn add_root_certificate(mut self, cert: Certificate) -> Self { + self.http_client_builder = self.http_client_builder.add_root_certificate(cert); + self + } + + /// Adds an additional trusted PEM-encoded root certificate. + #[cfg(all(feature = "http", any(feature = "rustls", feature = "native-tls")))] + pub fn add_root_certificate_pem(self, pem: &[u8]) -> Result<Self, OneIoError> { + let cert = Certificate::from_pem(pem) + .map_err(|e| OneIoError::InvalidCertificate(e.to_string()))?; + Ok(self.add_root_certificate(cert)) + } + + /// Adds an additional trusted DER-encoded root certificate. + #[cfg(all(feature = "http", any(feature = "rustls", feature = "native-tls")))] + pub fn add_root_certificate_der(self, der: &[u8]) -> Result<Self, OneIoError> { + let cert = Certificate::from_der(der) + .map_err(|e| OneIoError::InvalidCertificate(e.to_string()))?; + Ok(self.add_root_certificate(cert)) + } + + /// Configures whether invalid HTTPS certificates should be accepted. + #[cfg(all(feature = "http", any(feature = "rustls", feature = "native-tls")))] + pub fn danger_accept_invalid_certs(mut self, accept_invalid_certs: bool) -> Self { + self.http_client_builder = self + .http_client_builder + .danger_accept_invalid_certs(accept_invalid_certs); + self + } + + /// Escape hatch for configuring the underlying reqwest client builder. + #[cfg(feature = "http")] + pub fn configure_http<F>(mut self, f: F) -> Self + where + F: FnOnce(reqwest::blocking::ClientBuilder) -> reqwest::blocking::ClientBuilder, + { + self.http_client_builder = f(self.http_client_builder); + self + } + + /// Sets a timeout for the entire request. + #[cfg(feature = "http")] + pub fn timeout(mut self, timeout: std::time::Duration) -> Self { + self.http_client_builder = self.http_client_builder.timeout(timeout); + self + } + + /// Sets a timeout for connecting to a host.
+ #[cfg(feature = "http")] + pub fn connect_timeout(mut self, timeout: std::time::Duration) -> Self { + self.http_client_builder = self.http_client_builder.connect_timeout(timeout); + self + } + + /// Sets a proxy for all HTTP requests. + #[cfg(feature = "http")] + pub fn proxy(mut self, proxy: reqwest::Proxy) -> Self { + self.http_client_builder = self.http_client_builder.proxy(proxy); + self + } + + /// Disables proxy for all HTTP requests. + #[cfg(feature = "http")] + pub fn no_proxy(mut self) -> Self { + self.http_client_builder = self.http_client_builder.no_proxy(); + self + } + + /// Sets the redirect policy. + #[cfg(feature = "http")] + pub fn redirect(mut self, policy: reqwest::redirect::Policy) -> Self { + self.http_client_builder = self.http_client_builder.redirect(policy); + self + } + + /// Builds a reusable [`OneIo`] instance. + pub fn build(self) -> Result<crate::client::OneIo, OneIoError> { + dotenvy::dotenv().ok(); + + #[cfg(feature = "rustls")] + crate::crypto::ensure_default_provider()?; + + Ok(crate::client::OneIo { + #[cfg(feature = "http")] + http_client: self + .http_client_builder + .default_headers(self.default_headers) + .build()?, + }) + } +} + +#[cfg(feature = "http")] +fn default_http_headers() -> HeaderMap { + let mut headers = HeaderMap::new(); + headers.insert(USER_AGENT, HeaderValue::from_static("oneio")); + headers.insert(CONTENT_LENGTH, HeaderValue::from_static("0")); + #[cfg(feature = "cli")] + headers.insert( + reqwest::header::CACHE_CONTROL, + HeaderValue::from_static("no-cache"), + ); + headers +} + +#[cfg(all(feature = "http", any(feature = "rustls", feature = "native-tls")))] +fn accept_invalid_certs_from_env() -> bool { + dotenvy::dotenv().ok(); + matches!( + std::env::var("ONEIO_ACCEPT_INVALID_CERTS") + .unwrap_or_default() + .to_lowercase() + .as_str(), + "true" | "yes" | "y" | "1" + ) +} + +/// Global default client for free-standing functions.
+pub(crate) fn default_oneio() -> Result<&'static crate::client::OneIo, OneIoError> { + use std::sync::OnceLock; + static DEFAULT_ONEIO: OnceLock<Result<crate::client::OneIo, String>> = OnceLock::new(); + + match DEFAULT_ONEIO.get_or_init(|| OneIoBuilder::new().build().map_err(|e| e.to_string())) { + Ok(oneio) => Ok(oneio), + Err(message) => Err(OneIoError::Network(Box::new(std::io::Error::other( + message.clone(), + )))), + } +} diff --git a/src/client.rs b/src/client.rs new file mode 100644 index 0000000..babcfe9 --- /dev/null +++ b/src/client.rs @@ -0,0 +1,296 @@ +use crate::compression::{get_compression_reader, get_compression_writer}; +#[cfg(any(feature = "http", feature = "ftp"))] +use crate::remote; +#[cfg(feature = "s3")] +use crate::s3; +use crate::OneIoError; +#[cfg(feature = "http")] +use reqwest::blocking::Client; +#[cfg(feature = "json")] +use serde::de::DeserializeOwned; +use std::fs::File; +use std::io::{BufRead, BufReader, BufWriter, Lines, Read, Write}; +use std::path::Path; + +/// Reusable OneIO client for applying request configuration across multiple operations. +/// +/// Use [`OneIo::builder()`] to customize default headers, TLS certificates, and +/// other HTTP options once, then reuse the resulting client across reads and +/// downloads. +#[derive(Clone)] +pub struct OneIo { + #[cfg(feature = "http")] + pub(crate) http_client: Client, +} + +impl OneIo { + /// Creates a new reusable OneIO client with default configuration. + pub fn new() -> Result<Self, OneIoError> { + Self::builder().build() + } + + /// Creates a new builder for customizing a reusable [`OneIo`] client. + pub fn builder() -> crate::builder::OneIoBuilder { + crate::builder::OneIoBuilder::new() + } + + /// Wraps an already-constructed reqwest blocking client. + #[cfg(feature = "http")] + pub fn from_client(http_client: Client) -> Self { + Self { http_client } + } + + /// Returns the underlying reqwest blocking client.
+ #[cfg(feature = "http")] + pub fn http_client(&self) -> &Client { + &self.http_client + } + + /// Creates a raw writer without compression. + pub fn get_writer_raw(&self, path: &str) -> Result<BufWriter<File>, OneIoError> { + crate::get_writer_raw_impl(path) + } + + /// Creates a writer with compression inferred from the path extension. + pub fn get_writer(&self, path: &str) -> Result<Box<dyn Write>, OneIoError> { + let output_file = self.get_writer_raw(path)?; + let file_type = crate::file_extension(path); + get_compression_writer(output_file, file_type) + } + + /// Creates a raw reader without decompression. + pub fn get_reader_raw(&self, path: &str) -> Result<Box<dyn Read>, OneIoError> { + let raw_reader: Box<dyn Read> = match crate::get_protocol(path) { + Some(protocol) => match protocol { + #[cfg(feature = "http")] + "http" | "https" => Box::new(self.get_http_reader_raw(path)?), + #[cfg(feature = "ftp")] + "ftp" => remote::get_ftp_reader_raw(path)?, + #[cfg(feature = "s3")] + "s3" | "r2" => { + let (bucket, path) = s3::s3_url_parse(path)?; + s3::s3_reader(bucket.as_str(), path.as_str())? + } + _ => return Err(OneIoError::NotSupported(path.to_string())), + }, + None => Box::new(File::open(path)?), + }; + Ok(raw_reader) + } + + /// Creates a reader with decompression inferred from the path extension. + pub fn get_reader(&self, path: &str) -> Result<Box<dyn Read>, OneIoError> { + let raw_reader = self.get_reader_raw(path)?; + let file_type = crate::file_extension(path); + get_compression_reader(raw_reader, file_type) + } + + /// Creates a reader with explicit compression type override. + /// + /// Useful for URLs with query params or non-standard extensions. + /// Pass empty string for no decompression. + pub fn get_reader_with_type( + &self, + path: &str, + compression: &str, + ) -> Result<Box<dyn Read>, OneIoError> { + let raw_reader = self.get_reader_raw(path)?; + get_compression_reader(raw_reader, compression) + } + + /// Creates a reader backed by a local cache file.
+ pub fn get_cache_reader( + &self, + path: &str, + cache_dir: &str, + cache_file_name: Option<String>, + force_cache: bool, + ) -> Result<Box<dyn Read>, OneIoError> { + let dir_path = Path::new(cache_dir); + if !dir_path.is_dir() { + std::fs::create_dir_all(dir_path)?; + } + + let cache_file_name = cache_file_name.unwrap_or_else(|| { + path.split('/') + .next_back() + .unwrap_or("cached_file") + .to_string() + }); + + let cache_file_path = format!("{cache_dir}/{cache_file_name}"); + + if !force_cache && Path::new(cache_file_path.as_str()).exists() { + return self.get_reader(cache_file_path.as_str()); + } + + let mut reader = self.get_reader_raw(path)?; + let mut writer = self.get_writer_raw(cache_file_path.as_str())?; + std::io::copy(&mut reader, &mut writer)?; + writer.flush()?; + + self.get_reader(cache_file_path.as_str()) + } + + /// Checks whether a local or remote path exists. + pub fn exists(&self, path: &str) -> Result<bool, OneIoError> { + match crate::get_protocol(path) { + #[cfg(feature = "http")] + Some("http" | "https") => remote::http_file_exists(path, self.http_client()), + #[cfg(feature = "s3")] + Some("s3" | "r2") => { + let (bucket, path) = s3::s3_url_parse(path)?; + s3::s3_exists(bucket.as_str(), path.as_str()) + } + Some(_) => Err(OneIoError::NotSupported(path.to_string())), + None => Ok(Path::new(path).exists()), + } + } + + /// Reads the full contents of a file or URL into a string. + pub fn read_to_string(&self, path: &str) -> Result<String, OneIoError> { + let mut reader = self.get_reader(path)?; + let mut content = String::new(); + reader.read_to_string(&mut content)?; + Ok(content) + } + + /// Reads and deserializes JSON into the requested type. + #[cfg(feature = "json")] + pub fn read_json_struct<T: DeserializeOwned>(&self, path: &str) -> Result<T, OneIoError> { + let reader = self.get_reader(path)?; + let res: T = serde_json::from_reader(reader)?; + Ok(res) + } + + /// Returns an iterator over lines from the provided path.
+ pub fn read_lines( + &self, + path: &str, + ) -> Result<Lines<BufReader<Box<dyn Read>>>, OneIoError> { + let reader = self.get_reader(path)?; + Ok(BufReader::new(reader).lines()) + } + + /// Determines the raw content length for a local or remote path. + pub fn get_content_length(&self, path: &str) -> Result<u64, OneIoError> { + match crate::get_protocol(path) { + #[cfg(feature = "http")] + Some(protocol) if protocol == "http" || protocol == "https" => { + remote::get_http_content_length(path, self.http_client()) + } + #[cfg(feature = "ftp")] + Some(protocol) if protocol == "ftp" => Err(OneIoError::NotSupported( + "FTP size determination not yet implemented".to_string(), + )), + #[cfg(feature = "s3")] + Some(protocol) if protocol == "s3" || protocol == "r2" => { + let (bucket, key) = s3::s3_url_parse(path)?; + let stats = s3::s3_stats(&bucket, &key)?; + stats + .content_length + .ok_or_else(|| { + OneIoError::NotSupported( + "S3 object doesn't have content length information".to_string(), + ) + }) + .map(|len| len as u64) + } + Some(_) => Err(OneIoError::NotSupported(format!( + "Protocol not supported for progress tracking: {path}" + ))), + None => Ok(std::fs::metadata(path)?.len()), + } + } + + /// Creates a reader that reports progress while reading raw bytes. + pub fn get_reader_with_progress<F>( + &self, + path: &str, + progress: F, + ) -> Result<(Box<dyn Read>, Option<u64>), OneIoError> + where + F: Fn(u64, u64) + Send + 'static, + { + let (total_size, size_option) = match self.get_content_length(path) { + Ok(size) => (size, Some(size)), + Err(_) => (0, None), + }; + + let raw_reader = self.get_reader_raw(path)?; + let progress_reader = + crate::progress::ProgressReader::new(raw_reader, total_size, progress); + let file_type = crate::file_extension(path); + let final_reader = get_compression_reader(Box::new(progress_reader), file_type)?; + + Ok((final_reader, size_option)) + } + + /// Returns the blocking HTTP response for a URL.
+ #[cfg(feature = "http")] + pub fn get_http_reader_raw( + &self, + path: &str, + ) -> Result<reqwest::blocking::Response, OneIoError> { + remote::get_http_reader_raw(path, self.http_client()) + } + + /// Returns an HTTP reader with decompression inferred from the URL suffix. + #[cfg(feature = "http")] + pub fn get_http_reader(&self, path: &str) -> Result<Box<dyn Read>, OneIoError> { + let raw_reader: Box<dyn Read> = Box::new(self.get_http_reader_raw(path)?); + let file_type = crate::file_extension(path); + get_compression_reader(raw_reader, file_type) + } + + /// Downloads a remote resource to a local path without decompression. + pub fn download(&self, remote_path: &str, local_path: &str) -> Result<(), OneIoError> { + let _ = local_path; + + match crate::get_protocol(remote_path) { + #[cfg(feature = "http")] + Some("http" | "https") => { + let mut writer = self.get_writer_raw(local_path)?; + let mut response = self.get_http_reader_raw(remote_path)?; + response.copy_to(&mut writer)?; + Ok(()) + } + #[cfg(feature = "ftp")] + Some("ftp") => { + let mut writer = self.get_writer_raw(local_path)?; + let mut reader = remote::get_ftp_reader_raw(remote_path)?; + std::io::copy(&mut reader, &mut writer)?; + Ok(()) + } + #[cfg(feature = "s3")] + Some("s3" | "r2") => { + let (bucket, path) = s3::s3_url_parse(remote_path)?; + s3::s3_download(bucket.as_str(), path.as_str(), local_path)?; + Ok(()) + } + Some(_) | None => Err(OneIoError::NotSupported(remote_path.to_string())), + } + } + + /// Downloads with retry support and exponential backoff.
+ pub fn download_with_retry( + &self, + remote_path: &str, + local_path: &str, + retry: usize, + ) -> Result<(), OneIoError> { + let mut attempts = 0; + loop { + match self.download(remote_path, local_path) { + Ok(()) => return Ok(()), + Err(_) if attempts < retry => { + attempts += 1; + std::thread::sleep(std::time::Duration::from_millis( + 100 * (1 << attempts.min(6)), + )); + } + Err(err) => return Err(err), + } + } + } +} diff --git a/src/compression.rs b/src/compression.rs new file mode 100644 index 0000000..c96311b --- /dev/null +++ b/src/compression.rs @@ -0,0 +1,182 @@ +//! Compression algorithms and utilities for OneIO. +//! +//! This module provides a unified interface for reading and writing files with various compression +//! formats, including gzip, bzip2, lz4, xz, and zstd. The available algorithms depend on enabled +//! Cargo features. + +use crate::OneIoError; +use std::fs::File; +use std::io::{BufWriter, Read, Write}; + +/// Returns a compression reader for the given file suffix. +/// +/// This function selects the appropriate compression algorithm based on the provided +/// `file_suffix` (such as `"gz"`, `"bz2"`, `"lz4"`, `"xz"`, or `"zst"`), and returns a +/// reader that transparently decompresses data as it is read. If the suffix is not recognized, +/// the original `raw_reader` is returned unchanged. 
+pub(crate) fn get_compression_reader( + raw_reader: Box<dyn Read>, + file_suffix: &str, +) -> Result<Box<dyn Read>, OneIoError> { + match file_suffix { + #[cfg(feature = "any_gz")] + "gz" | "gzip" | "tgz" => gzip::get_reader(raw_reader), + #[cfg(feature = "bz")] + "bz2" | "bz" => bzip2::get_reader(raw_reader), + #[cfg(feature = "lz")] + "lz4" | "lz" => lz4::get_reader(raw_reader), + #[cfg(feature = "xz")] + "xz" | "xz2" | "lzma" => xz::get_reader(raw_reader), + #[cfg(feature = "zstd")] + "zst" | "zstd" => zstd::get_reader(raw_reader), + _ => { + // unknown file type - return the raw bytes reader as is + Ok(raw_reader) + } + } +} + +/// Returns a compression writer for the given file suffix. +/// +/// This function selects the appropriate compression algorithm based on the provided +/// `file_suffix` (such as `"gz"`, `"bz2"`, `"lz4"`, `"xz"`, or `"zst"`), and returns a +/// writer that transparently compresses data as it is written. If the suffix is not recognized, +/// the original `raw_writer` is returned unchanged.
+pub(crate) fn get_compression_writer( + raw_writer: BufWriter<File>, + file_suffix: &str, +) -> Result<Box<dyn Write>, OneIoError> { + match file_suffix { + #[cfg(feature = "any_gz")] + "gz" | "gzip" | "tgz" => gzip::get_writer(raw_writer), + #[cfg(feature = "bz")] + "bz2" | "bz" => bzip2::get_writer(raw_writer), + #[cfg(feature = "lz")] + "lz4" | "lz" => lz4::get_writer(raw_writer), + #[cfg(feature = "xz")] + "xz" | "xz2" | "lzma" => xz::get_writer(raw_writer), + #[cfg(feature = "zstd")] + "zst" | "zstd" => zstd::get_writer(raw_writer), + _ => Ok(Box::new(raw_writer)), + } +} + +#[cfg(feature = "any_gz")] +pub(crate) mod gzip { + use crate::OneIoError; + use flate2::read::GzDecoder; + use flate2::write::GzEncoder; + use flate2::Compression; + use std::fs::File; + use std::io::{BufWriter, Read, Write}; + + pub(crate) fn get_reader( + raw_reader: Box<dyn Read>, + ) -> Result<Box<dyn Read>, OneIoError> { + Ok(Box::new(GzDecoder::new(raw_reader))) + } + + pub(crate) fn get_writer(raw_writer: BufWriter<File>) -> Result<Box<dyn Write>, OneIoError> { + Ok(Box::new(GzEncoder::new(raw_writer, Compression::default()))) + } +} + +#[cfg(feature = "bz")] +pub(crate) mod bzip2 { + use crate::OneIoError; + use std::fs::File; + use std::io::{BufWriter, Read, Write}; + + pub(crate) fn get_reader( + raw_reader: Box<dyn Read>, + ) -> Result<Box<dyn Read>, OneIoError> { + Ok(Box::new(bzip2::read::BzDecoder::new(raw_reader))) + } + + pub(crate) fn get_writer(raw_writer: BufWriter<File>) -> Result<Box<dyn Write>, OneIoError> { + Ok(Box::new(bzip2::write::BzEncoder::new( + raw_writer, + bzip2::Compression::default(), + ))) + } +} + +#[cfg(feature = "lz")] +pub(crate) mod lz4 { + use crate::OneIoError; + use std::fs::File; + use std::io::{BufWriter, Read, Write}; + + pub(crate) fn get_reader( + raw_reader: Box<dyn Read>, + ) -> Result<Box<dyn Read>, OneIoError> { + Ok(Box::new(lz4::Decoder::new(raw_reader)?)) + } + + pub(crate) fn get_writer(raw_writer: BufWriter<File>) -> Result<Box<dyn Write>, OneIoError> { + let encoder = lz4::EncoderBuilder::new().build(raw_writer)?; + Ok(Box::new(Lz4Writer(Some(encoder)))) + } + + /// Wrapper around
`lz4::Encoder` that writes the frame end marker on drop. + /// + /// `lz4::Encoder` has no `Drop` impl — `finish()` must be called explicitly + /// to flush the end-of-stream marker. Without it the compressed stream is + /// incomplete and the decoder returns 0 bytes. + struct Lz4Writer(Option<lz4::Encoder<BufWriter<File>>>); + + impl Write for Lz4Writer { + fn write(&mut self, buf: &[u8]) -> std::io::Result<usize> { + self.0.as_mut().unwrap().write(buf) + } + fn flush(&mut self) -> std::io::Result<()> { + self.0.as_mut().unwrap().flush() + } + } + + impl Drop for Lz4Writer { + fn drop(&mut self) { + if let Some(encoder) = self.0.take() { + let (mut w, result) = encoder.finish(); + if result.is_ok() { + let _ = w.flush(); + } + } + } + } +} + +#[cfg(feature = "xz")] +pub(crate) mod xz { + use crate::OneIoError; + use std::fs::File; + use std::io::{BufWriter, Read, Write}; + + pub(crate) fn get_reader( + raw_reader: Box<dyn Read>, + ) -> Result<Box<dyn Read>, OneIoError> { + Ok(Box::new(xz2::read::XzDecoder::new(raw_reader))) + } + + pub(crate) fn get_writer(raw_writer: BufWriter<File>) -> Result<Box<dyn Write>, OneIoError> { + Ok(Box::new(xz2::write::XzEncoder::new(raw_writer, 6))) + } +} + +#[cfg(feature = "zstd")] +pub(crate) mod zstd { + use crate::OneIoError; + use std::fs::File; + use std::io::{BufWriter, Read, Write}; + + pub(crate) fn get_reader( + raw_reader: Box<dyn Read>, + ) -> Result<Box<dyn Read>, OneIoError> { + Ok(Box::new(zstd::Decoder::new(raw_reader)?)) + } + + pub(crate) fn get_writer(raw_writer: BufWriter<File>) -> Result<Box<dyn Write>, OneIoError> { + let encoder = zstd::Encoder::new(raw_writer, 3)?; + Ok(Box::new(encoder.auto_finish())) + } +} diff --git a/src/crypto.rs b/src/crypto.rs new file mode 100644 index 0000000..85dcd1f --- /dev/null +++ b/src/crypto.rs @@ -0,0 +1,84 @@ +//! Crypto provider initialization for rustls. +//! +//! This module provides a helper function to ensure that rustls has a default +//! crypto provider installed. It attempts to use AWS-LC first, falling back +//! to ring if necessary.
+ +use crate::OneIoError; + +/// Ensures that a default crypto provider is installed for rustls. +/// +/// This function checks if a crypto provider is already installed, and if not, +/// attempts to install one automatically: +/// +/// 1. First tries AWS-LC if available (when rustls is compiled with aws_lc_rs support) +/// 2. Falls back to ring if AWS-LC is not available or installation fails +/// 3. Returns an error if no provider is available +/// +/// This should be called early in your application startup, or before any HTTPS/S3 +/// operations. It's safe to call multiple times - if a provider is already installed, +/// this function does nothing. +#[cfg(feature = "rustls")] +pub fn ensure_default_provider() -> Result<(), OneIoError> { + // Check if a provider is already installed + if rustls_sys::crypto::CryptoProvider::get_default().is_some() { + return Ok(()); + } + + // Try AWS-LC first (if available) + match rustls_sys::crypto::aws_lc_rs::default_provider().install_default() { + Ok(_) => return Ok(()), + Err(_) => { + // If installation failed because a provider is already installed, that's OK + if rustls_sys::crypto::CryptoProvider::get_default().is_some() { + return Ok(()); + } + // AWS-LC installation failed for another reason, try ring + } + } + + // Try ring as fallback + match rustls_sys::crypto::ring::default_provider().install_default() { + Ok(_) => Ok(()), + Err(e) => { + // If installation failed because a provider is already installed, that's OK + if rustls_sys::crypto::CryptoProvider::get_default().is_some() { + return Ok(()); + } + // Both failed and no provider is installed + Err(OneIoError::NotSupported(format!( + "Failed to install rustls crypto provider: {:?}", + e + ))) + } + } +} + +/// Ensures that a default crypto provider is installed for rustls. +/// +/// This is a no-op when rustls feature is not enabled. 
+#[cfg(not(feature = "rustls"))] +pub fn ensure_default_provider() -> Result<(), OneIoError> { + Ok(()) +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn test_ensure_default_provider() { + let result = ensure_default_provider(); + assert!(result.is_ok(), "ensure_default_provider should succeed"); + } + + #[cfg(feature = "rustls")] + #[test] + fn test_provider_installed() { + ensure_default_provider().unwrap(); + assert!( + rustls_sys::crypto::CryptoProvider::get_default().is_some(), + "A crypto provider should be installed" + ); + } +} diff --git a/src/oneio/digest.rs b/src/digest.rs similarity index 63% rename from src/oneio/digest.rs rename to src/digest.rs index b122bfb..d36621a 100644 --- a/src/oneio/digest.rs +++ b/src/digest.rs @@ -2,7 +2,6 @@ //! //! The digest is calculated using the SHA256 algorithm. -use crate::oneio::get_reader_raw; use crate::OneIoError; use ring::digest::{Context, SHA256}; @@ -10,21 +9,14 @@ use ring::digest::{Context, SHA256}; /// /// This function takes a path to a file as input and returns the SHA256 digest of the file /// as a hexadecimal string. -/// -/// # Arguments -/// -/// * `Path` - A string slice that holds the path to the file. -/// -/// # Errors -/// -/// This function can return an error of type `OneIoError` if there is an issue while reading the file. -/// The error can occur if the file doesn't exist, if there are permission issues, or if there are -/// issues with the underlying I/O operations. 
pub fn get_sha256_digest(path: &str) -> Result<String, OneIoError> { let mut context = Context::new(&SHA256); let mut buffer = [0; 1024]; - let mut reader = get_reader_raw(path)?; + // Open file for reading + let file = std::fs::File::open(path)?; + let mut reader: Box<dyn std::io::Read> = Box::new(std::io::BufReader::new(file)); + loop { let count = reader.read(&mut buffer)?; if count == 0 { @@ -34,6 +26,5 @@ pub fn get_sha256_digest(path: &str) -> Result<String, OneIoError> { } let digest = context.finish(); - Ok(hex::encode(digest.as_ref())) } diff --git a/src/error.rs b/src/error.rs index cf9d500..fa1610c 100644 --- a/src/error.rs +++ b/src/error.rs @@ -1,7 +1,8 @@ use thiserror::Error; -/// Simplified error enum with only 3 variants +/// Error type for OneIO operations. #[derive(Debug, Error)] +#[non_exhaustive] pub enum OneIoError { /// All IO-related errors (file system, EOF, etc.) #[error("IO error: {0}")] @@ -11,10 +12,25 @@ pub enum OneIoError { #[error("{0}")] Network(Box<dyn std::error::Error>), + /// Network error with URL context for debugging + #[error("{url}: {source}")] + NetworkWithContext { + source: Box<dyn std::error::Error>, + url: String, + }, + /// Structured status errors from remote services #[error("{service} status error: {code}")] Status { service: &'static str, code: u16 }, + /// Invalid header name or value + #[error("Invalid header: {0}")] + InvalidHeader(String), + + /// Invalid certificate data + #[error("Invalid certificate: {0}")] + InvalidCertificate(String), + /// Feature not supported/compiled #[error("Not supported: {0}")] NotSupported(String), diff --git a/src/lib.rs b/src/lib.rs index 21733f1..cee608c 100644 --- a/src/lib.rs +++ b/src/lib.rs @@ -1,331 +1,312 @@ /*! -OneIO is a Rust library providing unified IO operations for reading and writing compressed -files from local and remote sources with both synchronous and asynchronous support. +Unified I/O for compressed files from any source.
-## Quick Start +OneIO provides a single interface for reading and writing files with any +compression format, from local disk or remote locations (HTTP, FTP, S3). -```toml -oneio = "0.20" # Default: gz, bz, https -``` - -## Feature Selection Guide - -### Common Use Cases - -**Local files only:** -```toml -oneio = { version = "0.20", default-features = false, features = ["gz", "bz"] } -``` +# Quick Start -**HTTP only (no HTTPS)**: ```toml -oneio = { version = "0.20", default-features = false, features = ["http", "gz"] } +[dependencies] +oneio = "0.20" ``` -**HTTPS with default rustls**: -```toml -oneio = { version = "0.20", default-features = false, features = ["https", "gz"] } -``` - -**HTTPS with custom TLS backend**: -```toml -# With rustls -oneio = { version = "0.20", default-features = false, features = ["http", "rustls", "gz"] } +```rust,ignore +use oneio; -# With native-tls -oneio = { version = "0.20", default-features = false, features = ["http", "native-tls", "gz"] } +// Read a remote compressed file +let content = oneio::read_to_string("https://example.com/data.txt.gz")?; ``` -**S3-compatible storage**: +# Feature Selection + +Enable only what you need: + +| Feature | Description | +|---------|-------------| +| `gz` | Gzip compression | +| `bz` | Bzip2 compression | +| `lz` | LZ4 compression | +| `xz` | XZ compression | +| `zstd` | Zstandard compression | +| `http` | HTTP/HTTPS support | +| `ftp` | FTP support | +| `s3` | S3-compatible storage | +| `async` | Async I/O support | +| `json` | JSON deserialization | +| `digest` | SHA256 hashing | +| `cli` | Command-line tool | + +**Example: Minimal setup for local files** ```toml -oneio = { version = "0.20", default-features = false, features = ["s3", "https", "gz"] } +[dependencies] +oneio = { version = "0.20", default-features = false, features = ["gz"] } ``` -**Async operations**: +**Example: HTTPS with custom TLS for corporate proxies** ```toml -oneio = { version = "0.20", features = ["async"] } 
+[dependencies] +oneio = { version = "0.20", default-features = false, features = ["http", "native-tls", "gz"] } ``` -### Available Features - -**Compression** (choose only what you need): -- `gz` - Gzip via flate2 -- `bz` - Bzip2 -- `lz` - LZ4 -- `xz` - XZ -- `zstd` - Zstandard (balanced) - -**Protocols**: -- `http` - HTTP-only support (no TLS) -- `https` - HTTP/HTTPS with rustls TLS backend (equivalent to `http` + `rustls`) -- `ftp` - FTP support (requires `http` + TLS backend) -- `s3` - S3-compatible storage - -**TLS Backends** (for HTTPS - mutually exclusive): -- `rustls` - Pure Rust TLS (use with `http`). Uses both system certificates and bundled Mozilla certificates for maximum compatibility with corporate VPNs and minimal environments. -- `native-tls` - Platform native TLS (use with `http`) - -**Additional**: -- `async` - Async support (limited to gz, bz, zstd for compression) -- `json` - JSON parsing -- `digest` - SHA256 digest calculation -- `cli` - Command-line tool - -Environment: Set `ONEIO_ACCEPT_INVALID_CERTS=true` to accept invalid certificates. - -**Crypto Provider Initialization**: When using rustls features (`https`, `s3`, `ftp`), oneio -automatically initializes the crypto provider (AWS-LC or ring) on first use. You can also -initialize it explicitly at startup using [`crypto::ensure_default_provider()`] for better -control over error handling. +# Core API -## Usages - -### Reading Files - -Read all content into a string: +## Reading ```rust,ignore -use oneio; +// Read entire file to string +let content = oneio::read_to_string("data.txt")?; -const TEST_TEXT: &str = "OneIO test file.\nThis is a test."; +// Read lines +for line in oneio::read_lines("data.txt")? 
{ + println!("{}", line?); +} -// Works with compression and remote files automatically -let content = oneio::read_to_string("https://spaces.bgpkit.org/oneio/test_data.txt.gz")?; -assert_eq!(content.trim(), TEST_TEXT); -# Ok::<(), Box>(()) +// Get a reader for streaming +let mut reader = oneio::get_reader("data.txt.gz")?; ``` -Read line by line: +## Writing ```rust,ignore -use oneio; - -let lines = oneio::read_lines("https://spaces.bgpkit.org/oneio/test_data.txt.gz")? - .map(|line| line.unwrap()) - .collect::>(); +use std::io::Write; -assert_eq!(lines.len(), 2); -assert_eq!(lines[0], "OneIO test file."); -assert_eq!(lines[1], "This is a test."); -# Ok::<(), Box>(()) +let mut writer = oneio::get_writer("output.txt.gz")?; +writer.write_all(b"Hello")?; +// Compression finalized on drop ``` -Get a reader for streaming: +## Reusable Client -```rust -use oneio; -use std::io::Read; +For multiple requests with shared configuration: -let mut reader = oneio::get_reader("tests/test_data.txt.gz")?; -let mut buffer = Vec::new(); -reader.read_to_end(&mut buffer)?; -# Ok::<(), Box>(()) -``` +```rust,ignore +use oneio::OneIo; -### Writing Files +let client = OneIo::builder() + .header_str("Authorization", "Bearer token") + .timeout(std::time::Duration::from_secs(30)) + .build()?; -Write with automatic compression: +let data1 = client.read_to_string("https://api.example.com/1.json")?; +let data2 = client.read_to_string("https://api.example.com/2.json")?; +``` -```rust,ignore -use oneio; -use std::io::Write; +# Compression -let mut writer = oneio::get_writer("output.txt.gz")?; -writer.write_all(b"Hello, compressed world!")?; -drop(writer); // Important: close the writer +Automatic detection by file extension: -// Read it back -let content = oneio::read_to_string("output.txt.gz")?; -assert_eq!(content, "Hello, compressed world!"); -# Ok::<(), Box>(()) -``` +| Extension | Algorithm | +|-----------|-----------| +| `.gz` | Gzip | +| `.bz2` | Bzip2 | +| `.lz4` | LZ4 | +| `.xz` | XZ | +| 
`.zst` | Zstandard | -### Remote Files with Custom Headers +Override detection for URLs with query parameters: ```rust,ignore -use oneio; +use oneio::OneIo; -let client = oneio::create_client_with_headers([("Authorization", "Bearer TOKEN")])?; -let mut reader = oneio::get_http_reader( - "https://api.example.com/protected/data.json.gz", - Some(client) +let client = OneIo::new()?; +let reader = client.get_reader_with_type( + "https://api.example.com/data?format=gz", + "gz" )?; - -let content = std::io::read_to_string(&mut reader)?; -println!("{}", content); -# Ok::<(), Box>(()) ``` -### Progress Tracking -Track download/read progress with callbacks: +# Protocols -```rust,ignore -use oneio; +- **Local**: `/path/to/file.txt` +- **HTTP/HTTPS**: `https://example.com/file.txt.gz` +- **FTP**: `ftp://ftp.example.com/file.txt` (requires `ftp` feature) +- **S3**: `s3://bucket/key` (requires `s3` feature) -let (mut reader, total_size) = oneio::get_reader_with_progress( - "https://example.com/largefile.gz", - |bytes_read, total_bytes| { - match total_bytes { - Some(total) => { - let percent = (bytes_read as f64 / total as f64) * 100.0; - println!("Progress: {:.1}%", percent); - } - None => println!("Downloaded: {} bytes", bytes_read), - } - } -)?; -# Ok::<(), Box>(()) -``` +# Async API -### Async Support (Feature: `async`) +Enable the `async` feature: ```rust,ignore -use oneio; +let content = oneio::read_to_string_async("https://example.com/data.txt").await?; +``` -#[tokio::main] -async fn main() -> Result<(), Box> { - let content = oneio::read_to_string_async("https://example.com/data.json.gz").await?; +Async compression support: `gz`, `bz`, `zstd` +LZ4 and XZ return `NotSupported` error. - oneio::download_async( - "https://example.com/data.csv.gz", - "local_data.csv.gz" - ).await?; +# Error Handling - // download_async preserves the remote bytes. +```rust,ignore +use oneio::OneIoError; - Ok(()) +match oneio::get_reader("file.txt") { + Ok(reader) => { /* ... 
*/ } + Err(OneIoError::Io(e)) => { /* filesystem error */ } + Err(OneIoError::Network(e)) => { /* network error */ } + Err(OneIoError::NotSupported(msg)) => { /* feature not enabled */ } + _ => { /* future error variants */ } } ``` -Note: Async compression is limited to gz, bz, zstd. LZ4/XZ return `NotSupported`. +# Environment Variables +- `ONEIO_ACCEPT_INVALID_CERTS=true` - Accept invalid TLS certificates (development only) +- `ONEIO_CA_BUNDLE=/path/to/ca.pem` - Add custom CA certificate to trust store -## Supported Formats +# TLS and Corporate Proxies -### Compression Detection +For environments with custom TLS certificates (Cloudflare WARP, corporate proxies): -OneIO detects compression algorithm by the file extensions: +1. Use `native-tls` feature to use the OS trust store: + ```toml + features = ["http", "native-tls"] + ``` -- **Gzip**: `.gz`, `.gzip` -- **Bzip2**: `.bz`, `.bz2` -- **LZ4**: `.lz4`, `.lz` -- **XZ**: `.xz`, `.xz2` -- **Zstandard**: `.zst`, `.zstd` +2. Or add certificates programmatically: + ```rust,ignore + let client = OneIo::builder() + .add_root_certificate_pem(&std::fs::read("ca.pem")?)? + .build()?; + ``` -### Protocol Support -- **Local files**: `/path/to/file.txt` -- **HTTP/HTTPS**: `https://example.com/file.txt.gz` -- **FTP**: `ftp://ftp.example.com/file.txt` (requires `ftp` feature) -- **S3**: `s3://bucket/path/file.txt` (requires `s3` feature) - -## Command Line Tool - -Install the CLI tool: - -```bash -cargo install oneio --features cli -``` +3. 
Or via environment variable: + ```bash + export ONEIO_CA_BUNDLE=/path/to/ca.pem + ``` +*/ -Basic usage: +#![doc( + html_logo_url = "https://raw.githubusercontent.com/bgpkit/assets/main/logos/icon-transparent.png", + html_favicon_url = "https://raw.githubusercontent.com/bgpkit/assets/main/logos/favicon.ico" +)] -```bash -# Read and print a remote compressed file -oneio https://example.com/data.txt.gz +mod builder; +mod client; +mod compression; +mod error; +mod progress; -# Download a file -oneio -d https://example.com/largefile.bz2 +pub use builder::OneIoBuilder; +pub use client::OneIo; +pub use error::OneIoError; -# Pipe to other tools -oneio https://api.example.com/data.json.gz | jq '.results | length' -``` +#[cfg(feature = "async")] +pub mod async_reader; +#[cfg(feature = "rustls")] +pub mod crypto; +#[cfg(feature = "digest")] +pub mod digest; +#[cfg(any(feature = "http", feature = "ftp"))] +pub(crate) mod remote; +#[cfg(feature = "s3")] +pub mod s3; -## S3 Operations (Feature: `s3`) +// Re-export all s3 functions +#[cfg(feature = "s3")] +pub use s3::*; -```rust,ignore -use oneio::s3::*; +// Re-export all digest functions +#[cfg(feature = "digest")] +pub use digest::*; -// Direct S3 operations -s3_upload("my-bucket", "path/to/file.txt", "local/file.txt")?; -s3_download("my-bucket", "path/to/file.txt", "downloaded.txt")?; +use std::fs::File; +use std::io::{BufWriter, Read, Write}; -// Read S3 directly -let content = oneio::read_to_string("s3://my-bucket/path/to/file.txt")?; +// Internal helpers -// Check existence and get metadata -if s3_exists("my-bucket", "path/to/file.txt")? { - let stats = s3_stats("my-bucket", "path/to/file.txt")?; - println!("Size: {} bytes", stats.content_length.unwrap_or(0)); +/// Extracts the protocol from a given path. 
+pub(crate) fn get_protocol(path: &str) -> Option<&str> {
+    path.split_once("://").map(|(protocol, _)| protocol)
 }
 
-// List objects
-let objects = s3_list("my-bucket", "path/", Some("/".to_string()), false)?;
-# Ok::<(), Box<dyn std::error::Error>>(())
-```
-
-## Crypto Provider Initialization (Rustls)
-
-When using HTTPS, S3, or FTP features with rustls, oneio automatically initializes
-a crypto provider (AWS-LC or ring) on first use. For more control, you can initialize
-it explicitly at startup:
-
-```rust,ignore
-use oneio;
-
-fn main() -> Result<(), Box<dyn std::error::Error>> {
-    // Initialize crypto provider explicitly at startup
-    oneio::crypto::ensure_default_provider()?;
+/// Extract the file extension, ignoring URL query params and fragments.
+pub(crate) fn file_extension(path: &str) -> &str {
+    let path = path.split('?').next().unwrap_or(path);
+    let path = path.split('#').next().unwrap_or(path);
+    path.rsplit('.').next().unwrap_or("")
+}
 
-    // Now all HTTPS/S3/FTP operations will work
-    let content = oneio::read_to_string("https://example.com/data.txt")?;
+/// Creates a raw writer without compression.
+pub(crate) fn get_writer_raw_impl(path: &str) -> Result<BufWriter<File>, OneIoError> {
+    let path = std::path::Path::new(path);
+    if let Some(prefix) = path.parent() {
+        std::fs::create_dir_all(prefix)?;
+    }
+    let output_file = BufWriter::new(File::create(path)?);
+    Ok(output_file)
+}
 
-    Ok(())
+/// Creates a raw reader for local files.
+#[allow(dead_code)]
+pub(crate) fn get_reader_raw_impl(path: &str) -> Result<Box<dyn Read + Send>, OneIoError> {
+    let file = File::open(path)?;
+    Ok(Box::new(std::io::BufReader::new(file)))
 }
-```
 
-This is particularly useful in libraries or applications that want to:
-- Handle initialization errors early
-- Control when the provider is set up
-- Make the dependency on crypto providers explicit
+/// Gets a reader for the given file path.
+pub fn get_reader(path: &str) -> Result<Box<dyn Read + Send>, OneIoError> {
+    builder::default_oneio()?.get_reader(path)
 }
 
-## Error Handling
+/// Returns a writer for the given file path with the corresponding compression.
+pub fn get_writer(path: &str) -> Result<Box<dyn Write>, OneIoError> {
+    builder::default_oneio()?.get_writer(path)
 }
 
-Three error types in v0.20:
+/// Checks whether a local or remote path exists.
+pub fn exists(path: &str) -> Result<bool, OneIoError> {
+    builder::default_oneio()?.exists(path)
 }
 
-```rust
-use oneio::OneIoError;
+/// Reads the full contents of a file or URL into a string.
+pub fn read_to_string(path: &str) -> Result<String, OneIoError> {
+    builder::default_oneio()?.read_to_string(path)
 }
 
-match oneio::get_reader("file.txt") {
-    Ok(reader) => { /* use reader */ },
-    Err(OneIoError::Io(e)) => { /* filesystem error */ },
-    Err(OneIoError::Network(e)) => { /* network error */ },
-    Err(OneIoError::Status { service, code }) => { /* remote status error */ },
-    Err(OneIoError::NotSupported(msg)) => { /* feature not compiled */ },
+/// Reads and deserializes JSON into the requested type.
+#[cfg(feature = "json")]
+pub fn read_json_struct<T: serde::de::DeserializeOwned>(path: &str) -> Result<T, OneIoError> {
+    builder::default_oneio()?.read_json_struct(path)
 }
-```
-*/
 
-#![doc(
-    html_logo_url = "https://raw.githubusercontent.com/bgpkit/assets/main/logos/icon-transparent.png",
-    html_favicon_url = "https://raw.githubusercontent.com/bgpkit/assets/main/logos/favicon.ico"
-)]
+/// Returns an iterator over lines from the provided path.
+pub fn read_lines(
+    path: &str,
+) -> Result<std::io::Lines<std::io::BufReader<Box<dyn Read + Send>>>, OneIoError> {
+    builder::default_oneio()?.read_lines(path)
 }
 
-mod error;
-mod oneio;
+/// Downloads a remote resource to a local path.
+pub fn download(remote: &str, local: &str) -> Result<(), OneIoError> {
+    builder::default_oneio()?.download(remote, local)
 }
 
-pub use error::OneIoError;
+/// Creates a reader backed by a local cache file.
+pub fn get_cache_reader(
+    path: &str,
+    cache_dir: &str,
+    cache_file_name: Option<String>,
+    force_cache: bool,
+) -> Result<Box<dyn Read + Send>, OneIoError> {
+    builder::default_oneio()?.get_cache_reader(path, cache_dir, cache_file_name, force_cache)
 }
 
-#[cfg(feature = "rustls")]
-pub mod crypto {
-    //! Crypto provider initialization for rustls.
-    pub use crate::oneio::crypto::*;
+/// Gets an async reader for the given file path.
+#[cfg(feature = "async")]
+pub async fn get_reader_async(
+    path: &str,
+) -> Result<Box<dyn tokio::io::AsyncRead + Unpin + Send>, OneIoError> {
+    async_reader::get_reader_async(path).await
 }
 
-#[cfg(feature = "digest")]
-pub use crate::oneio::digest::*;
-#[cfg(any(feature = "http", feature = "ftp"))]
-pub use crate::oneio::remote::*;
-#[cfg(feature = "s3")]
-pub use crate::oneio::s3::*;
-pub use crate::oneio::utils::*;
+/// Reads the entire content of a file asynchronously into a string.
+#[cfg(feature = "async")]
+pub async fn read_to_string_async(path: &str) -> Result<String, OneIoError> {
+    async_reader::read_to_string_async(path).await
 }
 
-pub use crate::oneio::*;
+/// Downloads a file asynchronously from a URL to a local path.
+#[cfg(feature = "async")]
+pub async fn download_async(url: &str, path: &str) -> Result<(), OneIoError> {
+    async_reader::download_async(url, path).await
 }
diff --git a/src/oneio/compressions/bzip2.rs b/src/oneio/compressions/bzip2.rs
deleted file mode 100644
index 12a1c70..0000000
--- a/src/oneio/compressions/bzip2.rs
+++ /dev/null
@@ -1,36 +0,0 @@
-//! Bzip2 compression support for OneIO.
-//!
-//! This module provides bzip2 compression support for OneIO.
-
-use crate::OneIoError;
-use bzip2::read::BzDecoder;
-use bzip2::write::BzEncoder;
-use bzip2::Compression;
-use std::fs::File;
-use std::io::{BufWriter, Read, Write};
-
-/// Returns a reader that decompresses bzip2-compressed data from the given reader.
-///
-/// # Arguments
-/// * `raw_reader` - A boxed reader containing bzip2-compressed data.
-///
-/// # Returns
-/// * `Ok(Box<dyn Read + Send>)` - A reader that decompresses bzip2 data on the fly.
-/// * `Err(OneIoError)` - If the bzip2 decoder could not be created. -pub(crate) fn get_reader( - raw_reader: Box, -) -> Result, OneIoError> { - Ok(Box::new(BzDecoder::new(raw_reader))) -} - -/// Returns a writer that compresses data to bzip2 format. -/// -/// # Arguments -/// * `raw_writer` - A buffered writer for the target file. -/// -/// # Returns -/// * `Ok(Box)` - A writer that compresses data to bzip2 format. -/// * `Err(OneIoError)` - If the bzip2 encoder could not be created. -pub(crate) fn get_writer(raw_writer: BufWriter) -> Result, OneIoError> { - Ok(Box::new(BzEncoder::new(raw_writer, Compression::default()))) -} diff --git a/src/oneio/compressions/gzip.rs b/src/oneio/compressions/gzip.rs deleted file mode 100644 index 24e6423..0000000 --- a/src/oneio/compressions/gzip.rs +++ /dev/null @@ -1,38 +0,0 @@ -//! Gzip compression support for OneIO. -//! -//! This module provides gzip compression support for OneIO. - -use crate::OneIoError; -use flate2::read::GzDecoder; -use flate2::write::GzEncoder; -use flate2::Compression; -use std::fs::File; -use std::io::{BufWriter, Read, Write}; - -/// Returns a reader that decompresses gzip-compressed data from the given reader. -/// -/// # Arguments -/// * `raw_reader` - A boxed reader containing gzip-compressed data. -/// -/// # Returns -/// * `Ok(Box)` - A reader that decompresses gzip data on the fly. -/// * `Err(OneIoError)` - If the gzip decoder could not be created. -pub(crate) fn get_reader( - raw_reader: Box, -) -> Result, OneIoError> { - Ok(Box::new(GzDecoder::new(raw_reader))) -} - -/// Returns a writer that compresses data to gzip format. -/// -/// # Arguments -/// * `raw_writer` - A buffered writer for the target file. -/// -/// # Returns -/// * `Ok(Box)` - A writer that compresses data to gzip format. -/// * `Err(OneIoError)` - If the gzip encoder could not be created. 
-pub(crate) fn get_writer(raw_writer: BufWriter) -> Result, OneIoError> { - // see libflate docs on the reasons of using [AutoFinishUnchecked]. - let encoder = GzEncoder::new(raw_writer, Compression::default()); - Ok(Box::new(encoder)) -} diff --git a/src/oneio/compressions/lz4.rs b/src/oneio/compressions/lz4.rs deleted file mode 100644 index e8e3968..0000000 --- a/src/oneio/compressions/lz4.rs +++ /dev/null @@ -1,40 +0,0 @@ -//! LZ4 compression support for OneIO. -//! -//! This module provides lz4 decompression support. Writing lz4-compressed files is not currently supported. - -use crate::OneIoError; -use lz4::Decoder; -use std::fs::File; -use std::io::{BufWriter, Read, Write}; - -/// Returns a reader that decompresses lz4-compressed data from the given reader. -/// -/// # Arguments -/// * `raw_reader` - A boxed reader containing lz4-compressed data. -/// -/// # Returns -/// * `Ok(Box)` - A reader that decompresses lz4 data on the fly. -/// * `Err(OneIoError)` - If the lz4 decoder could not be created. -pub(crate) fn get_reader( - raw_reader: Box, -) -> Result, OneIoError> { - Decoder::new(raw_reader) - .map(|decoder| Box::new(decoder) as Box) - .map_err(|e| { - // Preserve original error information in the message - OneIoError::Io(e) - }) -} - -/// Returns an error because lz4 writer is not currently supported. -/// -/// # Arguments -/// * `_raw_writer` - A buffered writer for the target file (unused). -/// -/// # Returns -/// * `Err(OneIoError)` - Always returns an error indicating lz4 writer is not supported. -pub(crate) fn get_writer(_raw_writer: BufWriter) -> Result, OneIoError> { - Err(OneIoError::NotSupported( - "lz4 writer is not currently supported.".to_string(), - )) -} diff --git a/src/oneio/compressions/mod.rs b/src/oneio/compressions/mod.rs deleted file mode 100644 index 95adcaa..0000000 --- a/src/oneio/compressions/mod.rs +++ /dev/null @@ -1,112 +0,0 @@ -//! Compression algorithms and utilities for OneIO. -//! -//! 
This module provides a unified interface for reading and writing files with various compression -//! formats, including gzip, bzip2, lz4, xz, and zstd. The available algorithms depend on enabled -//! Cargo features. Utility functions are provided to select the appropriate algorithm based on file -//! suffixes. - -use crate::OneIoError; -use std::fs::File; -use std::io::{BufWriter, Read, Write}; - -#[cfg(feature = "bz")] -pub(crate) mod bzip2; -#[cfg(feature = "any_gz")] -pub(crate) mod gzip; -#[cfg(feature = "lz")] -pub(crate) mod lz4; -#[cfg(feature = "xz")] -pub(crate) mod xz; -#[cfg(feature = "zstd")] -pub(crate) mod zstd; - -/// Returns a compression reader for the given file suffix. -/// -/// This function selects the appropriate compression algorithm based on the provided -/// `file_suffix` (such as `"gz"`, `"bz2"`, `"lz4"`, `"xz"`, or `"zst"`), and returns a -/// reader that transparently decompresses data as it is read. If the suffix is not recognized, -/// the original `raw_reader` is returned unchanged. -/// -/// # Arguments -/// -/// * `raw_reader` - A boxed reader implementing `Read + Send`, typically the source file or stream. -/// * `file_suffix` - The file extension or suffix indicating the compression type. -/// -/// # Returns -/// -/// * `Ok(Box)` - A boxed reader that decompresses data on the fly, or the original reader if no compression is detected. -/// * `Err(OneIoError)` - If the compression reader could not be created. 
-/// -/// # Feature Flags -/// -/// The available compression algorithms depend on enabled Cargo features: -/// - `"gz"` for gzip -/// - `"bz"` for bzip2 -/// - `"lz"` for lz4 -/// - `"xz"` for xz/lzma -/// - `"zstd"` for zstandard -pub(crate) fn get_compression_reader( - raw_reader: Box, - file_suffix: &str, -) -> Result, OneIoError> { - match file_suffix { - #[cfg(feature = "any_gz")] - "gz" | "gzip" | "tgz" => gzip::get_reader(raw_reader), - #[cfg(feature = "bz")] - "bz2" | "bz" => bzip2::get_reader(raw_reader), - #[cfg(feature = "lz")] - "lz4" | "lz" => lz4::get_reader(raw_reader), - #[cfg(feature = "xz")] - "xz" | "xz2" | "lzma" => xz::get_reader(raw_reader), - #[cfg(feature = "zstd")] - "zst" | "zstd" => zstd::get_reader(raw_reader), - _ => { - // unknown file type of file {}. return the raw bytes reader as is - Ok(raw_reader) - } - } -} - -/// Returns a compression writer for the given file suffix. -/// -/// This function selects the appropriate compression algorithm based on the provided -/// `file_suffix` (such as `"gz"`, `"bz2"`, `"lz4"`, `"xz"`, or `"zst"`), and returns a -/// writer that transparently compresses data as it is written. If the suffix is not recognized, -/// the original `raw_writer` is returned unchanged. -/// -/// # Arguments -/// -/// * `raw_writer` - A buffered writer for the target file. -/// * `file_suffix` - The file extension or suffix indicating the compression type. -/// -/// # Returns -/// -/// * `Ok(Box)` - A boxed writer that compresses data on the fly, or the original writer if no compression is detected. -/// * `Err(OneIoError)` - If the compression writer could not be created. 
-/// -/// # Feature Flags -/// -/// The available compression algorithms depend on enabled Cargo features: -/// - `"gz"` for gzip -/// - `"bz"` for bzip2 -/// - `"lz"` for lz4 -/// - `"xz"` for xz/lzma -/// - `"zstd"` for zstandard -pub(crate) fn get_compression_writer( - raw_writer: BufWriter, - file_suffix: &str, -) -> Result, OneIoError> { - match file_suffix { - #[cfg(feature = "any_gz")] - "gz" | "gzip" | "tgz" => gzip::get_writer(raw_writer), - #[cfg(feature = "bz")] - "bz2" | "bz" => bzip2::get_writer(raw_writer), - #[cfg(feature = "lz")] - "lz4" | "lz" => lz4::get_writer(raw_writer), - #[cfg(feature = "xz")] - "xz" | "xz2" | "lzma" => xz::get_writer(raw_writer), - #[cfg(feature = "zstd")] - "zst" | "zstd" => zstd::get_writer(raw_writer), - _ => Ok(Box::new(raw_writer)), - } -} diff --git a/src/oneio/compressions/xz.rs b/src/oneio/compressions/xz.rs deleted file mode 100644 index 708de7c..0000000 --- a/src/oneio/compressions/xz.rs +++ /dev/null @@ -1,35 +0,0 @@ -//! XZ compression support for OneIO. -//! -//! This module provides XZ compression support for OneIO. - -use crate::OneIoError; -use std::fs::File; -use std::io::{BufWriter, Read, Write}; -use xz2::read::XzDecoder; -use xz2::write::XzEncoder; - -/// Returns a reader that decompresses xz-compressed data from the given reader. -/// -/// # Arguments -/// * `raw_reader` - A boxed reader containing xz-compressed data. -/// -/// # Returns -/// * `Ok(Box)` - A reader that decompresses xz data on the fly. -/// * `Err(OneIoError)` - If the xz decoder could not be created. -pub(crate) fn get_reader( - raw_reader: Box, -) -> Result, OneIoError> { - Ok(Box::new(XzDecoder::new(raw_reader))) -} - -/// Returns a writer that compresses data to xz format. -/// -/// # Arguments -/// * `raw_writer` - A buffered writer for the target file. -/// -/// # Returns -/// * `Ok(Box)` - A writer that compresses data to xz format. -/// * `Err(OneIoError)` - If the xz encoder could not be created. 
-pub(crate) fn get_writer(raw_writer: BufWriter) -> Result, OneIoError> { - Ok(Box::new(XzEncoder::new(raw_writer, 9))) -} diff --git a/src/oneio/compressions/zstd.rs b/src/oneio/compressions/zstd.rs deleted file mode 100644 index f7bf1b7..0000000 --- a/src/oneio/compressions/zstd.rs +++ /dev/null @@ -1,39 +0,0 @@ -//! Zstandard (zstd) compression support for OneIO. -//! -//! This module provides Zstandard (zstd) compression support for OneIO. - -use crate::OneIoError; -use std::fs::File; -use std::io::{BufWriter, Read, Write}; - -/// Returns a reader that decompresses zstd-compressed data from the given reader. -/// -/// # Arguments -/// * `raw_reader` - A boxed reader containing zstd-compressed data. -/// -/// # Returns -/// * `Ok(Box)` - A reader that decompresses zstd data on the fly. -/// * `Err(OneIoError)` - If the zstd decoder could not be created. -pub(crate) fn get_reader( - raw_reader: Box, -) -> Result, OneIoError> { - match zstd::Decoder::new(raw_reader) { - Ok(dec) => Ok(Box::new(dec)), - Err(e) => Err(OneIoError::Io(e)), - } -} - -/// Returns a writer that compresses data to zstd format. -/// -/// # Arguments -/// * `raw_writer` - A buffered writer for the target file. -/// -/// # Returns -/// * `Ok(Box)` - A writer that compresses data to zstd format. -/// * `Err(OneIoError)` - If the zstd encoder could not be created. -pub(crate) fn get_writer(raw_writer: BufWriter) -> Result, OneIoError> { - match zstd::Encoder::new(raw_writer, 9) { - Ok(dec) => Ok(Box::new(dec.auto_finish())), - Err(e) => Err(OneIoError::Io(e)), - } -} diff --git a/src/oneio/crypto.rs b/src/oneio/crypto.rs deleted file mode 100644 index d85e9f6..0000000 --- a/src/oneio/crypto.rs +++ /dev/null @@ -1,124 +0,0 @@ -//! Crypto provider initialization for rustls. -//! -//! This module provides a helper function to ensure that rustls has a default -//! crypto provider installed. It attempts to use AWS-LC first, falling back -//! to ring if necessary. 
- -use crate::OneIoError; - -/// Ensures that a default crypto provider is installed for rustls. -/// -/// This function checks if a crypto provider is already installed, and if not, -/// attempts to install one automatically: -/// -/// 1. First tries AWS-LC if available (when rustls is compiled with aws_lc_rs support) -/// 2. Falls back to ring if AWS-LC is not available or installation fails -/// 3. Returns an error if no provider is available -/// -/// This should be called early in your application startup, or before any HTTPS/S3 -/// operations. It's safe to call multiple times - if a provider is already installed, -/// this function does nothing. -/// -/// # Errors -/// -/// Returns a [`OneIoError::NotSupported`] if: -/// - No crypto provider is available in the build -/// - Provider installation fails -/// -/// # Examples -/// -/// ```rust -/// use oneio::crypto::ensure_default_provider; -/// -/// // Call this once at startup -/// ensure_default_provider().expect("Failed to initialize crypto provider"); -/// -/// // Now you can safely use HTTPS/S3 operations -/// let content = oneio::read_to_string("https://example.com/data.txt"); -/// ``` -/// -/// For other crates in your workspace that use oneio: -/// ```rust,ignore -/// // In your binary's main.rs or lib.rs -/// fn main() { -/// oneio::crypto::ensure_default_provider() -/// .expect("Failed to initialize crypto provider"); -/// -/// // Rest of your application... 
-/// } -/// ``` -#[cfg(feature = "rustls")] -pub fn ensure_default_provider() -> Result<(), OneIoError> { - // Check if a provider is already installed - #[cfg(feature = "rustls")] - { - if rustls_sys::crypto::CryptoProvider::get_default().is_some() { - return Ok(()); - } - - // Try AWS-LC first (if available) - match rustls_sys::crypto::aws_lc_rs::default_provider().install_default() { - Ok(_) => return Ok(()), - Err(_) => { - // If installation failed because a provider is already installed, that's OK - if rustls_sys::crypto::CryptoProvider::get_default().is_some() { - return Ok(()); - } - // AWS-LC installation failed for another reason, try ring - } - } - - // Try ring as fallback - match rustls_sys::crypto::ring::default_provider().install_default() { - Ok(_) => Ok(()), - Err(e) => { - // If installation failed because a provider is already installed, that's OK - if rustls_sys::crypto::CryptoProvider::get_default().is_some() { - return Ok(()); - } - // Both failed and no provider is installed - Err(OneIoError::NotSupported(format!( - "Failed to install rustls crypto provider: {:?}", - e - ))) - } - } - } - - #[cfg(not(feature = "rustls"))] - { - // If rustls is not enabled, that's fine - we're not using it - Ok(()) - } -} - -/// Ensures that a default crypto provider is installed for rustls. -/// -/// This is a no-op when rustls feature is not enabled. 
-#[cfg(not(feature = "rustls"))] -pub fn ensure_default_provider() -> Result<(), OneIoError> { - Ok(()) -} - -#[cfg(test)] -mod tests { - use super::*; - - #[test] - fn test_ensure_default_provider() { - // Should succeed whether provider is installed or not - let result = ensure_default_provider(); - assert!(result.is_ok(), "ensure_default_provider should succeed"); - } - - #[cfg(feature = "rustls")] - #[test] - fn test_provider_installed() { - // After calling ensure_default_provider, a provider should be available - ensure_default_provider().unwrap(); - assert!( - rustls_sys::crypto::CryptoProvider::get_default().is_some(), - "A crypto provider should be installed" - ); - } -} diff --git a/src/oneio/mod.rs b/src/oneio/mod.rs deleted file mode 100644 index 1c6b6b7..0000000 --- a/src/oneio/mod.rs +++ /dev/null @@ -1,751 +0,0 @@ -pub mod compressions; -#[cfg(feature = "rustls")] -pub mod crypto; -#[cfg(feature = "digest")] -pub mod digest; -#[cfg(any(feature = "http", feature = "ftp"))] -pub mod remote; -#[cfg(feature = "s3")] -pub mod s3; - -pub mod utils; - -use crate::OneIoError; - -use crate::oneio::compressions::{get_compression_reader, get_compression_writer}; -use std::fs::File; -use std::io::{BufWriter, Read, Write}; -use std::path::Path; - -#[cfg(feature = "async")] -use futures::StreamExt; - -/// Extracts the protocol from a given path. 
-pub(crate) fn get_protocol(path: &str) -> Option<&str> { - path.split_once("://").map(|(protocol, _)| protocol) -} - -pub fn get_writer_raw(path: &str) -> Result, OneIoError> { - let path = Path::new(path); - if let Some(prefix) = path.parent() { - std::fs::create_dir_all(prefix)?; - } - let output_file = BufWriter::new(File::create(path)?); - Ok(output_file) -} - -pub fn get_reader_raw(path: &str) -> Result, OneIoError> { - let raw_reader: Box = match get_protocol(path) { - Some(protocol) => match protocol { - #[cfg(feature = "http")] - "http" | "https" => { - let response = remote::get_http_reader_raw(path, None)?; - Box::new(response) - } - #[cfg(feature = "ftp")] - "ftp" => { - let response = remote::get_ftp_reader_raw(path)?; - Box::new(response) - } - #[cfg(feature = "s3")] - "s3" | "r2" => { - let (bucket, path) = s3::s3_url_parse(path)?; - s3::s3_reader(bucket.as_str(), path.as_str())? - } - _ => { - return Err(OneIoError::NotSupported(path.to_string())); - } - }, - None => Box::new(File::open(path)?), - }; - Ok(raw_reader) -} - -/// Gets a reader for the given file path. -/// -/// # Arguments -/// -/// * `path` - The path of the file to read. -/// -/// # Returns -/// -/// A `Result` containing a boxed `Read+Sync` trait object with the file reader, or `OneIoError` if an error occurs. -/// -/// # Examples -/// -/// ```no_run -/// use std::io::Read; -/// use oneio::get_reader; -/// -/// let mut reader = get_reader("file.txt").unwrap(); -/// let mut buffer = Vec::new(); -/// reader.read_to_end(&mut buffer).unwrap(); -/// println!("{}", String::from_utf8_lossy(&buffer)); -/// ``` -pub fn get_reader(path: &str) -> Result, OneIoError> { - // get raw bytes reader - let raw_reader = get_reader_raw(path)?; - - let file_type = path.rsplit('.').next().unwrap_or(""); - get_compression_reader(raw_reader, file_type) -} - -/// get file reader with local cache. 
-///
-/// Parameters:
-/// * `path`: file path to open, remote or local
-/// * `cache_dir`: path to the cache directory
-/// * `cache_file_name`: optional file name for the cache file; defaults to the filename of the file being read
-/// * `force_cache`: whether to force-refresh the cache file if a local cache file already exists
-pub fn get_cache_reader(
-    path: &str,
-    cache_dir: &str,
-    cache_file_name: Option<String>,
-    force_cache: bool,
-) -> Result<Box<dyn Read + Send>, OneIoError> {
-    let dir_path = Path::new(cache_dir);
-    if !dir_path.is_dir() {
-        match std::fs::create_dir_all(dir_path) {
-            Ok(_) => {}
-            Err(e) => return Err(OneIoError::Io(e)),
-        }
-    }
-
-    let cache_file_name = cache_file_name.unwrap_or_else(|| {
-        path.split('/')
-            .next_back()
-            .unwrap_or("cached_file")
-            .to_string()
-    });
-
-    let cache_file_path = format!("{cache_dir}/{cache_file_name}");
-
-    // if the cache file already exists
-    if !force_cache && Path::new(cache_file_path.as_str()).exists() {
-        return get_reader(cache_file_path.as_str());
-    }
-
-    // read everything to the cache file; no encode/decode happens here
-    let mut reader = get_reader_raw(path)?;
-    let mut writer = get_writer_raw(cache_file_path.as_str())?;
-    std::io::copy(&mut reader, &mut writer)?;
-    writer.flush()?;
-
-    // return reader from cache file
-    get_reader(cache_file_path.as_str())
-}
-
-/// Returns a writer for the given file path with the corresponding compression.
-///
-/// # Arguments
-///
-/// * `path` - A string slice representing the file path.
-///
-/// # Returns
-///
-/// * `Result<Box<dyn Write>, OneIoError>` - A result containing a boxed writer trait object or an error.
-///
-/// # Examples
-///
-/// ```rust,no_run
-/// use std::io::{self, Write};
-/// use oneio::get_writer;
-///
-/// let writer = match get_writer("output.txt") {
-///     Ok(writer) => writer,
-///     Err(error) => panic!("Failed to create writer: {:?}", error),
-/// };
-/// ```
-pub fn get_writer(path: &str) -> Result<Box<dyn Write>, OneIoError> {
-    let output_file = get_writer_raw(path)?;
-
-    let file_type = path.rsplit('.').next().unwrap_or("");
-    get_compression_writer(output_file, file_type)
-}
-
-/// Check if a file or directory exists.
-///
-/// This function takes a path as an argument and returns a `Result` indicating whether the file or directory at the given path exists or not.
-///
-/// # Examples
-///
-/// ```rust
-/// use crate::oneio::exists;
-///
-/// match exists("path/to/file.txt") {
-///     Ok(true) => println!("File exists."),
-///     Ok(false) => println!("File does not exist."),
-///     Err(error) => eprintln!("An error occurred: {}", error),
-/// }
-/// ```
-///
-/// # Errors
-///
-/// This function may return a `OneIoError` if there is an error accessing the file system, or, when a remote feature (`http` or `s3`) is enabled, if the remote existence check fails.
-pub fn exists(path: &str) -> Result<bool, OneIoError> {
-    #[cfg(any(feature = "http", feature = "s3"))]
-    {
-        remote::remote_file_exists(path)
-    }
-    #[cfg(not(any(feature = "http", feature = "s3")))]
-    {
-        Ok(Path::new(path).exists())
-    }
-}
-
-/// Progress tracking callback type
-pub type ProgressCallback<F> = F;
-
-/// Progress reader wrapper that tracks bytes read
-pub struct ProgressReader<R, F> {
-    inner: R,
-    bytes_read: u64,
-    total_size: u64,
-    callback: F,
-}
-
-impl<R, F> ProgressReader<R, F>
-where
-    F: Fn(u64, u64) + Send,
-{
-    fn new(inner: R, total_size: u64, callback: F) -> Self {
-        Self {
-            inner,
-            bytes_read: 0,
-            total_size,
-            callback,
-        }
-    }
-}
-
-impl<R: Read, F> Read for ProgressReader<R, F>
-where
-    F: Fn(u64, u64) + Send,
-{
-    fn read(&mut self, buf: &mut [u8]) -> std::io::Result<usize> {
-        let bytes_read = self.inner.read(buf)?;
-        if bytes_read > 0 {
-            self.bytes_read += bytes_read as u64;
-            (self.callback)(self.bytes_read, self.total_size);
-        }
-        Ok(bytes_read)
-    }
-}
-
-/// Determines the content length of a file or URL.
-///
-/// This function attempts to get the total size of the content at the given path.
-/// Used internally by progress tracking - returns an error if size cannot be determined.
-///
-/// # Arguments
-/// * `path` - File path or URL to check
-///
-/// # Returns
-/// * `Ok(u64)` - Total content size in bytes
-/// * `Err(OneIoError::NotSupported)` - If size cannot be determined
-/// * `Err(OneIoError::Network)` - If network error occurs
-/// * `Err(OneIoError::Io)` - If file system error occurs
-pub fn get_content_length(path: &str) -> Result<u64, OneIoError> {
-    match get_protocol(path) {
-        #[cfg(feature = "http")]
-        Some(protocol) if protocol == "http" || protocol == "https" => {
-            remote::get_http_content_length(path)
-        }
-        #[cfg(feature = "ftp")]
-        Some(protocol) if protocol == "ftp" => {
-            // For FTP, we'll need to implement the SIZE command.
-            // For now, return not supported.
-            Err(OneIoError::NotSupported(
-                "FTP size determination not yet implemented".to_string(),
-            ))
-        }
-        #[cfg(feature = "s3")]
-        Some(protocol) if protocol == "s3" || protocol == "r2" => {
-            // S3 HEAD object
-            let (bucket, key) = s3::s3_url_parse(path)?;
-            let stats = s3::s3_stats(&bucket, &key)?;
-            // HeadObjectResult has a content_length field
-            stats
-                .content_length
-                .ok_or_else(|| {
-                    OneIoError::NotSupported(
-                        "S3 object doesn't have content length information".to_string(),
-                    )
-                })
-                .map(|len| len as u64)
-        }
-        Some(_) => Err(OneIoError::NotSupported(format!(
-            "Protocol not supported for progress tracking: {path}"
-        ))),
-        None => {
-            // Local file
-            let metadata = std::fs::metadata(path)?;
-            Ok(metadata.len())
-        }
-    }
-}
-
-/// Gets a reader with progress tracking that reports bytes read and total file size.
-///
-/// This function returns both a reader and the total file size.
-/// If the total size cannot be determined (e.g., streaming endpoints without Content-Length), it returns `None`
-/// for the size, providing better context about whether the size is genuinely unknown
-/// versus a failure to determine it.
-///
-/// The progress callback receives (bytes_read, total_bytes) and tracks raw bytes read
-/// from the source before any decompression. When total_bytes is 0, it indicates the
-/// total size is unknown.
-///
-/// # Arguments
-/// * `path` - File path or URL to read
-/// * `progress` - Callback function called with (bytes_read, total_bytes)
-///
-/// # Returns
-/// * `Ok((reader, Some(total_size)))` - Reader and total file size in bytes
-/// * `Ok((reader, None))` - Reader with unknown total size
-/// * `Err(OneIoError::Network)` - If network error occurs
-/// * `Err(OneIoError::Io)` - If file system error occurs
-///
-/// # Examples
-///
-/// ```rust,ignore
-/// use oneio;
-/// use std::io::Read;
-///
-/// let (mut reader, total_size) = oneio::get_reader_with_progress(
-///     "https://example.com/file.gz",
-///     |bytes_read, total_bytes| {
-///         if total_bytes > 0 {
-///             let percent = (bytes_read as f64 / total_bytes as f64) * 100.0;
-///             println!("Progress: {:.1}% ({}/{})", percent, bytes_read, total_bytes);
-///         } else {
-///             println!("Downloaded: {} bytes (size unknown)", bytes_read);
-///         }
-///     }
-/// )?;
-///
-/// match total_size {
-///     Some(size) => println!("File size: {} bytes", size),
-///     None => println!("File size: unknown"),
-/// }
-/// let mut content = String::new();
-/// reader.read_to_string(&mut content)?;
-/// # Ok::<(), Box<dyn std::error::Error>>(())
-/// ```
-pub fn get_reader_with_progress<F>(
-    path: &str,
-    progress: F,
-) -> Result<(Box<dyn Read + Send>, Option<u64>), OneIoError>
-where
-    F: Fn(u64, u64) + Send + 'static,
-{
-    // Try to determine total size, returning None when it cannot be determined
-    let (total_size, size_option) = match get_content_length(path) {
-        Ok(size) => (size, Some(size)),
-        Err(_) => {
-            // Size cannot be determined (e.g., streaming endpoints, errors) - handle gracefully.
-            // The Option return type clearly indicates when size is unknown.
-            (0, None)
-        }
-    };
-
-    // Get raw reader (before compression)
-    let raw_reader = get_reader_raw(path)?;
-
-    // Wrap raw reader with progress tracking
-    let progress_reader = ProgressReader::new(raw_reader, total_size, progress);
-
-    // Apply compression to the progress-wrapped reader
-    let file_type = path.rsplit('.').next().unwrap_or("");
-
-    let final_reader = get_compression_reader(Box::new(progress_reader), file_type)?;
-
-    Ok((final_reader, size_option))
-}
-
-// ================================
-// ASYNC SUPPORT (Phase 3)
-// ================================
-
-#[cfg(feature = "async")]
-use tokio::io::{AsyncRead, AsyncReadExt};
-
-/// Gets an async reader for the given file path.
-///
-/// This is the async version of `get_reader()`. It supports all the same protocols
-/// and compression formats as the sync version.
-///
-/// # Arguments
-/// * `path` - File path or URL to read
-///
-/// # Returns
-/// * `Ok(impl AsyncRead)` - Async reader that handles decompression automatically
-/// * `Err(OneIoError)` - If file cannot be opened or protocol not supported
-///
-/// # Examples
-///
-/// ```rust,no_run
-/// use tokio::io::AsyncReadExt;
-///
-/// #[tokio::main]
-/// async fn main() -> Result<(), Box<dyn std::error::Error>> {
-///     let mut reader = oneio::get_reader_async("https://example.com/data.gz").await?;
-///
-///     let mut buffer = Vec::new();
-///     reader.read_to_end(&mut buffer).await?;
-///
-///     println!("Read {} bytes", buffer.len());
-///     Ok(())
-/// }
-/// ```
-#[cfg(feature = "async")]
-pub async fn get_reader_async(path: &str) -> Result<Box<dyn AsyncRead + Unpin + Send>, OneIoError> {
-    // Get raw async reader
-    let raw_reader = get_async_reader_raw(path).await?;
-
-    // Apply compression
-    let file_type = path.rsplit('.').next().unwrap_or("");
-
-    get_async_compression_reader(raw_reader, file_type)
-}
-
-/// Reads the entire content of a file asynchronously into a string.
-///
This is the async version of `read_to_string()`. It handles decompression -/// automatically based on file extension. -/// -/// # Arguments -/// * `path` - File path or URL to read -/// -/// # Returns -/// * `Ok(String)` - File content as UTF-8 string -/// * `Err(OneIoError)` - If file cannot be read or content is not valid UTF-8 -/// -/// # Examples -/// -/// ```rust,no_run -/// #[tokio::main] -/// async fn main() -> Result<(), Box> { -/// let content = oneio::read_to_string_async("https://example.com/data.json.gz").await?; -/// println!("Content: {}", content); -/// Ok(()) -/// } -/// ``` -#[cfg(feature = "async")] -pub async fn read_to_string_async(path: &str) -> Result { - let mut reader = get_reader_async(path).await?; - let mut content = String::new(); - reader.read_to_string(&mut content).await?; - Ok(content) -} - -/// Downloads a file asynchronously from a URL to a local path -/// -/// This is the async version of `download()`. It preserves the raw bytes from -/// the source, matching the synchronous `download()` behavior. 
-/// -/// # Arguments -/// * `url` - Source URL to download from -/// * `path` - Local file path to save to -/// -/// # Returns -/// * `Ok(())` - Download completed successfully -/// * `Err(OneIoError)` - If download fails or file cannot be written -/// -/// # Examples -/// -/// ```rust,no_run -/// #[tokio::main] -/// async fn main() -> Result<(), Box> { -/// oneio::download_async( -/// "https://example.com/data.csv.gz", -/// "local_data.csv.gz" -/// ).await?; -/// println!("Download complete!"); -/// Ok(()) -/// } -/// ``` -#[cfg(feature = "async")] -pub async fn download_async(url: &str, path: &str) -> Result<(), OneIoError> { - use tokio::fs::File; - use tokio::io::{copy, AsyncWriteExt}; - - if let Some(parent) = Path::new(path).parent() { - if !parent.as_os_str().is_empty() { - tokio::fs::create_dir_all(parent).await?; - } - } - - let mut reader = get_async_reader_raw(url).await?; - let mut file = File::create(path).await?; - copy(&mut reader, &mut file).await?; - - file.flush().await?; - Ok(()) -} - -/// Gets a raw async reader for the given path (before compression) -#[cfg(feature = "async")] -async fn get_async_reader_raw(path: &str) -> Result, OneIoError> { - let raw_reader: Box = match get_protocol(path) { - #[cfg(feature = "http")] - Some(protocol) if protocol == "http" || protocol == "https" => { - #[cfg(feature = "rustls")] - crypto::ensure_default_provider()?; - - let response = reqwest::get(path).await?; - let stream = response - .bytes_stream() - .map(|result| result.map_err(std::io::Error::other)); - Box::new(tokio_util::io::StreamReader::new(stream)) - } - #[cfg(feature = "ftp")] - Some(protocol) if protocol == "ftp" => { - // FTP async not supported - use sync version with tokio::task::spawn_blocking - return Err(OneIoError::NotSupported( - "FTP async not supported - use sync get_reader() instead".to_string(), - )); - } - #[cfg(feature = "s3")] - Some(protocol) if protocol == "s3" || protocol == "r2" => { - // S3 async not supported - use sync 
version with tokio::task::spawn_blocking - return Err(OneIoError::NotSupported( - "S3 async not supported - use sync get_reader() instead".to_string(), - )); - } - Some(_) => { - return Err(OneIoError::NotSupported(format!( - "Async support not available for protocol in path: {path}" - ))); - } - None => { - // Local file - use tokio::fs::File; - let file = File::open(path).await?; - Box::new(file) - } - }; - Ok(raw_reader) -} - -/// Applies async decompression based on file extension -#[cfg(feature = "async")] -fn get_async_compression_reader( - reader: Box, - file_type: &str, -) -> Result, OneIoError> { - match file_type { - #[cfg(all(feature = "async", feature = "any_gz"))] - "gz" | "gzip" => { - use async_compression::tokio::bufread::GzipDecoder; - use tokio::io::BufReader; - let buf_reader = BufReader::new(reader); - let decoder = GzipDecoder::new(buf_reader); - Ok(Box::new(decoder)) - } - #[cfg(all(feature = "async", feature = "bz"))] - "bz" | "bz2" => { - use async_compression::tokio::bufread::BzDecoder; - use tokio::io::BufReader; - let buf_reader = BufReader::new(reader); - let decoder = BzDecoder::new(buf_reader); - Ok(Box::new(decoder)) - } - #[cfg(all(feature = "async", feature = "zstd"))] - "zst" | "zstd" => { - use async_compression::tokio::bufread::ZstdDecoder; - use tokio::io::BufReader; - let buf_reader = BufReader::new(reader); - let decoder = ZstdDecoder::new(buf_reader); - Ok(Box::new(decoder)) - } - #[cfg(all(feature = "async", feature = "lz"))] - "lz4" | "lz" => { - // LZ4 doesn't have async support in async-compression - // Use spawn_blocking for sync decompression - Err(OneIoError::NotSupported( - "LZ4 async decompression not yet supported - use spawn_blocking with sync version" - .to_string(), - )) - } - #[cfg(all(feature = "async", feature = "xz"))] - "xz" | "xz2" => { - // XZ doesn't have async support in async-compression - // Use spawn_blocking for sync decompression - Err(OneIoError::NotSupported( - "XZ async decompression not yet 
supported - use spawn_blocking with sync version" - .to_string(), - )) - } - _ => { - // No compression - Ok(reader) - } - } -} - -#[cfg(test)] -mod tests { - use super::*; - #[cfg(any(feature = "any_gz", feature = "http"))] - use std::io::Read; - - #[cfg(any(feature = "any_gz", feature = "http", feature = "async"))] - const TEST_TEXT: &str = "OneIO test file.\nThis is a test."; - - #[cfg(feature = "any_gz")] - #[test] - fn test_progress_tracking_local() { - use std::sync::{Arc, Mutex}; - - // Track progress calls - let progress_calls = Arc::new(Mutex::new(Vec::<(u64, u64)>::new())); - let calls_clone = progress_calls.clone(); - - // Test with a local compressed file - let result = - get_reader_with_progress("tests/test_data.txt.gz", move |bytes_read, total_bytes| { - calls_clone.lock().unwrap().push((bytes_read, total_bytes)); - }); - - match result { - Ok((mut reader, total_size)) => { - assert!(total_size.is_some(), "Local file should have known size"); - let size = total_size.unwrap(); - assert!(size > 0, "Total size should be greater than 0"); - - // Read the entire file - let mut content = String::new(); - reader.read_to_string(&mut content).unwrap(); - assert_eq!(content.trim(), TEST_TEXT.trim()); - - // Check that progress was tracked - let calls = progress_calls.lock().unwrap(); - assert!( - !calls.is_empty(), - "Progress callback should have been called" - ); - - // Verify progress calls are reasonable - let (last_bytes, last_total) = calls.last().unwrap(); - assert_eq!(*last_total, size, "Total should match in callbacks"); - assert!(*last_bytes <= size, "Bytes read should not exceed total"); - assert!(*last_bytes > 0, "Should have read some bytes"); - } - Err(e) => { - println!("Progress tracking test skipped: {:?}", e); - // This can fail if gz feature is not enabled or file doesn't exist - } - } - } - - #[cfg(feature = "http")] - #[test] - fn test_progress_tracking_remote() { - use std::sync::{Arc, Mutex}; - - // Track progress calls - let 
progress_calls = Arc::new(Mutex::new(Vec::<(u64, u64)>::new())); - let calls_clone = progress_calls.clone(); - - // Test with a remote file that has Content-Length - let result = get_reader_with_progress( - "https://spaces.bgpkit.org/oneio/test_data.txt", - move |bytes_read, total_bytes| { - calls_clone.lock().unwrap().push((bytes_read, total_bytes)); - }, - ); - - match result { - Ok((mut reader, total_size)) => { - // Read the file - let mut content = String::new(); - reader.read_to_string(&mut content).unwrap(); - assert_eq!(content.trim(), TEST_TEXT.trim()); - - // Check progress tracking - let calls = progress_calls.lock().unwrap(); - assert!( - !calls.is_empty(), - "Progress callback should have been called" - ); - - let (last_bytes, last_total) = calls.last().unwrap(); - - match total_size { - Some(size) => { - assert_eq!(*last_total, size, "Total should match in callbacks"); - // Known size: verify bytes read doesn't exceed total - assert!(*last_bytes <= size); - println!( - "Progress tracking succeeded with known size: {} bytes", - size - ); - } - None => { - assert_eq!(*last_total, 0, "Callback should get 0 for unknown size"); - // Unknown size: just verify we read some bytes - assert!(*last_bytes > 0, "Should have read some bytes"); - println!( - "Progress tracking succeeded with unknown size: {} bytes read", - last_bytes - ); - } - } - } - Err(e) => println!("Progress tracking remote test skipped: {:?}", e), - } - } - - #[test] - fn test_get_content_length_local() { - // Test local file content length - match get_content_length("tests/test_data.txt.gz") { - Ok(size) => { - assert!(size > 0, "Local file should have a size greater than 0"); - - // Verify it matches filesystem metadata - let metadata = std::fs::metadata("tests/test_data.txt.gz").unwrap(); - assert_eq!( - size, - metadata.len(), - "Content length should match file metadata" - ); - } - Err(e) => { - println!("Content length test skipped: {:?}", e); - // This can fail if the test file doesn't 
exist or gz feature is disabled - } - } - } - - // Async tests - #[cfg(feature = "async")] - #[tokio::test] - async fn test_async_reader_local() { - use tokio::io::AsyncReadExt; - - // Test basic async reading - match get_reader_async("tests/test_data.txt").await { - Ok(mut reader) => { - let mut content = String::new(); - reader.read_to_string(&mut content).await.unwrap(); - assert_eq!(content.trim(), TEST_TEXT.trim()); - } - Err(e) => println!("Async test skipped: {:?}", e), - } - - // Test with compression formats that support async - #[cfg(feature = "any_gz")] - { - match get_reader_async("tests/test_data.txt.gz").await { - Ok(mut reader) => { - let mut content = String::new(); - reader.read_to_string(&mut content).await.unwrap(); - assert_eq!(content.trim(), TEST_TEXT.trim()); - } - Err(e) => println!("Async gzip test skipped: {:?}", e), - } - } - } -} diff --git a/src/oneio/remote.rs b/src/oneio/remote.rs deleted file mode 100644 index a6403ca..0000000 --- a/src/oneio/remote.rs +++ /dev/null @@ -1,351 +0,0 @@ -//! This module provides functionality to handle remote file operations such as downloading files -//! from HTTP, FTP, and S3 protocols. 
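All of the remote handling in this module keys off the URL scheme, extracted by the `get_protocol` helper shown in `mod.rs` above. A standalone sketch of that scheme-splitting logic (the free function here is for illustration only):

```rust
// Extract the scheme before "://"; plain local paths yield None.
// This mirrors the dispatch key used by the HTTP/FTP/S3 readers.
fn get_protocol(path: &str) -> Option<&str> {
    path.split_once("://").map(|(protocol, _)| protocol)
}

fn main() {
    assert_eq!(get_protocol("https://example.com/a.txt.gz"), Some("https"));
    assert_eq!(get_protocol("ftp://host/pub/file"), Some("ftp"));
    assert_eq!(get_protocol("s3://bucket/key"), Some("s3"));
    assert_eq!(get_protocol("local/file.txt"), None);
    println!("ok");
}
```

Returning `None` for scheme-less paths is what lets the same entry points fall through to plain `File::open` for local files.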
-use crate::oneio::{get_protocol, get_writer_raw}; -use crate::OneIoError; -#[cfg(feature = "http")] -use reqwest::blocking::Client; -use std::io::Read; -#[cfg(feature = "http")] -use std::sync::OnceLock; - -#[cfg(feature = "http")] -static DEFAULT_HTTP_CLIENT: OnceLock> = OnceLock::new(); - -#[cfg(feature = "ftp")] -pub(crate) fn get_ftp_reader_raw(path: &str) -> Result, OneIoError> { - if !path.starts_with("ftp://") { - return Err(OneIoError::NotSupported(path.to_string())); - } - - #[cfg(feature = "rustls")] - super::crypto::ensure_default_provider()?; - - let path_without_scheme = path - .strip_prefix("ftp://") - .ok_or_else(|| OneIoError::NotSupported(path.to_string()))?; - let (host, remote_path) = path_without_scheme - .split_once('/') - .ok_or_else(|| OneIoError::NotSupported(path.to_string()))?; - let socket = match host.contains(':') { - true => host.to_string(), - false => format!("{host}:21"), - }; - - let mut ftp_stream = suppaftp::FtpStream::connect(socket)?; - // use anonymous login - ftp_stream.login("anonymous", "oneio")?; - ftp_stream.transfer_type(suppaftp::types::FileType::Binary)?; - let reader = Box::new(ftp_stream.retr_as_stream(remote_path)?); - Ok(reader) -} - -#[cfg(feature = "http")] -fn build_default_http_client() -> Result { - let mut headers = reqwest::header::HeaderMap::new(); - headers.insert( - reqwest::header::USER_AGENT, - reqwest::header::HeaderValue::from_static("oneio"), - ); - headers.insert( - reqwest::header::CONTENT_LENGTH, - reqwest::header::HeaderValue::from_static("0"), - ); - #[cfg(feature = "cli")] - headers.insert( - reqwest::header::CACHE_CONTROL, - reqwest::header::HeaderValue::from_static("no-cache"), - ); - - #[cfg(any(feature = "rustls", feature = "native-tls"))] - { - let accept_invalid_certs = matches!( - std::env::var("ONEIO_ACCEPT_INVALID_CERTS") - .unwrap_or_default() - .to_lowercase() - .as_str(), - "true" | "yes" | "y" | "1" - ); - Client::builder() - .default_headers(headers) - 
-            .danger_accept_invalid_certs(accept_invalid_certs)
-            .build()
-    }
-
-    #[cfg(not(any(feature = "rustls", feature = "native-tls")))]
-    {
-        Client::builder().default_headers(headers).build()
-    }
-}
-
-#[cfg(feature = "http")]
-fn default_http_client() -> Result<Client, OneIoError> {
-    dotenvy::dotenv().ok();
-
-    #[cfg(feature = "rustls")]
-    super::crypto::ensure_default_provider()?;
-
-    match DEFAULT_HTTP_CLIENT.get_or_init(|| build_default_http_client().map_err(|e| e.to_string()))
-    {
-        Ok(client) => Ok(client.clone()),
-        Err(message) => Err(OneIoError::Network(Box::new(std::io::Error::other(
-            message.clone(),
-        )))),
-    }
-}
-
-#[cfg(feature = "http")]
-pub(crate) fn get_http_reader_raw(
-    path: &str,
-    opt_client: Option<Client>,
-) -> Result<reqwest::blocking::Response, OneIoError> {
-    let client = match opt_client {
-        Some(c) => c,
-        None => default_http_client()?,
-    };
-    let res = client
-        .execute(client.get(path).build()?)?
-        .error_for_status()?;
-    Ok(res)
-}
-
-/// Creates a reqwest blocking client with custom headers.
-///
-/// # Arguments
-///
-/// * `headers` - An iterator of header key-value pairs.
-///
-/// # Returns
-///
-/// Returns a `Result` containing the constructed `Client` or a [OneIoError].
-///
-/// # Example
-///
-/// Example usage with custom header fields:
-/// ```no_run
-/// use std::io::Read;
-///
-/// let client = oneio::create_client_with_headers([("X-Custom-Auth-Key", "TOKEN")]).unwrap();
-/// let mut reader = oneio::get_http_reader(
-///     "https://SOME_REMOTE_RESOURCE_PROTECTED_BY_ACCESS_TOKEN",
-///     Some(client),
-/// ).unwrap();
-/// let mut text = "".to_string();
-/// reader.read_to_string(&mut text).unwrap();
-/// println!("{}", text);
-/// ```
-#[cfg(feature = "http")]
-pub fn create_client_with_headers<I, K, V>(headers: I) -> Result<Client, OneIoError>
-where
-    I: IntoIterator<Item = (K, V)>,
-    K: Into<String>,
-    V: Into<String>,
-{
-    #[cfg(feature = "rustls")]
-    super::crypto::ensure_default_provider()?;
-
-    use reqwest::header::{HeaderMap, HeaderName, HeaderValue};
-    let mut header_map = HeaderMap::new();
-    for (k, v) in headers {
-        if let (Ok(name), Ok(value)) = (
-            HeaderName::from_bytes(k.into().as_bytes()),
-            HeaderValue::from_str(&v.into()),
-        ) {
-            header_map.insert(name, value);
-        }
-    }
-    Ok(Client::builder().default_headers(header_map).build()?)
-}
-
-/// Get a reader for remote content, with the ability to specify custom headers and custom reqwest client options.
-///
-/// See [`create_client_with_headers`] for more details on how to create a client with custom headers.
-///
-/// Example with a custom builder that allows invalid certificates (bad practice):
-/// ```no_run
-/// use std::io::Read;
-///
-/// let client = reqwest::blocking::ClientBuilder::new().danger_accept_invalid_certs(true).build().unwrap();
-/// let mut reader = oneio::get_http_reader(
-///     "https://example.com",
-///     Some(client)
-/// ).unwrap();
-/// let mut text = "".to_string();
-/// reader.read_to_string(&mut text).unwrap();
-/// println!("{}", text);
-/// ```
-#[cfg(feature = "http")]
-pub fn get_http_reader(
-    path: &str,
-    opt_client: Option<Client>,
-) -> Result<Box<dyn Read + Send>, OneIoError> {
-    use crate::oneio::compressions::get_compression_reader;
-
-    let raw_reader: Box<dyn Read + Send> = Box::new(get_http_reader_raw(path, opt_client)?);
-    let file_type = path.rsplit('.').next().unwrap_or("");
-    get_compression_reader(raw_reader, file_type)
-}
-
-#[cfg(feature = "http")]
-pub(crate) fn get_http_content_length(path: &str) -> Result<u64, OneIoError> {
-    let client = default_http_client()?;
-    let response = client.head(path).send()?.error_for_status()?;
-
-    response
-        .headers()
-        .get("content-length")
-        .and_then(|v| v.to_str().ok())
-        .and_then(|s| s.parse().ok())
-        .ok_or_else(|| {
-            OneIoError::NotSupported(
-                "Cannot determine file size - server doesn't provide Content-Length".to_string(),
-            )
-        })
-}
-
-/// Downloads a file from a remote location to a local path.
-///
-/// # Arguments
-///
-/// * `remote_path` - The remote path of the file to download.
-/// * `local_path` - The local path where the downloaded file will be saved.
-/// * `opt_client` - Optional custom [Client] to use for the request.
-///
-/// # Errors
-///
-/// Returns an `Err` variant of `OneIoError` if any of the following occur:
-///
-/// * The protocol of the remote path is not supported.
-/// * An error occurs while downloading the file.
-///
-/// # Example
-///
-/// ```rust,no_run
-/// use std::collections::HashMap;
-/// use crate::oneio::{download, OneIoError};
-///
-/// fn main() -> Result<(), OneIoError> {
-///     let remote_path = "https://example.com/file.txt";
-///     let local_path = "path/to/save/file.txt";
-///     download(remote_path, local_path, None)?;
-///
-///     Ok(())
-/// }
-/// ```
-pub fn download(
-    remote_path: &str,
-    local_path: &str,
-    // FIXME: the Client is only useful for the `http` feature, but the `ftp` feature has to depend on it too
-    opt_client: Option<Client>,
-) -> Result<(), OneIoError> {
-    match get_protocol(remote_path) {
-        #[cfg(feature = "http")]
-        Some("http" | "https") => {
-            let mut writer = get_writer_raw(local_path)?;
-            let mut response = get_http_reader_raw(remote_path, opt_client)?;
-            response.copy_to(&mut writer)?;
-            Ok(())
-        }
-        #[cfg(feature = "ftp")]
-        Some("ftp") => {
-            let mut writer = get_writer_raw(local_path)?;
-            let mut reader = get_ftp_reader_raw(remote_path)?;
-            std::io::copy(&mut reader, &mut writer)?;
-            Ok(())
-        }
-        #[cfg(feature = "s3")]
-        Some("s3" | "r2") => {
-            let (bucket, path) = crate::oneio::s3::s3_url_parse(remote_path)?;
-            crate::oneio::s3::s3_download(bucket.as_str(), path.as_str(), local_path)?;
-            Ok(())
-        }
-        Some(_) | None => Err(OneIoError::NotSupported(remote_path.to_string())),
-    }
-}
-
-/// Downloads a file from a remote path and saves it locally, with a retry mechanism.
-///
-/// # Arguments
-///
-/// * `remote_path` - The URL or file path of the file to download.
-/// * `local_path` - The file path to save the downloaded file.
-/// * `opt_client` - Optional custom [Client] to use for the request.
-/// * `retry` - The number of times to retry downloading in case of failure.
-///
-/// # Errors
-///
-/// Returns an `Err` variant if downloading fails after all retries, otherwise `Ok(())` indicating success.
-///
-/// # Examples
-///
-/// ```rust,no_run
-/// use oneio::download_with_retry;
-///
-/// let remote_path = "https://example.com/file.txt";
-/// let local_path = "/path/to/save/file.txt";
-/// let retry = 3;
-///
-/// match download_with_retry(remote_path, local_path, retry, None) {
-///     Ok(_) => println!("File downloaded successfully"),
-///     Err(e) => eprintln!("Error downloading file: {:?}", e),
-/// }
-/// ```
-pub fn download_with_retry(
-    remote_path: &str,
-    local_path: &str,
-    retry: usize,
-    opt_client: Option<Client>,
-) -> Result<(), OneIoError> {
-    let mut retry = retry;
-    loop {
-        match download(remote_path, local_path, opt_client.clone()) {
-            Ok(_) => {
-                return Ok(());
-            }
-            Err(e) => {
-                if retry > 0 {
-                    retry -= 1;
-                    continue;
-                } else {
-                    return Err(e);
-                }
-            }
-        }
-    }
-}
-
-/// Check if a remote or local file exists.
-///
-/// # Arguments
-///
-/// * `path` - The path of the file to check.
-///
-/// # Returns
-///
-/// Returns a `Result` containing a `bool` indicating whether the file exists or not. If the path is not supported,
-/// an `Err` variant with a `OneIoError::NotSupported` error is returned. If there is an error during the file check,
-/// an `Err` variant with a `OneIoError` is returned.
-pub(crate) fn remote_file_exists(path: &str) -> Result<bool, OneIoError> {
-    match get_protocol(path) {
-        Some(protocol) => match protocol {
-            "http" | "https" => {
-                let client = default_http_client()?;
-                let res = client
-                    .head(path)
-                    .timeout(std::time::Duration::from_secs(2))
-                    .send()?;
-                Ok(res.status().is_success())
-            }
-            #[cfg(feature = "s3")]
-            "s3" | "r2" => {
-                let (bucket, path) = crate::oneio::s3::s3_url_parse(path)?;
-                let res = crate::oneio::s3::s3_exists(bucket.as_str(), path.as_str())?;
-                Ok(res)
-            }
-            _ => Err(OneIoError::NotSupported(path.to_string())),
-        },
-        None => {
-            // check if local file exists
-            Ok(std::path::Path::new(path).exists())
-        }
-    }
-}
diff --git a/src/oneio/utils.rs b/src/oneio/utils.rs
deleted file mode 100644
index a9fd856..0000000
--- a/src/oneio/utils.rs
+++ /dev/null
@@ -1,121 +0,0 @@
-//! Utility functions for file reading and deserialization in OneIO.
-//!
-//! This module provides helper functions to read file contents as strings,
-//! deserialize JSON files into Rust structs, and iterate over lines in files.
-//! These utilities abstract over different file sources and formats,
-//! simplifying common I/O operations for users of the OneIO crate.
-
-use crate::{get_reader, OneIoError};
-use std::io::{BufRead, BufReader, Lines, Read};
-
-/// Reads the contents of a file to a string.
-///
-/// # Arguments
-///
-/// * `path` - A string slice that represents the path to the file.
-///
-/// # Returns
-///
-/// * `Result<String, OneIoError>` - A `Result` where the `Ok` variant contains
-///   the contents of the file as a string if the file was successfully read, or
-///   the `Err` variant contains a `OneIoError` if an I/O error occurred.
-/// -/// # Examples -/// -/// ```rust,no_run -/// use std::fs::File; -/// use oneio::read_to_string; -/// -/// let path = "path/to/file.txt"; -/// let result = read_to_string(path); -/// match result { -/// Ok(content) => println!("File content: {}", content), -/// Err(error) => eprintln!("Error: {}", error), -/// } -/// ``` -pub fn read_to_string(path: &str) -> Result { - let mut reader = get_reader(path)?; - let mut content = String::new(); - reader.read_to_string(&mut content)?; - Ok(content) -} - -/// Reads a JSON file and deserializes it into the specified struct. -/// -/// # Arguments -/// -/// * `path` - A string slice representing the path to the JSON file. -/// -/// # Generic Parameters -/// -/// * `T` - The type of struct to deserialize the JSON into. It must implement the DeserializeOwned trait from the serde crate. -/// -/// # Returns -/// -/// Returns a Result containing the deserialized struct if successful, or an OneIoError if there was an error reading the file or deserializing the JSON -#[cfg(feature = "json")] -pub fn read_json_struct(path: &str) -> Result { - let reader = get_reader(path)?; - let res: T = serde_json::from_reader(reader)?; - Ok(res) -} - -/// Reads lines from a file specified by the given path. -/// -/// # Arguments -/// -/// * `path` - A string slice that represents the path of the file to read. -/// -/// # Returns -/// -/// A `Result` containing a `Lines` iterator of `String` lines or a `OneIoError` indicating the error. -/// -/// # Example -/// -/// ```rust,no_run -/// use std::io::BufRead; -/// use std::io::BufReader; -/// const TEST_TEXT: &str = "OneIO test file. 
-/// This is a test."; -/// -/// let lines = oneio::read_lines("https://spaces.bgpkit.org/oneio/test_data.txt.gz").unwrap() -/// .map(|line| line.unwrap()).collect::<Vec<String>>(); -/// -/// assert_eq!(lines.len(), 2); -/// assert_eq!(lines[0].as_str(), "OneIO test file."); -/// assert_eq!(lines[1].as_str(), "This is a test."); -/// ``` -pub fn read_lines(path: &str) -> Result<Lines<BufReader<Box<dyn Read + Send>>>, OneIoError> { - let reader = get_reader(path)?; - let buf_reader = BufReader::new(reader); - Ok(buf_reader.lines()) -} - -#[cfg(test)] -mod tests { - - #[cfg(feature = "json")] - #[test] - fn test_read_json_struct() { - #[derive(serde::Deserialize, Debug)] - struct Data { - purpose: String, - version: u32, - meta: SubData, - } - #[derive(serde::Deserialize, Debug)] - struct SubData { - float: f64, - success: bool, - } - - let data = - crate::read_json_struct::<Data>("https://spaces.bgpkit.org/oneio/test_data.json") - .unwrap(); - - assert_eq!(data.purpose, "test".to_string()); - assert_eq!(data.version, 1); - assert_eq!(data.meta.float, 1.1); - assert!(data.meta.success); - } -} diff --git a/src/progress.rs b/src/progress.rs new file mode 100644 index 0000000..256d0b2 --- /dev/null +++ b/src/progress.rs @@ -0,0 +1,39 @@ +//! Progress tracking reader for OneIO.
+ +use std::io::Read; + +/// Progress reader wrapper that tracks bytes read +pub(crate) struct ProgressReader<R, F> { + inner: R, + bytes_read: u64, + total_size: u64, + callback: F, +} + +impl<R, F> ProgressReader<R, F> +where + F: Fn(u64, u64) + Send, +{ + pub(crate) fn new(inner: R, total_size: u64, callback: F) -> Self { + Self { + inner, + bytes_read: 0, + total_size, + callback, + } + } +} + +impl<R: Read, F> Read for ProgressReader<R, F> +where + F: Fn(u64, u64) + Send, +{ + fn read(&mut self, buf: &mut [u8]) -> std::io::Result<usize> { + let bytes_read = self.inner.read(buf)?; + if bytes_read > 0 { + self.bytes_read += bytes_read as u64; + (self.callback)(self.bytes_read, self.total_size); + } + Ok(bytes_read) + } +} diff --git a/src/remote.rs b/src/remote.rs new file mode 100644 index 0000000..a6ffa98 --- /dev/null +++ b/src/remote.rs @@ -0,0 +1,109 @@ +//! This module provides functionality to handle remote file operations such as downloading files +//! from HTTP, FTP, and S3 protocols. +use crate::client::OneIo; +use crate::OneIoError; +#[cfg(feature = "http")] +use reqwest::blocking::Client; +#[cfg(feature = "ftp")] +use std::io::Read; + +#[cfg(feature = "ftp")] +pub(crate) fn get_ftp_reader_raw(path: &str) -> Result<Box<dyn Read + Send>, OneIoError> { + if !path.starts_with("ftp://") { + return Err(OneIoError::NotSupported(path.to_string())); + } + + #[cfg(feature = "rustls")] + crate::crypto::ensure_default_provider()?; + + let path_without_scheme = path + .strip_prefix("ftp://") + .ok_or_else(|| OneIoError::NotSupported(path.to_string()))?; + let (host, remote_path) = path_without_scheme + .split_once('/') + .ok_or_else(|| OneIoError::NotSupported(path.to_string()))?; + let socket = match host.contains(':') { + true => host.to_string(), + false => format!("{host}:21"), + }; + + let mut ftp_stream = suppaftp::FtpStream::connect(socket)?; + // use anonymous login + ftp_stream.login("anonymous", "oneio")?; + ftp_stream.transfer_type(suppaftp::types::FileType::Binary)?; + let reader =
Box::new(ftp_stream.retr_as_stream(remote_path)?); + Ok(reader) +} + +#[cfg(feature = "http")] +pub(crate) fn get_http_reader_raw( + path: &str, + client: &Client, +) -> Result<reqwest::blocking::Response, OneIoError> { + let res = client + .execute(client.get(path).build()?)? + .error_for_status() + .map_err(|e| OneIoError::NetworkWithContext { + source: Box::new(e), + url: path.to_string(), + })?; + Ok(res) +} + +/// Creates a reqwest blocking client with custom headers. +/// +/// Prefer [`OneIo::builder()`] for reusable configuration. This helper is +/// **deprecated** and will be removed in a future release. +#[deprecated( + since = "0.21.0", + note = "Use OneIo::builder().header_str(k, v).build()?.http_client().clone() instead" )] +#[allow(dead_code)] +#[cfg(feature = "http")] +pub fn create_client_with_headers<I, K, V>(headers: I) -> Result<Client, OneIoError> +where + I: IntoIterator<Item = (K, V)>, + K: Into<String>, + V: Into<String>, +{ + let mut builder = OneIo::builder(); + for (k, v) in headers { + builder = builder.header_str(k.into().as_str(), v.into().as_str()); + } + Ok(builder.build()?.http_client().clone()) +} + +#[cfg(feature = "http")] +pub(crate) fn get_http_content_length(path: &str, client: &Client) -> Result<u64, OneIoError> { + let response = client.head(path).send()?.error_for_status()?; + + response + .headers() + .get("content-length") + .and_then(|v| v.to_str().ok()) + .and_then(|s| s.parse().ok()) + .ok_or_else(|| { + OneIoError::NotSupported( + "Cannot determine file size - server doesn't provide Content-Length".to_string(), + ) + }) +} + +/// Check if a remote HTTP(S) file exists via a `HEAD` request. +/// +/// # Arguments +/// +/// * `path` - The URL of the file to check. +/// +/// # Returns +/// +/// Returns `Ok(true)` if the `HEAD` request returns a success status code, `Ok(false)` for any +/// other status, or an `Err` with a `OneIoError` if the request itself fails.
+pub(crate) fn http_file_exists(path: &str, client: &Client) -> Result<bool, OneIoError> { + let res = client + .head(path) + .timeout(std::time::Duration::from_secs(2)) + .send()?; + Ok(res.status().is_success()) +} diff --git a/src/oneio/s3.rs b/src/s3.rs similarity index 51% rename from src/oneio/s3.rs rename to src/s3.rs index 3030df6..3bab4e1 100644 --- a/src/oneio/s3.rs +++ b/src/s3.rs @@ -5,7 +5,7 @@ //! - AWS_SECRET_ACCESS_KEY //! - AWS_REGION (e.g. "us-east-1") (use "auto" for Cloudflare R2) //! - AWS_ENDPOINT -use crate::oneio::{get_reader_raw, get_writer_raw}; +use crate::get_writer_raw_impl; use crate::OneIoError; use s3::creds::Credentials; use s3::serde_types::{HeadObjectResult, ListBucketResult}; @@ -14,30 +14,6 @@ use std::io::{Cursor, Read, Write}; use std::sync::mpsc::{sync_channel, Receiver, SyncSender}; /// Checks if the necessary environment variables for AWS S3 are set. -/// -/// The required credentials are -/// - `AWS_ACCESS_KEY_ID`: This is the access key for your AWS account. -/// - `AWS_SECRET_ACCESS_KEY`: This is the secret key associated with the `AWS_ACCESS_KEY_ID`. -/// - `AWS_REGION`: The AWS region where the resources are hosted. For example, `us-east-1`. Use `auto` for Cloudflare R2. -/// - `AWS_ENDPOINT`: The specific endpoint of the AWS service that the application will interact with. -/// -/// # Errors -/// -/// Returns a `OneIoError` if any of the following conditions are met: -/// -/// - Failed to load the dotenv file. -/// - Failed to retrieve the AWS region from the default environment. -/// - Failed to retrieve the AWS credentials from the environment. -/// -/// # Examples -/// -/// ```no_run -/// use oneio::s3_env_check; -/// -/// if let Err(e) = s3_env_check() { -/// eprintln!("Error: {:?}", e); -/// } -/// ``` pub fn s3_env_check() -> Result<(), OneIoError> { dotenvy::dotenv().ok(); let _ = Region::from_default_env()?; @@ -46,43 +22,6 @@ } /// Parse an S3 URL into a bucket and key.
-/// -/// This function takes an S3 URL as input and returns the bucket and key -/// as a tuple. The URL should be in the format "s3://bucket-name/key". -/// -/// # Arguments -/// -/// * `path` - A string slice representing the S3 URL to be parsed. -/// -/// # Examples -/// -/// ```no_run -/// use oneio::s3_url_parse; -/// -/// let result = s3_url_parse("s3://my-bucket/my-folder/my-file.txt"); -/// match result { -/// Ok((bucket, key)) => { -/// println!("Bucket: {}", bucket); -/// println!("Key: {}", key); -/// } -/// Err(err) => { -/// eprintln!("Failed to parse S3 URL: {:?}", err); -/// } -/// } -/// ``` -/// -/// # Errors -/// -/// This function can return a `OneIoError` in the following cases: -/// -/// * If the URL does not contain a bucket and key separated by "/". -/// -/// In case of error, the `OneIoError` variant `S3UrlError` will be returned, -/// containing the original URL string. -/// -/// # Returns -/// -/// Returns a `Result` containing the bucket and key as a tuple, or a `OneIoError` if parsing fails. pub fn s3_url_parse(path: &str) -> Result<(String, String), OneIoError> { let (_, remaining) = path .split_once("://") @@ -206,39 +145,11 @@ impl Read for StreamReader { } /// Creates an S3 bucket object with the specified bucket name. -/// -/// # Arguments -/// -/// * `bucket` - A string slice representing the name of the S3 bucket. -/// -/// # Errors -/// -/// This function can return a `OneIoError` if any of the following conditions occur: -/// -/// * Failed to load the environment variables from the .env file. -/// * Failed to create a new `Bucket` object with the given `bucket` name, `Region`, and `Credentials`. 
-/// -/// # Examples -/// -/// ```no_run -/// use s3::Bucket; -/// use oneio::s3_bucket; -/// -/// let result = s3_bucket("my-bucket"); -/// match result { -/// Ok(bucket) => { -/// // Do something with the `bucket` object -/// } -/// Err(error) => { -/// // Handle the error -/// } -/// } -/// ``` pub fn s3_bucket(bucket: &str) -> Result<Bucket, OneIoError> { dotenvy::dotenv().ok(); #[cfg(feature = "rustls")] - super::crypto::ensure_default_provider()?; + crate::crypto::ensure_default_provider()?; let mut bucket = *Bucket::new( bucket, @@ -249,40 +160,7 @@ Ok(bucket) } -//noinspection ALL,Style -/// `s3_reader` function reads a file from an S3 bucket and returns a boxed reader implementing `Read` trait. -/// -/// # Arguments -/// -/// * `bucket` - A string slice that represents the name of the S3 bucket. -/// * `path` - A string slice that represents the file path within the S3 bucket. -/// -/// # Errors -/// -/// The function can return an error of type `OneIoError`. This error occurs if there are any issues with the S3 operations, such as -/// accessing the bucket or retrieving the object. -/// -/// # Returns -/// -/// The function returns a `Result` containing a boxed reader implementing `Read + Send` trait in case of a successful operation. The reader -/// can be used to read the contents of the file stored in the S3 bucket. If the operation fails, a `OneIoError` is returned as an error. -/// -/// # Example -/// -/// ```rust,no_run -/// use std::io::Read; -/// use oneio::s3_reader; -/// -/// let bucket = "my_bucket"; -/// let path = "path/to/file.txt"; -/// -/// let mut reader = s3_reader(bucket, path).unwrap(); -/// -/// let mut buffer = Vec::new(); -/// reader.read_to_end(&mut buffer).unwrap(); -/// -/// assert_eq!(buffer, b"File content in S3 bucket"); -/// ``` +/// Reads a file from an S3 bucket and returns a boxed reader implementing `Read` trait.
pub fn s3_reader(bucket: &str, path: &str) -> Result<Box<dyn Read + Send>, OneIoError> { let bucket = s3_bucket(bucket)?; let path = path.to_string(); @@ -308,28 +186,8 @@ } /// Uploads a file to an S3 bucket at the specified path. -/// -/// # Arguments -/// -/// * `bucket` - The name of the S3 bucket. -/// * `s3_path` - The desired path of the file in the S3 bucket. -/// * `file_path` - The path of the file to be uploaded. -/// -/// # Returns -/// -/// Returns Result<(), OneIoError> indicating success or failure. -/// -/// # Examples -/// -/// ```rust,no_run -/// use oneio::s3_upload; -/// -/// let result = s3_upload("my-bucket", "path/to/file.txt", "/path/to/local_file.txt"); -/// assert!(result.is_ok()); -/// ``` pub fn s3_upload(bucket: &str, s3_path: &str, file_path: &str) -> Result<(), OneIoError> { // Early validation: check if file exists before attempting S3 operations - // This prevents potential hanging issues when file doesn't exist if !std::path::Path::new(file_path).exists() { return Err(OneIoError::Io(std::io::Error::new( std::io::ErrorKind::NotFound, @@ -338,37 +196,13 @@ } let bucket = s3_bucket(bucket)?; - let mut reader = get_reader_raw(file_path)?; + let file = std::fs::File::open(file_path)?; + let mut reader: Box<dyn Read + Send> = Box::new(std::io::BufReader::new(file)); bucket.put_object_stream(&mut reader, s3_path)?; Ok(()) } /// Copies an object within the same Amazon S3 bucket. -/// -/// # Arguments -/// -/// * `bucket` - The name of the Amazon S3 bucket. -/// * `s3_path` - The path of the source object to be copied. -/// * `s3_path_new` - The path of the destination object. -/// -/// # Errors -/// -/// Returns an `Err` variant of the `OneIoError` enum if there was an error when copying the object.
-/// -/// # Examples -/// -/// ```no_run -/// use oneio::s3_copy; -/// -/// match s3_copy("my-bucket", "path/to/object.txt", "new-path/to/object.txt") { -/// Err(error) => { -/// println!("Failed to copy object: {:?}", error); -/// } -/// Ok(()) => { -/// println!("Object copied successfully."); -/// } -/// } -/// ``` pub fn s3_copy(bucket: &str, s3_path: &str, s3_path_new: &str) -> Result<(), OneIoError> { let bucket = s3_bucket(bucket)?; bucket.copy_object_internal(s3_path, s3_path_new)?; @@ -376,29 +210,6 @@ pub fn s3_copy(bucket: &str, s3_path: &str, s3_path_new: &str) -> Result<(), One } /// Deletes an object from an S3 bucket. -/// -/// # Arguments -/// -/// * `bucket` - The name of the S3 bucket. -/// * `s3_path` - The path to the object in the S3 bucket. -/// -/// # Errors -/// -/// Returns a `OneIoError` if the deletion fails. -/// -/// # Examples -/// -/// ```no_run -/// use oneio::{OneIoError, s3_delete}; -/// -/// fn example() -> Result<(), OneIoError> { -/// let bucket = "my-bucket"; -/// let s3_path = "path/to/object.txt"; -/// s3_delete(bucket, s3_path)?; -/// Ok(()) -/// } -/// ``` -/// pub fn s3_delete(bucket: &str, s3_path: &str) -> Result<(), OneIoError> { let bucket = s3_bucket(bucket)?; bucket.delete_object(s3_path)?; @@ -406,43 +217,9 @@ pub fn s3_delete(bucket: &str, s3_path: &str) -> Result<(), OneIoError> { } /// Downloads a file from an S3 bucket and saves it locally. -/// -/// # Arguments -/// -/// * `bucket` - The name of the S3 bucket. -/// * `s3_path` - The path to the file in the S3 bucket. -/// * `file_path` - The path where the downloaded file will be saved locally. -/// -/// # Returns -/// -/// Return `Ok(())` if the download is successful. -/// -/// Return an `Err` with a `OneIoError` if there was an error during the download. -/// -/// # Errors -/// -/// The function can return `OneIoError::Network` if the HTTP response -/// status code is not in the range of 200 to 299 (inclusive). 
-/// -/// # Example -/// -/// ```rust -/// use std::path::Path; -/// use oneio::s3_download; -/// -/// let bucket = "my-bucket"; -/// let s3_path = "path/to/file.txt"; -/// let file_path = "local/file.txt"; -/// -/// match s3_download(bucket, s3_path, file_path) { -/// Ok(()) => println!("Download successful!"), -/// Err(err) => println!("Error while downloading: {:?}", err), -/// } -/// ``` -/// pub fn s3_download(bucket: &str, s3_path: &str, file_path: &str) -> Result<(), OneIoError> { let bucket = s3_bucket(bucket)?; - let mut output_file = get_writer_raw(file_path)?; + let mut output_file = get_writer_raw_impl(file_path)?; let res: u16 = bucket.get_object_to_writer(s3_path, &mut output_file)?; match res { 200..=299 => Ok(()), @@ -453,39 +230,7 @@ pub fn s3_download(bucket: &str, s3_path: &str, file_path: &str) -> Result<(), O } } -/// # S3 Stats -/// /// Retrieves the head object result for a given bucket and path in Amazon S3. -/// -/// ## Parameters -/// -/// - `bucket`: A string that represents the name of the bucket in Amazon S3. -/// - `path`: A string that represents the path of the object in the bucket. -/// -/// ## Returns -/// -/// Returns a `Result` that contains a `HeadObjectResult` if the operation was -/// successful, otherwise returns a `OneIoError` indicating the S3 download error. 
-/// -/// ## Example -/// -/// ```rust,no_run -/// use oneio::s3_stats; -/// -/// let bucket = "my-bucket"; -/// let path = "my-folder/my-file.txt"; -/// -/// match s3_stats(bucket, path) { -/// Ok(result) => { -/// // Handle the successful result -/// println!("Head Object: {:?}", result); -/// } -/// Err(error) => { -/// // Handle the error -/// println!("Error: {:?}", error); -/// } -/// } -/// ``` pub fn s3_stats(bucket: &str, path: &str) -> Result<HeadObjectResult, OneIoError> { let bucket = s3_bucket(bucket)?; let (head_object, code): (HeadObjectResult, u16) = bucket.head_object(path)?; @@ -499,30 +244,6 @@ pub fn s3_stats(bucket: &str, path: &str) -> Result<HeadObjectResult, OneIoError> -/// Ok(true) => println!("File exists"), -/// Ok(false) => println!("File does not exist"), -/// Err(error) => eprintln!("Error: {:?}", error), -/// } -/// ``` pub fn s3_exists(bucket: &str, path: &str) -> Result<bool, OneIoError> { match s3_stats(bucket, path) { Ok(_) => Ok(true), @@ -535,40 +256,6 @@ } /// Lists objects in the specified Amazon S3 bucket with given prefix and delimiter. -/// -/// # Arguments -/// -/// * `bucket` - Name of the S3 bucket. -/// * `prefix` - A prefix to filter the objects by. -/// * `delimiter` - An optional delimiter used to separate object key hierarchies. -/// * `dirs` - A flag to show only directories under the given prefix if set to true -/// -/// # Returns -/// -/// * If the URL does not start with "s3://". Returns a `Result` with a `Vec<String>` containing the object keys on success, or a `OneIoError` on failure.
-/// -/// # Example -/// -/// ```no_run -/// use oneio::s3_list; -/// -/// let bucket = "my-bucket"; -/// let prefix = "folder/"; -/// let delimiter = Some("/".to_string()); -/// -/// let result = s3_list(bucket, prefix, delimiter, false); -/// match result { -/// Ok(objects) => { -/// println!("Found {} objects:", objects.len()); -/// for object in objects { -/// println!("{}", object); -/// } -/// } -/// Err(error) => { -/// eprintln!("Failed to list objects: {:?}", error); -/// } -/// } -/// ``` pub fn s3_list( bucket: &str, prefix: &str, @@ -613,45 +300,24 @@ mod tests { #[test] fn test_s3_upload_nonexistent_file_early_validation() { - // Test for issue #48: s3_upload should fail quickly for non-existent files - // This test checks the early validation logic without requiring S3 credentials - let non_existent_file = "/tmp/oneio_test_nonexistent_file_12345.txt"; - - // Make sure the file doesn't exist let _ = std::fs::remove_file(non_existent_file); assert!(!std::path::Path::new(non_existent_file).exists()); - // This should return an error quickly due to early file validation let start = std::time::Instant::now(); - match s3_upload("test-bucket", "test-path", non_existent_file) { - Ok(_) => { - panic!("Upload should have failed for non-existent file"); - } + Ok(_) => panic!("Upload should have failed for non-existent file"), Err(OneIoError::Io(e)) => { let duration = start.elapsed(); - println!( - "✓ Upload failed quickly with IO error after {:?}: {}", - duration, e - ); assert!( duration < std::time::Duration::from_millis(100), - "Early validation should be instant. 
Took: {:?}", - duration + "Early validation should be instant" ); assert_eq!(e.kind(), std::io::ErrorKind::NotFound); - assert!(e.to_string().contains("File not found")); } - Err(e) => { - // Could also fail due to missing credentials, which is also quick + Err(_) => { let duration = start.elapsed(); - println!("Upload failed with error after {:?}: {:?}", duration, e); - assert!( - duration < std::time::Duration::from_secs(1), - "Should fail quickly, not hang. Took: {:?}", - duration - ); + assert!(duration < std::time::Duration::from_secs(1)); } } } diff --git a/tests/CERTIFICATES.md b/tests/CERTIFICATES.md new file mode 100644 index 0000000..46bfa3a --- /dev/null +++ b/tests/CERTIFICATES.md @@ -0,0 +1,68 @@ +# Test Certificate Generation + +This document describes how the test certificates in this directory were generated. + +## Files + +- `test-cert.pem` - Self-signed CA certificate in PEM format +- `test-cert.der` - Same certificate in DER format +- `test-key.pem` - Private key (not used in tests, generated as side effect) + +## Generation Commands + +### 1. Generate self-signed certificate and key: + +```bash +openssl req -x509 -newkey rsa:2048 \ + -keyout tests/test-key.pem \ + -out tests/test-cert.pem \ + -days 365 \ + -nodes \ + -subj "/C=US/ST=Test/L=Test/O=Test/CN=test.example.com" +``` + +Parameters: +- `-x509`: Generate self-signed certificate instead of CSR +- `-newkey rsa:2048`: Generate new RSA key with 2048 bits +- `-keyout`: Output file for private key +- `-out`: Output file for certificate +- `-days 365`: Certificate validity period (1 year) +- `-nodes`: No DES encryption (no password on private key) +- `-subj`: Subject fields for certificate + +### 2. 
Convert PEM to DER format: + +```bash +openssl x509 -in tests/test-cert.pem -outform DER -out tests/test-cert.der +``` + +Parameters: +- `-in`: Input PEM certificate +- `-outform DER`: Output in DER (binary) format +- `-out`: Output file + +## Certificate Details + +- **Subject**: /C=US/ST=Test/L=Test/O=Test/CN=test.example.com +- **Issuer**: Same as subject (self-signed) +- **Validity**: 1 year from generation date +- **Key Type**: RSA 2048-bit +- **Signature Algorithm**: SHA256 with RSA + +## Usage in Tests + +These certificates are used to test: +- `OneIoBuilder::add_root_certificate_pem()` +- `OneIoBuilder::add_root_certificate_der()` +- `ONEIO_CA_BUNDLE` environment variable support +- Custom TLS certificate loading for corporate proxies (e.g., Cloudflare WARP) + +## Regenerating Certificates + +If the certificate expires or you need to regenerate it: + +1. Delete the old files: `rm tests/test-cert.pem tests/test-cert.der tests/test-key.pem` +2. Run the commands above +3. Commit the new files + +Note: Since this is a self-signed test certificate, it should NOT be used for production or any real TLS connections. 
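## Verifying a Regenerated Certificate

Before committing regenerated files, it can help to confirm that the certificate parses and that the PEM-to-DER conversion round-trips. A minimal sanity check, assuming `openssl` is on `PATH` (the temporary paths below are throwaway examples, not the committed test files):

```bash
# Generate a throwaway self-signed certificate using the same recipe as above.
tmp=$(mktemp -d)
openssl req -x509 -newkey rsa:2048 \
  -keyout "$tmp/key.pem" \
  -out "$tmp/cert.pem" \
  -days 1 -nodes \
  -subj "/CN=check.example.com" 2>/dev/null

# Convert PEM to DER, then read the subject back from the DER copy.
openssl x509 -in "$tmp/cert.pem" -outform DER -out "$tmp/cert.der"
openssl x509 -in "$tmp/cert.der" -inform DER -noout -subject

rm -r "$tmp"
```

If the final command prints the expected `CN`, both encodings are intact and safe to commit.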
diff --git a/tests/basic_integration.rs b/tests/basic_integration.rs index fb9793d..e3ff178 100644 --- a/tests/basic_integration.rs +++ b/tests/basic_integration.rs @@ -5,6 +5,52 @@ use std::io::{Read, Write}; const TEST_TEXT: &str = "OneIO test file.\nThis is a test."; +#[cfg(feature = "http")] +fn spawn_http_server(request_count: usize) -> (String, std::thread::JoinHandle<Vec<String>>) { + use std::net::TcpListener; + use std::time::Duration; + + let listener = TcpListener::bind("127.0.0.1:0").unwrap(); + let addr = listener.local_addr().unwrap(); + let body = TEST_TEXT.to_string(); + + let handle = std::thread::spawn(move || { + let mut requests = Vec::with_capacity(request_count); + for _ in 0..request_count { + let (mut stream, _) = listener.accept().unwrap(); + stream + .set_read_timeout(Some(Duration::from_secs(2))) + .unwrap(); + + let mut request = Vec::new(); + let mut buffer = [0_u8; 1024]; + loop { + let bytes_read = stream.read(&mut buffer).unwrap(); + if bytes_read == 0 { + break; + } + request.extend_from_slice(&buffer[..bytes_read]); + if request.windows(4).any(|window| window == b"\r\n\r\n") { + break; + } + } + + requests.push(String::from_utf8(request).unwrap()); + + let response = format!( + "HTTP/1.1 200 OK\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{}", + body.len(), + body + ); + stream.write_all(response.as_bytes()).unwrap(); + stream.flush().unwrap(); + } + requests + }); + + (format!("http://{addr}/test.txt"), handle) +} + fn test_read(file_path: &str) { + let mut reader = oneio::get_reader(file_path).unwrap(); + let mut text = "".to_string(); @@ -96,3 +142,594 @@ fn test_404_handling() { assert!(reader.is_ok()); assert!(oneio::exists("https://spaces.bgpkit.org/oneio/test_data.json").unwrap()); } + +#[cfg(feature = "http")] +#[test] +fn test_oneio_builder_reuses_default_headers() { + let (url, handle) = spawn_http_server(2); + let oneio = oneio::OneIo::builder() + .header_str("X-Test-Token", "secret") + .build() + .unwrap(); + + let first =
oneio.read_to_string(&url).unwrap(); + let second = oneio.read_to_string(&url).unwrap(); + + assert_eq!(first, TEST_TEXT); + assert_eq!(second, TEST_TEXT); + + let requests = handle.join().unwrap(); + assert_eq!(requests.len(), 2); + for request in requests { + let request = request.to_ascii_lowercase(); + assert!(request.contains("x-test-token: secret")); + assert!(request.contains("user-agent: oneio")); + } +} + +#[cfg(all(feature = "http", any(feature = "rustls", feature = "native-tls")))] +#[test] +fn test_oneio_builder_accepts_root_certificate() { + let cert_pem = br#"-----BEGIN CERTIFICATE----- +MIIDCTCCAfGgAwIBAgIUZwNjzmSANT4XyBCwC6aLzuUhsCAwDQYJKoZIhvcNAQEL +BQAwFDESMBAGA1UEAwwJbG9jYWxob3N0MB4XDTI2MDMwNjE1MjMwNloXDTI2MDMw +NzE1MjMwNlowFDESMBAGA1UEAwwJbG9jYWxob3N0MIIBIjANBgkqhkiG9w0BAQEF +AAOCAQ8AMIIBCgKCAQEAnb4K2oDt8XUvD3MwNSOkfTD2Ud0vqFIsZYSnXdgw2mUT +pYW9Xs+1vdJ3IV77VCAqnvNBm2poL20xkpTQfwPrL4IWNvguAziGiWlSs573jvUe ++myRftFou3iZRl56u3evqKOgkL8CladtHYTx1ZArsKZyJJHpUMrPCMJBvcTBiAh0 +kbemeAdcnDP6PORQqW+bibYXz1pyHDGUMXUMOj5PdPV0/ayumXlr1VBnbgkLlrTd +QsJOxLVk9w7RkaLg3pvq0RGvn08up+J8FEkfK1Ddoz4nJnYJy5xgs25rIUDVfGTw +G5QBJdNZKSlXXqQXBawLGHJi7zvSV4urRFXlhfad8wIDAQABo1MwUTAdBgNVHQ4E +FgQUHx0PDPAWL4pKz3T0RVNxjjnYSyEwHwYDVR0jBBgwFoAUHx0PDPAWL4pKz3T0 +RVNxjjnYSyEwDwYDVR0TAQH/BAUwAwEB/zANBgkqhkiG9w0BAQsFAAOCAQEAAtmG +mWhz0AfDCxzulDTV4OLTWMkBpGgOlDG/bFWM0+M519t0f2yE7szaboYH+E4axCoe +ZF9zAMKgSmoyCKNnZlFs4ZqXvphNeim1Pnd4LmTbiUGxLwHXuTzwfdIfna4ACx+s +qQe3vGmM9OWcGipiA4Z84HrReW7Ht70enYYpC7CaDalTu9pRZIk/cparF8qL2QNv +OkOLHxPjJTiGWvjaZpzADT30e9SKjK1RPMBRLBUdg4wizKuliRugVYV6flquH/iY +ryXRHfGX358AcPpdZQxhuYsMRkaCKfgCXULQx4+MpoosyeoH6lPRWYeIZVIqL5wc +FZr4y1T605mmkIwGPQ== +-----END CERTIFICATE----- +"#; + + let oneio = oneio::OneIo::builder() + .add_root_certificate_pem(cert_pem) + .unwrap() + .build(); + + assert!(oneio.is_ok()); +} + +// ── file_extension ──────────────────────────────────────────────────────────── + +#[test] +fn test_file_extension_plain() { + 
assert!(oneio::get_reader("tests/test_data.txt").is_ok()); +} + +#[cfg(feature = "any_gz")] +#[test] +fn test_file_extension_strips_query_params() { + // The file_extension helper must ignore everything after '?' (query string) + // or '#' (fragment) before reading the extension. The helper is pub(crate), + // so it cannot be called directly from an integration test; as a proxy, + // verify that a local .gz file still reads correctly when the compression + // type is supplied explicitly through get_reader_with_type. + let gz_path = "tests/test_data.txt.gz"; + let oneio = oneio::OneIo::new().unwrap(); + let result = oneio.get_reader_with_type(gz_path, "gz"); + assert!( + result.is_ok(), + "get_reader_with_type with explicit gz should succeed" + ); + let mut content = String::new(); + result.unwrap().read_to_string(&mut content).unwrap(); + assert_eq!(content.as_str(), TEST_TEXT); +} + +// ── OneIo::get_reader_with_type ─────────────────────────────────────────────── + +#[test] +fn test_get_reader_with_type_plain() { + let oneio = oneio::OneIo::new().unwrap(); + // Explicit empty compression = raw pass-through + let result = oneio.get_reader_with_type("tests/test_data.txt", ""); + assert!(result.is_ok()); + let mut content = String::new(); + result.unwrap().read_to_string(&mut content).unwrap(); + assert_eq!(content.as_str(), TEST_TEXT); +} + +#[cfg(feature = "any_gz")] +#[test] +fn test_get_reader_with_type_gz_override() { + let oneio = oneio::OneIo::new().unwrap(); + // File is .gz but we pass extension explicitly — should decompress correctly.
+ let result = oneio.get_reader_with_type("tests/test_data.txt.gz", "gz"); + assert!(result.is_ok()); + let mut content = String::new(); + result.unwrap().read_to_string(&mut content).unwrap(); + assert_eq!(content.as_str(), TEST_TEXT); +} + +#[cfg(feature = "bz")] +#[test] +fn test_get_reader_with_type_bz2_override() { + let oneio = oneio::OneIo::new().unwrap(); + let result = oneio.get_reader_with_type("tests/test_data.txt.bz2", "bz2"); + assert!(result.is_ok()); + let mut content = String::new(); + result.unwrap().read_to_string(&mut content).unwrap(); + assert_eq!(content.as_str(), TEST_TEXT); +} + +// ── OneIoBuilder: timeout and configure_http ────────────────────────────────── + +#[cfg(feature = "http")] +#[test] +fn test_builder_timeout_builds_successfully() { + use std::time::Duration; + let result = oneio::OneIo::builder() + .timeout(Duration::from_secs(30)) + .connect_timeout(Duration::from_secs(5)) + .build(); + assert!(result.is_ok()); +} + +#[cfg(feature = "http")] +#[test] +fn test_builder_configure_http_escape_hatch() { + use std::time::Duration; + // configure_http lets us set options not directly exposed by OneIoBuilder. 
+ let result = oneio::OneIo::builder() + .configure_http(|b| b.connection_verbose(false).timeout(Duration::from_secs(10))) + .build(); + assert!(result.is_ok()); +} + +#[cfg(feature = "http")] +#[test] +fn test_builder_no_proxy_builds_successfully() { + let result = oneio::OneIo::builder().no_proxy().build(); + assert!(result.is_ok()); +} + +// ── download_with_retry ─────────────────────────────────────────────────────── + +#[cfg(feature = "http")] +#[test] +fn test_download_with_retry_succeeds_on_first_attempt() { + let (url, handle) = spawn_http_server(1); + let oneio = oneio::OneIo::new().unwrap(); + let out = "tests/test_download_retry_output.txt"; + let result = oneio.download_with_retry(&url, out, 3); + handle.join().unwrap(); + + assert!(result.is_ok()); + let content = std::fs::read_to_string(out).unwrap(); + assert_eq!(content, TEST_TEXT); + std::fs::remove_file(out).unwrap(); +} + +#[cfg(feature = "http")] +#[test] +fn test_download_with_retry_exhausts_retries_on_bad_url() { + let oneio = oneio::OneIo::new().unwrap(); + // Port 1 is reserved and will immediately refuse the connection. + let result = oneio.download_with_retry( + "http://127.0.0.1:1/no-such-file", + "tests/should_not_exist.txt", + 1, + ); + assert!(result.is_err()); + // Cleanup in case it somehow created a file. 
+ let _ = std::fs::remove_file("tests/should_not_exist.txt"); +} + +// ── Phase 1: LZ4 / XZ / Zstd compression ───────────────────────────────────── + +#[cfg(feature = "lz")] +#[test] +fn test_local_lz4() { + test_read("tests/test_data.txt.lz4"); +} + +#[cfg(feature = "lz")] +#[test] +fn test_write_lz4() { + test_write("tests/test_write_data.txt.lz4", "tests/test_data.txt.lz4"); +} + +#[cfg(feature = "lz")] +#[test] +fn test_get_reader_with_type_lz4_override() { + let oneio = oneio::OneIo::new().unwrap(); + let result = oneio.get_reader_with_type("tests/test_data.txt.lz4", "lz4"); + assert!(result.is_ok()); + let mut content = String::new(); + result.unwrap().read_to_string(&mut content).unwrap(); + assert_eq!(content.as_str(), TEST_TEXT); +} + +#[cfg(feature = "xz")] +#[test] +fn test_local_xz() { + test_read("tests/test_data.txt.xz"); +} + +#[cfg(feature = "xz")] +#[test] +fn test_write_xz() { + test_write("tests/test_write_data.txt.xz", "tests/test_data.txt.xz"); +} + +#[cfg(feature = "xz")] +#[test] +fn test_get_reader_with_type_xz_override() { + let oneio = oneio::OneIo::new().unwrap(); + let result = oneio.get_reader_with_type("tests/test_data.txt.xz", "xz"); + assert!(result.is_ok()); + let mut content = String::new(); + result.unwrap().read_to_string(&mut content).unwrap(); + assert_eq!(content.as_str(), TEST_TEXT); +} + +#[cfg(feature = "zstd")] +#[test] +fn test_local_zstd() { + test_read("tests/test_data.txt.zst"); +} + +#[cfg(feature = "zstd")] +#[test] +fn test_write_zstd() { + test_write("tests/test_write_data.txt.zst", "tests/test_data.txt.zst"); +} + +#[cfg(feature = "zstd")] +#[test] +fn test_get_reader_with_type_zstd_override() { + let oneio = oneio::OneIo::new().unwrap(); + let result = oneio.get_reader_with_type("tests/test_data.txt.zst", "zst"); + assert!(result.is_ok()); + let mut content = String::new(); + result.unwrap().read_to_string(&mut content).unwrap(); + assert_eq!(content.as_str(), TEST_TEXT); +} + +// ── Phase 1: Progress tracking 
────────────────────────────────
+
+#[cfg(feature = "http")]
+#[test]
+fn test_get_reader_with_progress_fires_callback() {
+    use std::sync::{Arc, Mutex};
+
+    // get_reader_with_progress makes a HEAD request (content-length probe)
+    // followed by a GET request, so the server must handle 2 connections.
+    let (url, handle) = spawn_http_server(2);
+    let oneio = oneio::OneIo::new().unwrap();
+
+    let observed = Arc::new(Mutex::new(Vec::<(u64, u64)>::new()));
+    let observed_cb = Arc::clone(&observed);
+
+    let (mut reader, total_size) = oneio
+        .get_reader_with_progress(&url, move |bytes_read, total_bytes| {
+            observed_cb.lock().unwrap().push((bytes_read, total_bytes));
+        })
+        .unwrap();
+
+    // Drain the reader so all callbacks fire.
+    let mut content = String::new();
+    reader.read_to_string(&mut content).unwrap();
+    handle.join().unwrap();
+
+    // Content-Length is set by spawn_http_server, so total_size must be known.
+    assert_eq!(total_size, Some(TEST_TEXT.len() as u64));
+
+    let calls = observed.lock().unwrap();
+    // At least one callback must have fired.
+    assert!(!calls.is_empty(), "progress callback never fired");
+    // Final bytes_read must equal the total content length.
+    let (final_bytes, _) = *calls.last().unwrap();
+    assert_eq!(final_bytes, TEST_TEXT.len() as u64);
+    // total_bytes passed to every callback must match the content length.
+    for (_, total) in calls.iter() {
+        assert_eq!(*total, TEST_TEXT.len() as u64);
+    }
+    assert_eq!(content, TEST_TEXT);
+}
+
+#[test]
+fn test_get_reader_with_progress_local_total_from_metadata() {
+    use std::sync::{Arc, Mutex};
+
+    // Local files skip the HTTP content-length probe; the total is read
+    // from fs::metadata instead, so total_size must still be Some.
+ let oneio = oneio::OneIo::new().unwrap(); + let observed = Arc::new(Mutex::new(0u64)); + let observed_cb = Arc::clone(&observed); + + let (mut reader, total_size) = oneio + .get_reader_with_progress("tests/test_data.txt", move |bytes_read, _| { + *observed_cb.lock().unwrap() = bytes_read; + }) + .unwrap(); + + let mut content = String::new(); + reader.read_to_string(&mut content).unwrap(); + + assert_eq!(content, TEST_TEXT); + // Local file size is known from metadata. + assert!(total_size.is_some()); + assert_eq!(total_size.unwrap(), TEST_TEXT.len() as u64); + assert_eq!(*observed.lock().unwrap(), TEST_TEXT.len() as u64); +} + +// ── Phase 1: Cache reader ───────────────────────────────────────────────────── + +#[test] +fn test_cache_reader_creates_cache_file() { + let cache_dir = "tests/tmp_cache_create"; + let cache_file = "cached.txt"; + let cache_path = format!("{cache_dir}/{cache_file}"); + // Clean up before test. + let _ = std::fs::remove_dir_all(cache_dir); + + let oneio = oneio::OneIo::new().unwrap(); + let mut reader = oneio + .get_cache_reader( + "tests/test_data.txt", + cache_dir, + Some(cache_file.to_string()), + false, + ) + .unwrap(); + + let mut content = String::new(); + reader.read_to_string(&mut content).unwrap(); + assert_eq!(content, TEST_TEXT); + + // Cache file must exist after the first read. + assert!(std::path::Path::new(&cache_path).exists()); + std::fs::remove_dir_all(cache_dir).unwrap(); +} + +#[test] +fn test_cache_reader_reuses_existing_cache() { + let cache_dir = "tests/tmp_cache_reuse"; + let cache_file = "cached.txt"; + let _ = std::fs::remove_dir_all(cache_dir); + std::fs::create_dir_all(cache_dir).unwrap(); + + // Pre-populate the cache with different content. + let cached_content = "cached content"; + std::fs::write(format!("{cache_dir}/{cache_file}"), cached_content).unwrap(); + + let oneio = oneio::OneIo::new().unwrap(); + // force_cache=false → must read from the pre-existing cache, not the source. 
+ let mut reader = oneio + .get_cache_reader( + "tests/test_data.txt", + cache_dir, + Some(cache_file.to_string()), + false, + ) + .unwrap(); + + let mut content = String::new(); + reader.read_to_string(&mut content).unwrap(); + assert_eq!( + content, cached_content, + "should have read from cache, not source" + ); + std::fs::remove_dir_all(cache_dir).unwrap(); +} + +#[test] +fn test_cache_reader_force_refreshes_cache() { + let cache_dir = "tests/tmp_cache_force"; + let cache_file = "cached.txt"; + let _ = std::fs::remove_dir_all(cache_dir); + std::fs::create_dir_all(cache_dir).unwrap(); + + // Pre-populate the cache with stale content. + std::fs::write(format!("{cache_dir}/{cache_file}"), "stale content").unwrap(); + + let oneio = oneio::OneIo::new().unwrap(); + // force_cache=true → must re-fetch from source and overwrite cache. + let mut reader = oneio + .get_cache_reader( + "tests/test_data.txt", + cache_dir, + Some(cache_file.to_string()), + true, + ) + .unwrap(); + + let mut content = String::new(); + reader.read_to_string(&mut content).unwrap(); + assert_eq!(content, TEST_TEXT, "should have re-fetched from source"); + + // Cache file on disk must also be updated. + let on_disk = std::fs::read_to_string(format!("{cache_dir}/{cache_file}")).unwrap(); + assert_eq!(on_disk, TEST_TEXT); + std::fs::remove_dir_all(cache_dir).unwrap(); +} + +#[test] +fn test_cache_reader_creates_missing_cache_dir() { + // The cache directory must not exist before the call. 
+    let cache_dir = "tests/tmp_cache_dir_creation/nested/path";
+    let _ = std::fs::remove_dir_all("tests/tmp_cache_dir_creation");
+
+    let oneio = oneio::OneIo::new().unwrap();
+    let result = oneio.get_cache_reader("tests/test_data.txt", cache_dir, None, false);
+    assert!(result.is_ok(), "should create nested cache directory");
+    std::fs::remove_dir_all("tests/tmp_cache_dir_creation").unwrap();
+}
+
+// ── Phase 1: JSON parsing ─────────────────────────────────────────────────────
+
+#[cfg(feature = "json")]
+#[test]
+fn test_read_json_struct_local() {
+    use serde::Deserialize;
+
+    #[derive(Deserialize, PartialEq, Debug)]
+    struct TestData {
+        name: String,
+        value: u32,
+        enabled: bool,
+        items: Vec<String>,
+    }
+
+    let result = oneio::read_json_struct::<TestData>("tests/test_data.json");
+    assert!(
+        result.is_ok(),
+        "read_json_struct failed: {:?}",
+        result.err()
+    );
+    let data = result.unwrap();
+    assert_eq!(data.name, "oneio_test");
+    assert_eq!(data.value, 42);
+    assert!(data.enabled);
+    assert_eq!(data.items, vec!["alpha", "beta", "gamma"]);
+}
+
+#[cfg(feature = "json")]
+#[test]
+fn test_read_json_struct_invalid_returns_error() {
+    // A plain text file is not valid JSON — must return an error, not panic.
+    let result = oneio::read_json_struct::<serde_json::Value>("tests/test_data.txt");
+    assert!(result.is_err());
+}
+
+// ── Phase 1: Content length ───────────────────────────────────────────────────
+
+#[test]
+fn test_get_content_length_local_file() {
+    let oneio = oneio::OneIo::new().unwrap();
+    let result = oneio.get_content_length("tests/test_data.txt");
+    assert!(result.is_ok());
+    assert_eq!(result.unwrap(), TEST_TEXT.len() as u64);
+}
+
+#[cfg(feature = "http")]
+#[test]
+fn test_get_content_length_http_with_content_length_header() {
+    let (url, handle) = spawn_http_server(1);
+    let oneio = oneio::OneIo::new().unwrap();
+    // spawn_http_server sends Content-Length, so we must get it back.
+ let result = oneio.get_content_length(&url); + handle.join().unwrap(); + assert!(result.is_ok()); + assert_eq!(result.unwrap(), TEST_TEXT.len() as u64); +} + +// ── Phase 2: get_writer_raw ─────────────────────────────────────────────────── + +#[test] +fn test_get_writer_raw_creates_uncompressed_file() { + let path = "tests/tmp_writer_raw.txt"; + let oneio = oneio::OneIo::new().unwrap(); + + { + let mut writer = oneio.get_writer_raw(path).unwrap(); + writer.write_all(TEST_TEXT.as_bytes()).unwrap(); + } + + // File must be readable as plain text (no compression wrapper). + let content = std::fs::read_to_string(path).unwrap(); + assert_eq!(content, TEST_TEXT); + std::fs::remove_file(path).unwrap(); +} + +#[test] +fn test_get_writer_raw_creates_parent_dirs() { + let path = "tests/tmp_writer_raw_nested/subdir/out.txt"; + let _ = std::fs::remove_dir_all("tests/tmp_writer_raw_nested"); + + let oneio = oneio::OneIo::new().unwrap(); + let result = oneio.get_writer_raw(path); + assert!(result.is_ok(), "get_writer_raw should create parent dirs"); + std::fs::remove_dir_all("tests/tmp_writer_raw_nested").unwrap(); +} + +// ── Phase 2: SHA256 digest ──────────────────────────────────────────────────── + +#[cfg(feature = "digest")] +#[test] +fn test_get_sha256_digest_known_file() { + // Known SHA256 of tests/test_data.txt (pre-computed with sha256sum). + const EXPECTED: &str = "51a6f9bf51d9e6243fe838242bb74e6e16f77c87cae138b9f3e065c173fc63c7"; + let result = oneio::get_sha256_digest("tests/test_data.txt"); + assert!(result.is_ok()); + assert_eq!(result.unwrap(), EXPECTED); +} + +#[cfg(feature = "digest")] +#[test] +fn test_get_sha256_digest_missing_file_returns_error() { + let result = oneio::get_sha256_digest("tests/does_not_exist.txt"); + assert!(result.is_err()); +} + +// ── Phase 2: Error variants ─────────────────────────────────────────────────── + +// Note: reqwest::Certificate::from_pem/from_der do not validate certificate data +// at parse time. 
They only validate when used in a TLS connection. Therefore, +// we cannot test for InvalidCertificate errors with invalid data here. + +#[cfg(feature = "http")] +#[test] +fn test_network_error_on_refused_connection() { + // Port 1 is reserved and always refuses connections — produces a network error. + let oneio = oneio::OneIo::new().unwrap(); + let result = oneio.get_reader("http://127.0.0.1:1/file.txt"); + assert!(result.is_err()); + // Error display must be non-empty and useful. + assert!(!result.err().unwrap().to_string().is_empty()); +} + +// ── Phase 2: Environment variables ─────────────────────────────────────────── + +#[cfg(all(feature = "http", any(feature = "rustls", feature = "native-tls")))] +#[test] +fn test_oneio_ca_bundle_env_var_valid_path() { + // Point ONEIO_CA_BUNDLE at a known PEM cert — builder must succeed. + std::env::set_var("ONEIO_CA_BUNDLE", "tests/test-cert.pem"); + let result = oneio::OneIo::builder().build(); + std::env::remove_var("ONEIO_CA_BUNDLE"); + assert!( + result.is_ok(), + "builder failed with valid ONEIO_CA_BUNDLE: {:?}", + result.err() + ); +} + +#[cfg(all(feature = "http", any(feature = "rustls", feature = "native-tls")))] +#[test] +fn test_oneio_ca_bundle_env_var_missing_path() { + // A non-existent path must be silently ignored (not panic or error). + std::env::set_var("ONEIO_CA_BUNDLE", "/tmp/oneio_does_not_exist_ca.pem"); + let result = oneio::OneIo::builder().build(); + std::env::remove_var("ONEIO_CA_BUNDLE"); + assert!( + result.is_ok(), + "builder should ignore missing ONEIO_CA_BUNDLE" + ); +} + +#[cfg(all(feature = "http", any(feature = "rustls", feature = "native-tls")))] +#[test] +fn test_oneio_accept_invalid_certs_env_var() { + // Builder must succeed when env var is set to "true". 
+ std::env::set_var("ONEIO_ACCEPT_INVALID_CERTS", "true"); + let result = oneio::OneIo::builder().build(); + std::env::remove_var("ONEIO_ACCEPT_INVALID_CERTS"); + assert!(result.is_ok()); +} diff --git a/tests/test-cert.der b/tests/test-cert.der new file mode 100644 index 0000000..e4499b9 Binary files /dev/null and b/tests/test-cert.der differ diff --git a/tests/test-cert.pem b/tests/test-cert.pem new file mode 100644 index 0000000..0ff8334 --- /dev/null +++ b/tests/test-cert.pem @@ -0,0 +1,21 @@ +-----BEGIN CERTIFICATE----- +MIIDizCCAnOgAwIBAgIUDA0MWC/7mS4ZLvBML9QQXQ7+PrUwDQYJKoZIhvcNAQEL +BQAwVTELMAkGA1UEBhMCVVMxDTALBgNVBAgMBFRlc3QxDTALBgNVBAcMBFRlc3Qx +DTALBgNVBAoMBFRlc3QxGTAXBgNVBAMMEHRlc3QuZXhhbXBsZS5jb20wHhcNMjYw +MzI4MDMzMDMwWhcNMjcwMzI4MDMzMDMwWjBVMQswCQYDVQQGEwJVUzENMAsGA1UE +CAwEVGVzdDENMAsGA1UEBwwEVGVzdDENMAsGA1UECgwEVGVzdDEZMBcGA1UEAwwQ +dGVzdC5leGFtcGxlLmNvbTCCASIwDQYJKoZIhvcNAQEBBQADggEPADCCAQoCggEB +ALO4T/yyNVlKDThMN7U0Dt8A9KmxXbvXzGZOENSHWx1uyHH8iaKhnmOQlfEhNy31 +orkWzWFsZyaaNZCQYQXA2CleknDCyG/rMOgn24p3ehXobIV/fz2ypKzOHPuCpb6T +inasdugdsG7HA1VdJA0jqEFimypCQFbISnmhxUISiyovBWwFvpdvtGGo3E1TGtrz +VM15mr3BEace1R8lbLn7b5w19FKC6yA6uSAGa14B9xPLPnVipu/1m929Vacmgy8e +y8dhiGX+KpPb9WJx0L/5giBwMpeuvmZ1fINUZLy1WiBhhQivrXFtpPDh6HUq8Kuz +wQhgvBz7Tv/+ZsQZkNfo/w0CAwEAAaNTMFEwHQYDVR0OBBYEFA6ohMwwL9f/rIpe +5UrHq4vCdxdRMB8GA1UdIwQYMBaAFA6ohMwwL9f/rIpe5UrHq4vCdxdRMA8GA1Ud +EwEB/wQFMAMBAf8wDQYJKoZIhvcNAQELBQADggEBAENsO66ATuRixiwqZUi1m32R +UbmYCRB30ymOWElNNuBFGrCSq8ZZ8kd8U2T/5I6lkV/rVT8/I302JW/B7vjq09Bn +cK977Ney5Cyi3OE/3G3RfHpXKzkmR4maPJKedRJ0baitC6NAjJnCg0nks35ypmnL +K6GhX+JVmzPBEk51FWs1oPUktYVkFreZFusXpoFwyCHPqe2fNgVQYpFA3AbFyFV8 +ZsASsG670jrWpixvjx3V78k2GeM7EXjs+imoP465xWoo+Jw02YAwm3SfbTBrcRcC +b/knXtcLvylGfXHg5iGwr4C71H/dOeGNc0E36YT8G1d1gB65umdeVQa7kqQopLY= +-----END CERTIFICATE----- diff --git a/tests/test-key.pem b/tests/test-key.pem new file mode 100644 index 0000000..f7e5dff --- /dev/null +++ b/tests/test-key.pem @@ -0,0 +1,28 @@ +-----BEGIN PRIVATE KEY----- 
+MIIEvAIBADANBgkqhkiG9w0BAQEFAASCBKYwggSiAgEAAoIBAQCzuE/8sjVZSg04 +TDe1NA7fAPSpsV2718xmThDUh1sdbshx/ImioZ5jkJXxITct9aK5Fs1hbGcmmjWQ +kGEFwNgpXpJwwshv6zDoJ9uKd3oV6GyFf389sqSszhz7gqW+k4p2rHboHbBuxwNV +XSQNI6hBYpsqQkBWyEp5ocVCEosqLwVsBb6Xb7RhqNxNUxra81TNeZq9wRGnHtUf +JWy5+2+cNfRSgusgOrkgBmteAfcTyz51Yqbv9ZvdvVWnJoMvHsvHYYhl/iqT2/Vi +cdC/+YIgcDKXrr5mdXyDVGS8tVogYYUIr61xbaTw4eh1KvCrs8EIYLwc+07//mbE +GZDX6P8NAgMBAAECggEAIZayeFd0Kg5xfxKbnMWwe9RJZces7ZUM8hfoBFiR2mPg +mZEpxhfvqssUgv8CuZe9hFszImCovKm4lh6ty2352G2A8Asve99YgiMk4HGpAxwv +4BlulgDjzxosh4Mo5//aMojUJliaJ3IReOxR9xSxdxrvRgwgyzde1/G4lLszgDv9 +R/B7k8A3/5e7gtlli35uq/N1rusLsUvR/DTTZx4bgiC9knVr8ibVKHjOBPylTMCe +mBLTV1Y6yIQ2KilWoLtB/Obc25SxkjxS4PMg44o02MA6yudmHOfyD6jRdUfwngEO +3H2CHcwpVgyxRAG1/cbBKcwznTc8BNy5aTrJ+c4fEQKBgQD8fqGG4tk5PP6MNhfU +O6okZ2jSQObHxIISaeZEFOvjgBB/bgWLy4yN1Y7A5+We738Nr61ZqXs2yohmKRMU +lkOH059VCjoi9j9Ed4UwUPX51tMUiq6UDufiz/GFZjxe0TNsD44o9j0y0XuqO5jA +ENQWNXdUyjqm4f8Ewb3VUrF9JQKBgQC2NwoVcdCe/dZ2iRl4s797hy3/0F3K2D7T +6JfA+AvIVx9UHcnkJD9+kAkc3V3Ec02ix9TPqOhZlcW9sG+GK+IzJ8BzM0UZBE8Y +dAAOnh8fmQYO18YCZQ27USgDm8qRqIlcy1nfnznz5mcv/Mr3VntopCuo++SnJ1se +teO9A0a5yQKBgBkuZ92T8iYLNze0jVvVN3q59XJClsGFj5NWjGhtzbi6pOqQhm0S +t8nNapdCiVv4dqteweMXS/qC6GeH4SdEdSeuq01H39RsypKBeP8AlAd+rT53NkH9 +2pJ8OpIbSBoMsxwr0Pm01RWkXrjziKBfDUfJvUTvhjnugAhxagVEYf8hAoGAQ89F +xGc/sPtcBW7N+NpDpy/OUYX0cZl6ABht6RUmJV8uvHn9eRZ4PpxRl9lVnNcgDDT7 +oHAGT9hp50ayqibInirr9rMcwsuUt8/reYUkpfinhXyNJrT+oDf6gSVIBF7JB6xx +fMPfqEGVIwmbauJXOMDnspJTLFV8FniwENYYSukCgYAAqsPt7VkFnLPbItdJWm67 +Kpxdz+ogeWcPGk08ur6Ypu1jXMwokIKeKtbs/Q5NbeVDyDrkYMw++bR3rwCGV66o +tMJXy+GNxo40/vovsI/AloViNOVX7KzwsxNEWNCUfp21VMzyCoNVPuequoxG6aTk +uTmr+Cl/7B9hY5Z8HoYYpw== +-----END PRIVATE KEY----- diff --git a/tests/test_data.json b/tests/test_data.json new file mode 100644 index 0000000..7467c0e --- /dev/null +++ b/tests/test_data.json @@ -0,0 +1,6 @@ +{ + "name": "oneio_test", + "value": 42, + "enabled": true, + "items": ["alpha", "beta", "gamma"] +} \ No newline at end of file