diff --git a/.claude/CLAUDE.md b/.claude/CLAUDE.md
index 2bf9e4d..4a87909 100644
--- a/.claude/CLAUDE.md
+++ b/.claude/CLAUDE.md
@@ -51,9 +51,9 @@ The scaffold is functional with image display, async OCR pipeline, drag-select o
 - Drag-select overlay with word highlighting
 - Ctrl+C clipboard copy
 - Stale OCR result cancellation via monotonic job IDs
+- Zoom & pan (Ctrl+scroll, pinch, +/- keys, middle-drag pan) via custom `ZoomableCanvas` widget
 
 ### What's not implemented yet:
-- Zoom and pan
 - Info panel (filename, dimensions, file size)
 - Context menu (right-click copy)
 - OCR caching (cache module exists but is not wired up)
@@ -81,5 +81,6 @@ cargo test --all
 - Never block the GTK main thread — all OCR and I/O runs on background threads
 - Use async-channel to send results back to the UI thread
 - quickview-core must have zero GTK dependencies (keeps it testable without a display server)
-- Coordinate transforms go through `compute_contain_transform()` — image coords vs widget coords
+- Coordinate transforms go through `ViewTransform::from_center()` and related methods in `geometry.rs` — image coords vs widget coords. Fields are private; use `.scale()`, `.offset_x()`, `.offset_y()` getters. `contain()` returns `ContainResult`.
 - OCR results use image-space coordinates; convert to widget-space only for rendering
+- OCR hit-testing uses `OcrWordIndex` spatial index (`ocr/index.rs`) for efficient drag-select queries
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index 072d583..c45730a 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -32,6 +32,7 @@ jobs:
             gtk4 libadwaita \
             tesseract tesseract-data-eng \
             gtk4-layer-shell
+          pkg-config --atleast-version=4.10 gtk4
           rustup default stable
 
       - name: Cache cargo registry and build artifacts
@@ -74,6 +75,7 @@ jobs:
             gtk4 libadwaita \
             tesseract tesseract-data-eng \
             gtk4-layer-shell
+          pkg-config --atleast-version=4.10 gtk4
           rustup default stable
 
       - name: Cache cargo registry, build artifacts, and tools
diff --git a/AGENTS.md b/AGENTS.md
index 18717e5..a6333d2 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -12,7 +12,7 @@ Primary target is Arch Linux + Wayland (wlroots compositors like Sway/Hyprland/n
 ## Repo Layout
 
 - `crates/quickview/`: CLI entrypoint (`quickview` binary).
-- `crates/quickview-core/`: non-GTK core (OCR parsing, geometry, selection logic, cache helpers).
+- `crates/quickview-core/`: non-GTK core (OCR parsing, geometry/ViewTransform, spatial index, selection logic, cache helpers).
 - `crates/quickview-ui/`: GTK4/libadwaita UI (full viewer + quick preview windows, overlay widget).
 - `docs/`: phased plan, architecture, decisions, development notes.
 - `adrs/`: deeper architecture decisions.
@@ -83,7 +83,7 @@ GitHub Actions runs in an `archlinux:latest` container and installs system packa
 - Quick Preview window: `crates/quickview-ui/src/windows/quick_preview.rs`
 - Full viewer window: `crates/quickview-ui/src/windows/full_viewer.rs`
 - Viewer controller (loads images, kicks OCR, ignores late results): `crates/quickview-ui/src/windows/shared.rs`
-- Overlay + drag selection rendering: `crates/quickview-ui/src/widgets/image_overlay.rs`
+- Image rendering, zoom/pan, drag selection, OCR overlay: `crates/quickview-ui/src/widgets/image_overlay.rs` (contains `ImageOverlayWidget` wrapper + `ZoomableCanvas` custom widget subclass)
 - Tesseract invocation + TSV parsing: `crates/quickview-core/src/ocr/`
 
 ## Project Invariants (Don't Break These)
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 27ef7d8..c1bffa8 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -2,5 +2,15 @@
 
 ## Unreleased
 
-- Initial scaffold.
+### Added
+- **Zoom & pan** — Ctrl+scroll zoom (anchored at cursor), pinch-to-zoom,
+  `+`/`-` keyboard zoom, `0`/`Home` reset to fit-to-window. Middle-click drag
+  or Ctrl+left-drag to pan. Works in both Full Viewer and Quick Preview.
+  Selection and OCR highlights stay aligned at all zoom levels.
+- **Spatial index for OCR hit-testing** — `OcrWordIndex` uniform-grid index
+  replaces linear scan during drag-select for faster word lookup.
+- **ViewTransform hardening** — validated constructor rejects non-finite and
+  non-positive scale values; fields are now private with getters.
+- **CI GTK4 version check** — `pkg-config --atleast-version=4.10 gtk4` in CI.
 
+- Initial scaffold.
diff --git a/README.md b/README.md
index 1a82d55..b691168 100644
--- a/README.md
+++ b/README.md
@@ -35,11 +35,13 @@ quickview --quick-preview photo.png
 quickview photo.png
 ```
 
-**OCR Text Selection** — Tesseract runs asynchronously after the image loads. Drag to select recognized words, `Ctrl+C` to copy.
+**Zoom & Pan** — `Ctrl+scroll` to zoom at cursor, pinch-to-zoom on touchpad, `+`/`-` keys, `0` to reset. Middle-click drag or `Ctrl+left-drag` to pan.
+
+**OCR Text Selection** — Tesseract runs asynchronously after the image loads. Drag to select recognized words, `Ctrl+C` to copy. Selection stays aligned at any zoom level.
 
 ## Requirements
 
-Arch Linux (primary target):
+Arch Linux (primary target, requires GTK4 >= 4.10):
 
 ```bash
 sudo pacman -S --needed \
diff --git a/crates/quickview-core/src/geometry.rs b/crates/quickview-core/src/geometry.rs
index 3d34af2..c2fda37 100644
--- a/crates/quickview-core/src/geometry.rs
+++ b/crates/quickview-core/src/geometry.rs
@@ -1,5 +1,7 @@
 use serde::{Deserialize, Serialize};
 
+use std::fmt;
+
 #[derive(Debug, Default, Clone, Copy, PartialEq, Serialize, Deserialize)]
 pub struct Point {
     pub x: f64,
@@ -41,3 +43,295 @@ impl Rect {
         self.x < bx2 && ax2 > other.x && self.y < by2 && ay2 > other.y
     }
 }
+
+/// Result of `ViewTransform::contain()`.
+///
+/// This represents the baseline "fit to widget" (contain) scale and the widget-space
+/// center point used by `ViewTransform::from_center()`.
+#[derive(Debug, Clone, Copy, PartialEq)]
+pub struct ContainResult {
+    /// Uniform scale that fits the entire image inside the widget.
+    pub contain_scale: f64,
+
+    /// Center of the widget in widget coordinates (pixels).
+    pub widget_center: Point,
+}
+
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub enum ViewTransformError {
+    NonFinite,
+    NonPositiveScale,
+}
+
+impl fmt::Display for ViewTransformError {
+    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
+        match self {
+            ViewTransformError::NonFinite => write!(f, "non-finite view transform value"),
+            ViewTransformError::NonPositiveScale => write!(f, "scale must be > 0"),
+        }
+    }
+}
+
+impl std::error::Error for ViewTransformError {}
+
+#[derive(Debug, Clone, Copy, PartialEq)]
+pub struct ViewTransform {
+    scale: f64,
+    offset_x: f64,
+    offset_y: f64,
+}
+
+impl ViewTransform {
+    pub fn new(scale: f64, offset_x: f64, offset_y: f64) -> Result<Self, ViewTransformError> {
+        if !scale.is_finite() || !offset_x.is_finite() || !offset_y.is_finite() {
+            return Err(ViewTransformError::NonFinite);
+        }
+        if scale <= 0.0 {
+            return Err(ViewTransformError::NonPositiveScale);
+        }
+        Ok(Self {
+            scale,
+            offset_x,
+            offset_y,
+        })
+    }
+
+    pub fn scale(&self) -> f64 {
+        self.scale
+    }
+
+    pub fn offset_x(&self) -> f64 {
+        self.offset_x
+    }
+
+    pub fn offset_y(&self) -> f64 {
+        self.offset_y
+    }
+
+    /// Compute the baseline "contain" (fit-to-widget) scale.
+    ///
+    /// The returned `widget_center` is in widget coordinates (pixels) and is the point
+    /// that `from_center()` treats as the widget's visual center anchor.
+    pub fn contain(widget_w: f64, widget_h: f64, image_w: f64, image_h: f64) -> ContainResult {
+        let widget_center = Point {
+            x: widget_w.max(0.0) * 0.5,
+            y: widget_h.max(0.0) * 0.5,
+        };
+
+        if widget_w <= 0.0 || widget_h <= 0.0 || image_w <= 0.0 || image_h <= 0.0 {
+            return ContainResult {
+                contain_scale: 1.0,
+                widget_center,
+            };
+        }
+
+        let contain_scale = (widget_w / image_w)
+            .min(widget_h / image_h)
+            .max(f64::MIN_POSITIVE);
+        ContainResult {
+            contain_scale,
+            widget_center,
+        }
+    }
+
+    /// Construct a `ViewTransform` from canonical view state.
+    ///
+    /// `center_img.x` and `center_img.y` must be finite. This function delegates validation
+    /// to `ViewTransform::new` and will panic if invariants are violated.
+    pub fn from_center(
+        widget_w: f64,
+        widget_h: f64,
+        image_w: f64,
+        image_h: f64,
+        zoom_factor: f64,
+        center_img: Point,
+    ) -> Self {
+        // `from_center()` delegates invariants to `ViewTransform::new`.
+        // `center_img.x` / `center_img.y` must be finite; otherwise `ViewTransform::new` returns an error and the `expect` below panics.
+        debug_assert!(
+            center_img.x.is_finite() && center_img.y.is_finite(),
+            "from_center: center_img must be finite (x={}, y={})",
+            center_img.x,
+            center_img.y
+        );
+
+        let contain = Self::contain(widget_w, widget_h, image_w, image_h);
+        let scale =
+            (contain.contain_scale * zoom_factor.max(f64::MIN_POSITIVE)).max(f64::MIN_POSITIVE);
+
+        let offset_x = contain.widget_center.x - center_img.x * scale;
+        let offset_y = contain.widget_center.y - center_img.y * scale;
+        Self::new(scale, offset_x, offset_y).expect("ViewTransform invariants violated")
+    }
+
+    pub fn image_to_widget(&self, point: Point) -> Point {
+        Point {
+            x: self.offset_x + point.x * self.scale,
+            y: self.offset_y + point.y * self.scale,
+        }
+    }
+
+    pub fn widget_to_image(&self, point: Point) -> Point {
+        Point {
+            x: (point.x - self.offset_x) / self.scale,
+            y: (point.y - self.offset_y) / self.scale,
+        }
+    }
+
+    pub fn image_rect_to_widget(&self, rect: Rect) -> Rect {
+        Rect {
+            x: self.offset_x + rect.x * self.scale,
+            y: self.offset_y + rect.y * self.scale,
+            w: rect.w * self.scale,
+            h: rect.h * self.scale,
+        }
+    }
+
+    pub fn widget_rect_to_image(&self, rect: Rect) -> Rect {
+        Rect {
+            x: (rect.x - self.offset_x) / self.scale,
+            y: (rect.y - self.offset_y) / self.scale,
+            w: rect.w / self.scale,
+            h: rect.h / self.scale,
+        }
+    }
+
+    pub fn clamp_center(
+        widget_w: f64,
+        widget_h: f64,
+        image_w: f64,
+        image_h: f64,
+        scale: f64,
+        center_img: Point,
+    ) -> Point {
+        if widget_w <= 0.0 || widget_h <= 0.0 || image_w <= 0.0 || image_h <= 0.0 || scale <= 0.0 {
+            return center_img;
+        }
+
+        let half_view_w = widget_w / (2.0 * scale);
+        let half_view_h = widget_h / (2.0 * scale);
+
+        let center_x = if image_w * scale <= widget_w {
+            image_w * 0.5
+        } else {
+            center_img.x.clamp(half_view_w, image_w - half_view_w)
+        };
+
+        let center_y = if image_h * scale <= widget_h {
+            image_h * 0.5
+        } else {
+            center_img.y.clamp(half_view_h, image_h - half_view_h)
+        };
+
+        Point {
+            x: center_x,
+            y: center_y,
+        }
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use super::{Point, ViewTransform};
+
+    fn approx_eq(a: f64, b: f64, eps: f64) {
+        assert!((a - b).abs() <= eps, "{a} != {b} (eps={eps})");
+    }
+
+    #[test]
+    fn image_and_widget_mapping_are_inverse() {
+        let t = ViewTransform::from_center(
+            1200.0,
+            800.0,
+            2400.0,
+            1600.0,
+            2.25,
+            Point { x: 900.0, y: 600.0 },
+        );
+
+        let img = Point {
+            x: 1234.5,
+            y: 345.25,
+        };
+        let widget = t.image_to_widget(img);
+        let roundtrip = t.widget_to_image(widget);
+
+        approx_eq(roundtrip.x, img.x, 1e-9);
+        approx_eq(roundtrip.y, img.y, 1e-9);
+    }
+
+    #[test]
+    fn anchor_preserving_zoom_keeps_widget_anchor_fixed() {
+        let widget_w = 1000.0;
+        let widget_h = 700.0;
+        let image_w = 3000.0;
+        let image_h = 2000.0;
+
+        let center_start = Point {
+            x: 1300.0,
+            y: 900.0,
+        };
+        let zoom_start = 1.3;
+        let zoom_new = 2.1;
+        let anchor_widget = Point { x: 120.0, y: 520.0 };
+
+        let t_start = ViewTransform::from_center(
+            widget_w,
+            widget_h,
+            image_w,
+            image_h,
+            zoom_start,
+            center_start,
+        );
+        let anchor_img = t_start.widget_to_image(anchor_widget);
+
+        let t_new_unclamped = ViewTransform::from_center(
+            widget_w,
+            widget_h,
+            image_w,
+            image_h,
+            zoom_new,
+            center_start,
+        );
+        let contain = ViewTransform::contain(widget_w, widget_h, image_w, image_h);
+        let widget_center = contain.widget_center;
+        let center_new = Point {
+            x: anchor_img.x - (anchor_widget.x - widget_center.x) / t_new_unclamped.scale(),
+            y: anchor_img.y - (anchor_widget.y - widget_center.y) / t_new_unclamped.scale(),
+        };
+        let t_new =
+            ViewTransform::from_center(widget_w, widget_h, image_w, image_h, zoom_new, center_new);
+
+        let mapped_anchor = t_new.image_to_widget(anchor_img);
+        approx_eq(mapped_anchor.x, anchor_widget.x, 1e-9);
+        approx_eq(mapped_anchor.y, anchor_widget.y, 1e-9);
+    }
+
+    #[test]
+    fn clamp_center_forces_image_center_when_scaled_image_fits() {
+        let center =
+            ViewTransform::clamp_center(1000.0, 800.0, 300.0, 200.0, 2.0, Point { x: 0.0, y: 0.0 });
+        approx_eq(center.x, 150.0, 1e-9);
+        approx_eq(center.y, 100.0, 1e-9);
+    }
+
+    #[test]
+    fn clamp_center_limits_pan_when_scaled_image_exceeds_viewport() {
+        let center = ViewTransform::clamp_center(
+            1000.0,
+            700.0,
+            3000.0,
+            2000.0,
+            0.6,
+            Point {
+                x: -5000.0,
+                y: 5000.0,
+            },
+        );
+
+        // half_view_w = 1000 / (2 * 0.6) = 833.333..., so x = -5000 clamps up to 833.333...
+        // half_view_h = 700 / (2 * 0.6) = 583.333..., so y = 5000 clamps down to 2000 - 583.333...
+        approx_eq(center.x, 833.3333333333334, 1e-9);
+        approx_eq(center.y, 1416.6666666666667, 1e-9);
+    }
+}
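Reviewer note on `anchor_preserving_zoom_keeps_widget_anchor_fixed`: the center computation in that test follows from the mapping `widget = widget_center + (img - center_img) * scale`, which is what `from_center()` plus `image_to_widget()` amount to. A minimal standalone sketch of that derivation follows. `Pt` and the three free functions are hypothetical stand-ins for illustration, not part of `geometry.rs`.

```rust
// Hypothetical stand-in for `geometry::Point`, so the sketch compiles on its own.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Pt {
    x: f64,
    y: f64,
}

/// Image space -> widget space for a view centered on `center_img`.
fn image_to_widget(p: Pt, center_img: Pt, widget_center: Pt, scale: f64) -> Pt {
    Pt {
        x: widget_center.x + (p.x - center_img.x) * scale,
        y: widget_center.y + (p.y - center_img.y) * scale,
    }
}

/// Widget space -> image space (inverse of `image_to_widget`).
fn widget_to_image(p: Pt, center_img: Pt, widget_center: Pt, scale: f64) -> Pt {
    Pt {
        x: center_img.x + (p.x - widget_center.x) / scale,
        y: center_img.y + (p.y - widget_center.y) / scale,
    }
}

/// Solve the forward mapping for the center: choose the new image-space center
/// so that `anchor_img` still lands on `anchor_widget` at `new_scale`.
fn center_for_anchor(anchor_img: Pt, anchor_widget: Pt, widget_center: Pt, new_scale: f64) -> Pt {
    Pt {
        x: anchor_img.x - (anchor_widget.x - widget_center.x) / new_scale,
        y: anchor_img.y - (anchor_widget.y - widget_center.y) / new_scale,
    }
}
```

In this sketch a zoom step would be: read `anchor_img` under the cursor with the old scale, compute the new center with `center_for_anchor`, then clamp it (the diff does that in `clamp_center`) before building the next transform. Clamping can move the anchor when the view hits an image edge, which is presumably why the test works with the unclamped center.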
diff --git a/crates/quickview-core/src/ocr/index.rs b/crates/quickview-core/src/ocr/index.rs
new file mode 100644
index 0000000..90f92ad
--- /dev/null
+++ b/crates/quickview-core/src/ocr/index.rs
@@ -0,0 +1,305 @@
+use crate::geometry::Rect;
+
+use super::models::OcrWord;
+
+const DEFAULT_CELL_SIZE: f64 = 256.0;
+
+/// A simple uniform-grid spatial index for OCR word bounding boxes.
+///
+/// The index is built in image coordinates and can be queried with a rectangle to
+/// efficiently find intersecting words.
+///
+/// # Contract
+/// This index stores buckets of *indices* into a specific OCR word list. Callers must
+/// rebuild the index whenever the underlying `words` slice changes, including:
+/// - replacing the OCR result
+/// - reordering words
+/// - mutating any word bounding boxes
+///
+/// Calling `query_intersecting()` with a different `words` slice than the one used to
+/// build the index may produce incorrect results.
+#[derive(Debug, Clone)]
+pub struct OcrWordIndex {
+    cell_size: f64,
+    grid_w: usize,
+    grid_h: usize,
+    buckets: Vec<Vec<usize>>,
+    seen: Vec<u32>,
+    seen_gen: u32,
+}
+
+impl OcrWordIndex {
+    pub fn build(words: &[OcrWord], image_w: f64, image_h: f64) -> Self {
+        Self::build_with_cell_size(words, image_w, image_h, DEFAULT_CELL_SIZE)
+    }
+
+    pub fn build_with_cell_size(
+        words: &[OcrWord],
+        image_w: f64,
+        image_h: f64,
+        cell_size: f64,
+    ) -> Self {
+        let cell_size = cell_size.max(1.0);
+
+        let grid_w = ((image_w.max(1.0) / cell_size).ceil() as usize).max(1);
+        let grid_h = ((image_h.max(1.0) / cell_size).ceil() as usize).max(1);
+        let mut buckets = vec![Vec::<usize>::new(); grid_w.saturating_mul(grid_h).max(1)];
+
+        for (idx, w) in words.iter().enumerate() {
+            Self::insert_bbox(&mut buckets, grid_w, grid_h, cell_size, idx, w.bbox);
+        }
+
+        Self {
+            cell_size,
+            grid_w,
+            grid_h,
+            buckets,
+            seen: vec![0; words.len()],
+            seen_gen: 1,
+        }
+    }
+
+    /// Return indices of words whose bounding boxes intersect `rect`.
+    ///
+    /// `words` must be the same word list used when building this index (same ordering and
+    /// bounding boxes). If you swap or mutate the word list, rebuild via `OcrWordIndex::build(...)`
+    /// before calling this method again.
+    pub fn query_intersecting(&mut self, words: &[OcrWord], rect: &Rect) -> Vec<usize> {
+        if words.is_empty() {
+            return Vec::new();
+        }
+
+        if self.seen.len() != words.len() {
+            // Best-effort hygiene. The index must be rebuilt when `words` changes; this is only
+            // to avoid panics from the internal dedupe vector length drifting.
+            self.seen = vec![0; words.len()];
+            self.seen_gen = 1;
+        }
+
+        let Some((x0, y0, x1, y1)) =
+            Self::cell_range(self.cell_size, self.grid_w, self.grid_h, rect)
+        else {
+            return Vec::new();
+        };
+
+        let gen = self.next_seen_gen();
+        let mut out = Vec::new();
+
+        for gy in y0..=y1 {
+            for gx in x0..=x1 {
+                let bucket_idx = gy * self.grid_w + gx;
+                if let Some(bucket) = self.buckets.get(bucket_idx) {
+                    for &word_idx in bucket {
+                        // If the caller violates the contract and supplies a different `words`
+                        // slice than the one used at build time, buckets can contain indices that
+                        // are out of range. Skip rather than panic.
+                        if word_idx >= words.len() || word_idx >= self.seen.len() {
+                            continue;
+                        }
+
+                        if self.seen[word_idx] == gen {
+                            continue;
+                        }
+                        self.seen[word_idx] = gen;
+
+                        if words.get(word_idx).is_some_and(|w| w.bbox.intersects(rect)) {
+                            out.push(word_idx);
+                        }
+                    }
+                }
+            }
+        }
+
+        out
+    }
+
+    fn next_seen_gen(&mut self) -> u32 {
+        if self.seen_gen == u32::MAX {
+            self.seen.fill(0);
+            self.seen_gen = 1;
+        } else {
+            self.seen_gen += 1;
+        }
+        self.seen_gen
+    }
+
+    fn insert_bbox(
+        buckets: &mut [Vec<usize>],
+        grid_w: usize,
+        grid_h: usize,
+        cell_size: f64,
+        word_idx: usize,
+        bbox: Rect,
+    ) {
+        if grid_w == 0 || grid_h == 0 || cell_size <= 0.0 {
+            return;
+        }
+
+        if bbox.w <= 0.0 || bbox.h <= 0.0 {
+            return;
+        }
+
+        let x0 = (bbox.x / cell_size).floor() as isize;
+        let y0 = (bbox.y / cell_size).floor() as isize;
+        let x1 = ((bbox.x + bbox.w) / cell_size).floor() as isize;
+        let y1 = ((bbox.y + bbox.h) / cell_size).floor() as isize;
+
+        let x0 = x0.clamp(0, (grid_w - 1) as isize) as usize;
+        let y0 = y0.clamp(0, (grid_h - 1) as isize) as usize;
+        let x1 = x1.clamp(0, (grid_w - 1) as isize) as usize;
+        let y1 = y1.clamp(0, (grid_h - 1) as isize) as usize;
+
+        for gy in y0..=y1 {
+            for gx in x0..=x1 {
+                let bucket_idx = gy * grid_w + gx;
+                if let Some(bucket) = buckets.get_mut(bucket_idx) {
+                    bucket.push(word_idx);
+                }
+            }
+        }
+    }
+
+    fn cell_range(
+        cell_size: f64,
+        grid_w: usize,
+        grid_h: usize,
+        rect: &Rect,
+    ) -> Option<(usize, usize, usize, usize)> {
+        if cell_size <= 0.0 || grid_w == 0 || grid_h == 0 {
+            return None;
+        }
+
+        // Keep semantics aligned with `Rect::intersects()`: a degenerate rectangle (w == 0 or
+        // h == 0) acts as a line or point selection and can still hit boxes; only negative sizes are rejected.
+        if rect.w < 0.0 || rect.h < 0.0 {
+            return None;
+        }
+
+        let x0 = (rect.x / cell_size).floor() as isize;
+        let y0 = (rect.y / cell_size).floor() as isize;
+        let x1 = ((rect.x + rect.w) / cell_size).floor() as isize;
+        let y1 = ((rect.y + rect.h) / cell_size).floor() as isize;
+
+        let x0 = x0.clamp(0, (grid_w - 1) as isize) as usize;
+        let y0 = y0.clamp(0, (grid_h - 1) as isize) as usize;
+        let x1 = x1.clamp(0, (grid_w - 1) as isize) as usize;
+        let y1 = y1.clamp(0, (grid_h - 1) as isize) as usize;
+
+        Some((x0, y0, x1, y1))
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use super::OcrWordIndex;
+    use crate::geometry::Rect;
+
+    use super::super::models::OcrWord;
+
+    fn w(text: &str, bbox: Rect, order: usize) -> OcrWord {
+        OcrWord {
+            text: text.to_string(),
+            confidence: 99.0,
+            bbox,
+            order,
+        }
+    }
+
+    #[test]
+    fn query_returns_intersecting_words_only() {
+        let words = vec![
+            w(
+                "a",
+                Rect {
+                    x: 10.0,
+                    y: 10.0,
+                    w: 10.0,
+                    h: 10.0,
+                },
+                0,
+            ),
+            w(
+                "b",
+                Rect {
+                    x: 300.0,
+                    y: 10.0,
+                    w: 10.0,
+                    h: 10.0,
+                },
+                1,
+            ),
+            w(
+                "c",
+                Rect {
+                    x: 10.0,
+                    y: 300.0,
+                    w: 10.0,
+                    h: 10.0,
+                },
+                2,
+            ),
+        ];
+
+        let mut idx = OcrWordIndex::build_with_cell_size(&words, 1000.0, 1000.0, 64.0);
+        let r = Rect {
+            x: 290.0,
+            y: 0.0,
+            w: 50.0,
+            h: 50.0,
+        };
+        let mut out = idx.query_intersecting(&words, &r);
+        out.sort_unstable();
+
+        assert_eq!(out, vec![1]);
+    }
+
+    #[test]
+    fn query_deduplicates_words_that_span_multiple_cells() {
+        let words = vec![w(
+            "x",
+            Rect {
+                x: 60.0,
+                y: 60.0,
+                w: 10.0,
+                h: 10.0,
+            },
+            0,
+        )];
+
+        // With cell_size=64, this bbox spans four cells: (0,0), (1,0), (0,1), and (1,1).
+        let mut idx = OcrWordIndex::build_with_cell_size(&words, 256.0, 256.0, 64.0);
+        let r = Rect {
+            x: 0.0,
+            y: 0.0,
+            w: 200.0,
+            h: 200.0,
+        };
+        let out = idx.query_intersecting(&words, &r);
+
+        assert_eq!(out, vec![0]);
+    }
+
+    #[test]
+    fn degenerate_rects_still_hit_via_intersects_semantics() {
+        let words = vec![w(
+            "a",
+            Rect {
+                x: 10.0,
+                y: 10.0,
+                w: 10.0,
+                h: 10.0,
+            },
+            0,
+        )];
+        let mut idx = OcrWordIndex::build_with_cell_size(&words, 100.0, 100.0, 32.0);
+
+        // Point hit inside the word bbox.
+        let p = Rect {
+            x: 15.0,
+            y: 15.0,
+            w: 0.0,
+            h: 0.0,
+        };
+        assert_eq!(idx.query_intersecting(&words, &p), vec![0]);
+    }
+}
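Reviewer note on the `seen` / `seen_gen` fields: the dedupe in `query_intersecting()` is a generation-stamp scheme, which avoids clearing a visited set on every query. A minimal standalone sketch of the idea follows. `SeenSet`, `begin_query`, `first_visit`, and `dedupe` are hypothetical names for illustration and do not exist in `ocr/index.rs`.

```rust
/// Hypothetical stand-alone version of the generation-stamp dedupe.
struct SeenSet {
    /// `marks[i]` holds the generation in which index `i` was last visited.
    marks: Vec<u32>,
    generation: u32,
}

impl SeenSet {
    fn new(len: usize) -> Self {
        Self {
            marks: vec![0; len],
            generation: 0,
        }
    }

    /// Start a new query. O(1), except for a full reset when the counter wraps.
    fn begin_query(&mut self) {
        if self.generation == u32::MAX {
            self.marks.fill(0);
            self.generation = 1;
        } else {
            self.generation += 1;
        }
    }

    /// True only the first time `idx` is visited in the current query.
    /// Out-of-range indices are reported as "not new" instead of panicking.
    fn first_visit(&mut self, idx: usize) -> bool {
        match self.marks.get_mut(idx) {
            Some(mark) if *mark != self.generation => {
                *mark = self.generation;
                true
            }
            _ => false,
        }
    }
}

/// Collect each index once, in first-seen order, across several grid buckets.
fn dedupe(buckets: &[Vec<usize>], seen: &mut SeenSet) -> Vec<usize> {
    seen.begin_query();
    let mut out = Vec::new();
    for bucket in buckets {
        for &idx in bucket {
            if seen.first_visit(idx) {
                out.push(idx);
            }
        }
    }
    out
}
```

The trade-off in this sketch is one `u32` per word in exchange for skipping a per-query `HashSet` allocation, which matters when a drag-select issues a query on every pointer motion event.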
diff --git a/crates/quickview-core/src/ocr/mod.rs b/crates/quickview-core/src/ocr/mod.rs
index 870e262..2552ffd 100644
--- a/crates/quickview-core/src/ocr/mod.rs
+++ b/crates/quickview-core/src/ocr/mod.rs
@@ -1,5 +1,6 @@
 //! OCR-related types and helpers.
 
+pub mod index;
 pub mod models;
 pub mod select;
 pub mod tesseract;
diff --git a/crates/quickview-ui/Cargo.toml b/crates/quickview-ui/Cargo.toml
index db2dcfe..c350b37 100644
--- a/crates/quickview-ui/Cargo.toml
+++ b/crates/quickview-ui/Cargo.toml
@@ -16,7 +16,7 @@ tracing.workspace = true
 
 quickview-core = { path = "../quickview-core" }
 
-gtk4 = { version = "0.10", package = "gtk4" }
+gtk4 = { version = "0.10", package = "gtk4", features = ["v4_10"] }
 adw = { version = "0.8", package = "libadwaita", features = ["v1_4"] }
 
 gtk4-layer-shell = "0.7"
@@ -25,4 +25,3 @@ async-channel = "2"
 
 # Optional: sandboxed image decoding (requires system glycin libs)
 glycin = { version = "3", optional = true }
-
diff --git a/crates/quickview-ui/src/widgets/image_overlay.rs b/crates/quickview-ui/src/widgets/image_overlay.rs
index 92dfdee..068c397 100644
--- a/crates/quickview-ui/src/widgets/image_overlay.rs
+++ b/crates/quickview-ui/src/widgets/image_overlay.rs
@@ -1,41 +1,26 @@
-use std::{cell::RefCell, rc::Rc};
+use std::cell::RefCell;
 
+use glib::subclass::types::ObjectSubclassIsExt;
 use gtk::prelude::*;
+use gtk::subclass::prelude::*;
 use gtk4 as gtk;
 
 use quickview_core::{
-    geometry::{Point, Rect},
-    ocr::{models::OcrResult, select},
+    geometry::{Point, Rect, ViewTransform},
+    ocr::{index::OcrWordIndex, models::OcrResult, select},
 };
 
-#[derive(Default)]
-struct State {
-    image_width: f64,
-    image_height: f64,
-    ocr: Option<OcrResult>,
+const MIN_ZOOM_FACTOR: f64 = 1.0;
+const BASE_MAX_ZOOM_FACTOR: f64 = 20.0;
+const ZOOM_STEP: f64 = 1.25;
+const INTEGER_SCALE_EPS: f64 = 0.02;
+const PAN_DIM_EPS: f64 = 0.5;
 
-    // Current selection in widget coordinates.
-    selecting: bool,
-    select_start: Point,
-    select_current: Point,
-
-    // Cached selected word indices (into ocr.words)
-    selected: Vec<usize>,
-}
-
-/// Overlay widget that displays an image and (optionally) an OCR-backed selection layer.
-///
-/// This is an MVP scaffold:
-/// - selection is rectangle drag
-/// - selected words are those whose bounding boxes intersect the rectangle
 #[derive(Clone)]
 pub struct ImageOverlayWidget {
     root: gtk::Overlay,
-    picture: gtk::Picture,
-    drawing: gtk::DrawingArea,
+    canvas: ZoomableCanvas,
     spinner: gtk::Spinner,
-
-    state: Rc<RefCell<State>>,
 }
 
 impl Default for ImageOverlayWidget {
@@ -48,16 +33,10 @@ impl ImageOverlayWidget {
     pub fn new() -> Self {
         let root = gtk::Overlay::new();
 
-        let picture = gtk::Picture::new();
-        picture.set_can_shrink(true);
-
-        root.set_child(Some(&picture));
-
-        let drawing = gtk::DrawingArea::new();
-        drawing.set_hexpand(true);
-        drawing.set_vexpand(true);
-        drawing.set_focusable(true);
-        root.add_overlay(&drawing);
+        let canvas = ZoomableCanvas::new();
+        canvas.set_hexpand(true);
+        canvas.set_vexpand(true);
+        root.set_child(Some(&canvas));
 
         let spinner = gtk::Spinner::new();
         spinner.set_spinning(false);
@@ -66,194 +45,928 @@ impl ImageOverlayWidget {
         spinner.set_valign(gtk::Align::Center);
         root.add_overlay(&spinner);
 
-        let state = Rc::new(RefCell::new(State::default()));
+        Self {
+            root,
+            canvas,
+            spinner,
+        }
+    }
 
-        // Draw highlights + selection rectangle.
-        {
-            let state = state.clone();
-            drawing.set_draw_func(move |_, cr, width, height| {
-                let s = state.borrow();
+    pub fn widget(&self) -> gtk::Widget {
+        self.root.clone().upcast()
+    }
 
-                if s.image_width <= 0.0 || s.image_height <= 0.0 {
-                    return;
-                }
+    pub fn set_texture(&self, texture: gtk::gdk::Texture) {
+        self.canvas.set_texture(texture);
+    }
 
-                let (scale, ox, oy) = compute_contain_transform(
-                    width as f64,
-                    height as f64,
-                    s.image_width,
-                    s.image_height,
-                );
+    pub fn set_ocr_result(&self, result: Option<OcrResult>) {
+        self.canvas.set_ocr_result(result);
+    }
 
-                // Draw selection rectangle (widget coords)
-                if s.selecting {
-                    let sel = Rect::from_points(s.select_start, s.select_current);
-                    cr.set_source_rgba(0.2, 0.6, 1.0, 0.25);
-                    cr.rectangle(sel.x, sel.y, sel.w, sel.h);
-                    let _ = cr.fill();
-                    cr.set_source_rgba(0.2, 0.6, 1.0, 0.8);
-                    cr.set_line_width(1.0);
-                    cr.rectangle(sel.x, sel.y, sel.w, sel.h);
-                    let _ = cr.stroke();
-                }
+    pub fn set_ocr_busy(&self, busy: bool) {
+        self.spinner.set_spinning(busy);
+        self.spinner.set_visible(busy);
+    }
 
-                // Draw selected word boxes
-                if let Some(ocr) = &s.ocr {
-                    cr.set_source_rgba(1.0, 1.0, 0.0, 0.25);
-                    for &idx in &s.selected {
-                        if let Some(w) = ocr.words.get(idx) {
-                            let rx = ox + w.bbox.x * scale;
-                            let ry = oy + w.bbox.y * scale;
-                            let rw = w.bbox.w * scale;
-                            let rh = w.bbox.h * scale;
-                            cr.rectangle(rx, ry, rw, rh);
-                            let _ = cr.fill();
-                        }
+    pub fn clear_selection(&self) {
+        self.canvas.clear_selection();
+    }
+
+    pub fn selected_text(&self) -> String {
+        self.canvas.selected_text()
+    }
+
+    pub fn zoom_by(&self, factor: f64) {
+        self.canvas.zoom_by(factor);
+    }
+
+    pub fn zoom_in(&self) {
+        self.canvas.zoom_by(ZOOM_STEP);
+    }
+
+    pub fn zoom_out(&self) {
+        self.canvas.zoom_by(1.0 / ZOOM_STEP);
+    }
+
+    pub fn reset_view(&self) {
+        self.canvas.reset_view();
+    }
+}
+
+mod imp {
+    use super::*;
+
+    #[derive(Default)]
+    pub struct ZoomableCanvas {
+        pub(super) state: RefCell<CanvasState>,
+    }
+
+    #[glib::object_subclass]
+    impl ObjectSubclass for ZoomableCanvas {
+        const NAME: &'static str = "QuickViewZoomableCanvas";
+        type Type = super::ZoomableCanvas;
+        type ParentType = gtk::Widget;
+    }
+
+    impl ObjectImpl for ZoomableCanvas {
+        fn constructed(&self) {
+            self.parent_constructed();
+
+            let obj = self.obj();
+            obj.set_focusable(true);
+            obj.setup_controllers();
+        }
+    }
+
+    impl WidgetImpl for ZoomableCanvas {
+        fn measure(&self, orientation: gtk::Orientation, for_size: i32) -> (i32, i32, i32, i32) {
+            let state = self.state.borrow();
+            let natural = natural_size_for_measure(
+                orientation,
+                for_size,
+                state.image_width,
+                state.image_height,
+            );
+            (1, natural, -1, -1)
+        }
+
+        fn snapshot(&self, snapshot: &gtk::Snapshot) {
+            self.parent_snapshot(snapshot);
+
+            let widget = self.obj();
+            let widget_w = widget.width() as f64;
+            let widget_h = widget.height() as f64;
+
+            let mut state = self.state.borrow_mut();
+            let texture = state.texture.clone();
+            let Some(transform) = transform_for_widget(&mut state, widget_w, widget_h) else {
+                return;
+            };
+
+            let Some(texture) = texture.as_ref() else {
+                return;
+            };
+
+            let bounds = gtk::graphene::Rect::new(
+                transform.offset_x() as f32,
+                transform.offset_y() as f32,
+                (state.image_width * transform.scale()) as f32,
+                (state.image_height * transform.scale()) as f32,
+            );
+            snapshot.append_scaled_texture(
+                texture,
+                scaling_filter_for_scale(transform.scale()),
+                &bounds,
+            );
+
+            let overlay_bounds =
+                gtk::graphene::Rect::new(0.0, 0.0, widget_w as f32, widget_h as f32);
+            let cr = snapshot.append_cairo(&overlay_bounds);
+
+            if state.selecting {
+                let sel = Rect::from_points(state.select_start_widget, state.select_current_widget);
+                cr.set_source_rgba(0.2, 0.6, 1.0, 0.25);
+                cr.rectangle(sel.x, sel.y, sel.w, sel.h);
+                let _ = cr.fill();
+
+                cr.set_source_rgba(0.2, 0.6, 1.0, 0.8);
+                cr.set_line_width(1.0);
+                cr.rectangle(sel.x, sel.y, sel.w, sel.h);
+                let _ = cr.stroke();
+            }
+
+            if let Some(ocr) = &state.ocr {
+                cr.set_source_rgba(1.0, 1.0, 0.0, 0.25);
+                for &idx in &state.selected_indices {
+                    if let Some(word) = ocr.words.get(idx) {
+                        let rect = transform.image_rect_to_widget(word.bbox);
+                        cr.rectangle(rect.x, rect.y, rect.w, rect.h);
+                        let _ = cr.fill();
                     }
                 }
+            }
+        }
+    }
+
+    #[derive(Clone)]
+    pub(super) struct CanvasState {
+        pub(super) texture: Option<gtk::gdk::Texture>,
+        pub(super) image_width: f64,
+        pub(super) image_height: f64,
+        pub(super) ocr: Option<OcrResult>,
+        pub(super) ocr_index: Option<OcrWordIndex>,
+        pub(super) selected_indices: Vec<usize>,
+
+        pub(super) zoom_factor: f64,
+        pub(super) center_img: Point,
+
+        pub(super) selecting: bool,
+        pub(super) select_start_widget: Point,
+        pub(super) select_current_widget: Point,
+
+        pub(super) panning: bool,
+        pub(super) pan_start_widget: Point,
+        pub(super) pan_start_center_img: Point,
+        pub(super) last_cursor_widget: Option<Point>,
+
+        pub(super) pinch_active: bool,
+        pub(super) pinch_start_zoom_factor: f64,
+        pub(super) pinch_start_center_img: Point,
+        pub(super) pinch_anchor_widget: Point,
+    }
+
+    impl Default for CanvasState {
+        fn default() -> Self {
+            Self {
+                texture: None,
+                image_width: 0.0,
+                image_height: 0.0,
+                ocr: None,
+                ocr_index: None,
+                selected_indices: Vec::new(),
+                zoom_factor: 1.0,
+                center_img: Point::default(),
+                selecting: false,
+                select_start_widget: Point::default(),
+                select_current_widget: Point::default(),
+                panning: false,
+                pan_start_widget: Point::default(),
+                pan_start_center_img: Point::default(),
+                last_cursor_widget: None,
+                pinch_active: false,
+                pinch_start_zoom_factor: 1.0,
+                pinch_start_center_img: Point::default(),
+                pinch_anchor_widget: Point::default(),
+            }
+        }
+    }
+}
+
+glib::wrapper! {
+    pub struct ZoomableCanvas(ObjectSubclass<imp::ZoomableCanvas>)
+        @extends gtk::Widget,
+        @implements gtk::Accessible, gtk::Buildable, gtk::ConstraintTarget;
+}
+
+impl Default for ZoomableCanvas {
+    fn default() -> Self {
+        Self::new()
+    }
+}
+
+impl ZoomableCanvas {
+    pub fn new() -> Self {
+        glib::Object::new()
+    }
+
+    pub fn set_texture(&self, texture: gtk::gdk::Texture) {
+        let mut state = self.imp().state.borrow_mut();
+        state.texture = Some(texture.clone());
+        state.image_width = texture.width() as f64;
+        state.image_height = texture.height() as f64;
+        state.ocr = None;
+        state.ocr_index = None;
+        state.zoom_factor = MIN_ZOOM_FACTOR;
+        state.center_img = Point {
+            x: state.image_width * 0.5,
+            y: state.image_height * 0.5,
+        };
+        state.selecting = false;
+        state.panning = false;
+        state.pinch_active = false;
+        state.selected_indices.clear();
+        drop(state);
+        self.queue_draw();
+        self.update_cursor();
+    }
+
+    pub fn set_ocr_result(&self, result: Option<OcrResult>) {
+        let mut state = self.imp().state.borrow_mut();
+        state.ocr = result;
+        state.ocr_index = state
+            .ocr
+            .as_ref()
+            .map(|ocr| OcrWordIndex::build(&ocr.words, state.image_width, state.image_height));
+        state.selected_indices.clear();
+        state.selecting = false;
+        drop(state);
+        self.queue_draw();
+    }
+
+    pub fn clear_selection(&self) {
+        let mut state = self.imp().state.borrow_mut();
+        state.selecting = false;
+        state.selected_indices.clear();
+        drop(state);
+        self.queue_draw();
+    }
+
+    pub fn selected_text(&self) -> String {
+        let state = self.imp().state.borrow();
+        let Some(ocr) = &state.ocr else {
+            return String::new();
+        };
+
+        let words = state
+            .selected_indices
+            .iter()
+            .filter_map(|&idx| ocr.words.get(idx))
+            .collect::<Vec<_>>();
+        select::selected_text(words)
+    }
+
+    pub fn zoom_by(&self, factor: f64) {
+        if factor <= 0.0 {
+            return;
+        }
+        self.zoom_at(self.widget_center(), factor);
+    }
+
+    pub fn reset_view(&self) {
+        let mut state = self.imp().state.borrow_mut();
+        if state.image_width <= 0.0 || state.image_height <= 0.0 {
+            return;
+        }
+        state.zoom_factor = MIN_ZOOM_FACTOR;
+        state.center_img = Point {
+            x: state.image_width * 0.5,
+            y: state.image_height * 0.5,
+        };
+        state.selecting = false;
+        state.panning = false;
+        state.pinch_active = false;
+        drop(state);
+        self.queue_draw();
+        self.update_cursor();
+    }
+
+    fn setup_controllers(&self) {
+        let motion = gtk::EventControllerMotion::new();
+        {
+            let canvas = self.clone();
+            motion.connect_motion(move |_, x, y| {
+                let mut state = canvas.imp().state.borrow_mut();
+                state.last_cursor_widget = Some(Point { x, y });
+                drop(state);
+                canvas.update_cursor();
+            });
+        }
+        {
+            let canvas = self.clone();
+            motion.connect_leave(move |_| {
+                let mut state = canvas.imp().state.borrow_mut();
+                state.last_cursor_widget = None;
+                drop(state);
+                canvas.update_cursor();
             });
         }
+        self.add_controller(motion);
 
-        // Drag-selection gesture
+        let scroll = gtk::EventControllerScroll::new(gtk::EventControllerScrollFlags::VERTICAL);
+        {
+            let canvas = self.clone();
+            scroll.connect_scroll(move |controller, _dx, dy| {
+                let mods = controller.current_event_state();
+                if !mods.contains(gtk::gdk::ModifierType::CONTROL_MASK) {
+                    return glib::Propagation::Proceed;
+                }
+
+                let factor = if dy < 0.0 {
+                    ZOOM_STEP
+                } else if dy > 0.0 {
+                    1.0 / ZOOM_STEP
+                } else {
+                    return glib::Propagation::Stop;
+                };
+
+                let anchor = canvas.last_cursor_widget();
+                canvas.zoom_at(anchor, factor);
+                glib::Propagation::Stop
+            });
+        }
+        self.add_controller(scroll);
+
+        let pinch = gtk::GestureZoom::new();
+        {
+            let canvas = self.clone();
+            pinch.connect_begin(move |gesture, _| {
+                canvas.on_pinch_begin(gesture);
+            });
+        }
+        {
+            let canvas = self.clone();
+            pinch.connect_scale_changed(move |gesture, scale_factor| {
+                canvas.on_pinch_scale_changed(gesture, scale_factor);
+            });
+        }
         {
-            let drag = gtk::GestureDrag::new();
-
-            let state_begin = state.clone();
-            let drawing_begin = drawing.clone();
-            drag.connect_drag_begin(move |_, x, y| {
-                let mut s = state_begin.borrow_mut();
-                s.selecting = true;
-                s.select_start = Point { x, y };
-                s.select_current = Point { x, y };
-                s.selected.clear();
-                drawing_begin.queue_draw();
+            let canvas = self.clone();
+            pinch.connect_end(move |_, _| {
+                canvas.on_pinch_end();
             });
+        }
+        {
+            let canvas = self.clone();
+            pinch.connect_cancel(move |_, _| {
+                canvas.on_pinch_end();
+            });
+        }
+        self.add_controller(pinch);
 
-            let state_update = state.clone();
-            let drawing_update = drawing.clone();
+        let drag = gtk::GestureDrag::new();
+        drag.set_button(0);
+        {
+            let canvas = self.clone();
+            drag.connect_drag_begin(move |gesture, x, y| {
+                canvas.on_drag_begin(gesture, x, y);
+            });
+        }
+        {
+            let canvas = self.clone();
             drag.connect_drag_update(move |_, dx, dy| {
-                let mut s = state_update.borrow_mut();
-                let cur = Point {
-                    x: s.select_start.x + dx,
-                    y: s.select_start.y + dy,
+                canvas.on_drag_update(dx, dy);
+            });
+        }
+        {
+            let canvas = self.clone();
+            drag.connect_drag_end(move |_, _, _| {
+                canvas.on_drag_end();
+            });
+        }
+        self.add_controller(drag);
+    }
+
+    fn on_drag_begin(&self, gesture: &gtk::GestureDrag, x: f64, y: f64) {
+        let mut state = self.imp().state.borrow_mut();
+        let widget_w = self.width() as f64;
+        let widget_h = self.height() as f64;
+
+        let cursor = Point { x, y };
+        state.last_cursor_widget = Some(cursor);
+
+        let button = gesture.current_button();
+        let mods = gesture.current_event_state();
+        let ctrl_pressed = mods.contains(gtk::gdk::ModifierType::CONTROL_MASK);
+        let pan_requested = button == gtk::gdk::BUTTON_MIDDLE
+            || (button == gtk::gdk::BUTTON_PRIMARY && ctrl_pressed);
+        let can_pan = can_pan_at_view(
+            widget_w,
+            widget_h,
+            state.image_width,
+            state.image_height,
+            state.zoom_factor,
+        );
+
+        if pan_requested && can_pan {
+            state.panning = true;
+            state.selecting = false;
+            state.pan_start_widget = cursor;
+            state.pan_start_center_img = state.center_img;
+        } else if button == gtk::gdk::BUTTON_PRIMARY {
+            state.selecting = true;
+            state.panning = false;
+            state.select_start_widget = cursor;
+            state.select_current_widget = cursor;
+            state.selected_indices.clear();
+        } else {
+            state.selecting = false;
+            state.panning = false;
+        }
+
+        drop(state);
+        self.update_cursor();
+        self.queue_draw();
+    }
+
+    fn on_drag_update(&self, dx: f64, dy: f64) {
+        let mut needs_redraw = false;
+        let mut state = self.imp().state.borrow_mut();
+        let widget_w = self.width() as f64;
+        let widget_h = self.height() as f64;
+
+        if state.panning {
+            let current = Point {
+                x: state.pan_start_widget.x + dx,
+                y: state.pan_start_widget.y + dy,
+            };
+            state.last_cursor_widget = Some(current);
+
+            if let Some(transform) = transform_for_widget(&mut state, widget_w, widget_h) {
+                state.center_img = Point {
+                    x: state.pan_start_center_img.x - (dx / transform.scale()),
+                    y: state.pan_start_center_img.y - (dy / transform.scale()),
                 };
-                s.select_current = cur;
+                state.center_img = ViewTransform::clamp_center(
+                    widget_w,
+                    widget_h,
+                    state.image_width,
+                    state.image_height,
+                    transform.scale(),
+                    state.center_img,
+                );
+                needs_redraw = true;
+            }
+        }
 
-                // Update selection set.
-                if let Some(ocr) = &s.ocr {
-                    let width = drawing_update.width() as f64;
-                    let height = drawing_update.height() as f64;
+        if state.selecting {
+            let current = Point {
+                x: state.select_start_widget.x + dx,
+                y: state.select_start_widget.y + dy,
+            };
+            state.select_current_widget = current;
+            state.last_cursor_widget = Some(current);
+
+            if let Some(transform) = transform_for_widget(&mut state, widget_w, widget_h) {
+                let sel_widget =
+                    Rect::from_points(state.select_start_widget, state.select_current_widget);
+                let sel_image = transform.widget_rect_to_image(sel_widget);
+
+                let selected = {
+                    let s: &mut imp::CanvasState = &mut state;
+                    match (&s.ocr, &mut s.ocr_index) {
+                        (Some(ocr), Some(index)) => {
+                            index.query_intersecting(&ocr.words, &sel_image)
+                        }
+                        // No spatial index available: fall back to a linear scan over all words.
+                        (Some(ocr), None) => ocr
+                            .words
+                            .iter()
+                            .enumerate()
+                            .filter_map(|(idx, word)| {
+                                word.bbox.intersects(&sel_image).then_some(idx)
+                            })
+                            .collect::<Vec<_>>(),
+                        _ => Vec::new(),
+                    }
+                };
+                state.selected_indices = selected;
+                needs_redraw = true;
+            }
+        }
 
-                    let (scale, ox, oy) =
-                        compute_contain_transform(width, height, s.image_width, s.image_height);
+        drop(state);
+        if needs_redraw {
+            self.queue_draw();
+        }
+    }
 
-                    let sel_widget = Rect::from_points(s.select_start, s.select_current);
-                    let sel_image = widget_rect_to_image_rect(sel_widget, scale, ox, oy);
+    fn on_drag_end(&self) {
+        let mut state = self.imp().state.borrow_mut();
+        state.selecting = false;
+        state.panning = false;
+        drop(state);
+        self.update_cursor();
+        self.queue_draw();
+    }
 
-                    let selected = select::select_words(&ocr.words, sel_image)
-                        .into_iter()
-                        .filter_map(|w| {
-                            // Convert reference to index.
-                            // This is O(n) but fine for scaffold.
-                            ocr.words.iter().position(|x| std::ptr::eq(x, w))
-                        })
-                        .collect::<Vec<_>>();
+    fn on_pinch_begin(&self, gesture: &gtk::GestureZoom) {
+        let mut state = self.imp().state.borrow_mut();
+        if state.image_width <= 0.0 || state.image_height <= 0.0 {
+            return;
+        }
 
-                    s.selected = selected;
-                }
+        let default_anchor = self.widget_center();
+        let anchor = gesture
+            .bounding_box_center()
+            .map(|(x, y)| Point { x, y })
+            .unwrap_or(default_anchor);
 
-                drawing_update.queue_draw();
-            });
+        state.pinch_active = true;
+        state.pinch_start_zoom_factor = state.zoom_factor;
+        state.pinch_start_center_img = state.center_img;
+        state.pinch_anchor_widget = anchor;
+    }
 
-            let state_end = state.clone();
-            let drawing_end = drawing.clone();
-            drag.connect_drag_end(move |_, _, _| {
-                let mut s = state_end.borrow_mut();
-                s.selecting = false;
-                drawing_end.queue_draw();
-            });
+    fn on_pinch_scale_changed(&self, gesture: &gtk::GestureZoom, scale_factor: f64) {
+        let mut state = self.imp().state.borrow_mut();
+        if !state.pinch_active || state.image_width <= 0.0 || state.image_height <= 0.0 {
+            return;
+        }
 
-            drawing.add_controller(drag);
+        let widget_w = self.width() as f64;
+        let widget_h = self.height() as f64;
+        if widget_w <= 0.0 || widget_h <= 0.0 {
+            return;
         }
 
-        Self {
-            root,
-            picture,
-            drawing,
-            spinner,
-            state,
+        let begin_transform = {
+            let begin_scale = ViewTransform::from_center(
+                widget_w,
+                widget_h,
+                state.image_width,
+                state.image_height,
+                state.pinch_start_zoom_factor,
+                state.pinch_start_center_img,
+            )
+            .scale();
+            let begin_center = ViewTransform::clamp_center(
+                widget_w,
+                widget_h,
+                state.image_width,
+                state.image_height,
+                begin_scale,
+                state.pinch_start_center_img,
+            );
+            ViewTransform::from_center(
+                widget_w,
+                widget_h,
+                state.image_width,
+                state.image_height,
+                state.pinch_start_zoom_factor,
+                begin_center,
+            )
+        };
+
+        // `Point::default()` doubles as the "no anchor recorded" sentinel here.
+        let fallback_anchor = if state.pinch_anchor_widget == Point::default() {
+            self.widget_center()
+        } else {
+            state.pinch_anchor_widget
+        };
+        let anchor_widget = gesture
+            .bounding_box_center()
+            .map(|(x, y)| Point { x, y })
+            .unwrap_or(fallback_anchor);
+        state.pinch_anchor_widget = anchor_widget;
+        let anchor_img = begin_transform.widget_to_image(anchor_widget);
+        let gesture_scale = if scale_factor > 0.0 {
+            scale_factor
+        } else {
+            gesture.scale_delta().max(f64::MIN_POSITIVE)
+        };
+
+        state.zoom_factor = clamp_zoom_factor(
+            state.pinch_start_zoom_factor * gesture_scale,
+            widget_w,
+            widget_h,
+            state.image_width,
+            state.image_height,
+        );
+
+        let new_scale = ViewTransform::from_center(
+            widget_w,
+            widget_h,
+            state.image_width,
+            state.image_height,
+            state.zoom_factor,
+            state.center_img,
+        )
+        .scale();
+        let widget_center =
+            ViewTransform::contain(widget_w, widget_h, state.image_width, state.image_height)
+                .widget_center;
+        state.center_img = recenter_for_anchor(widget_center, new_scale, anchor_widget, anchor_img);
+        state.center_img = ViewTransform::clamp_center(
+            widget_w,
+            widget_h,
+            state.image_width,
+            state.image_height,
+            new_scale,
+            state.center_img,
+        );
+
+        drop(state);
+        self.queue_draw();
+        self.update_cursor();
+    }
+
+    fn on_pinch_end(&self) {
+        let mut state = self.imp().state.borrow_mut();
+        state.pinch_active = false;
+        drop(state);
+        self.update_cursor();
+    }
+
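+    /// Multiplies the zoom factor by `factor` while keeping the image point under
+    /// `anchor_widget` fixed on screen, subject to zoom and center clamping.
+    fn zoom_at(&self, anchor_widget: Point, factor: f64) {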
+        if factor <= 0.0 {
+            return;
+        }
+
+        let mut state = self.imp().state.borrow_mut();
+        if state.image_width <= 0.0 || state.image_height <= 0.0 {
+            return;
+        }
+
+        let widget_w = self.width() as f64;
+        let widget_h = self.height() as f64;
+        let Some(current_transform) = transform_for_widget(&mut state, widget_w, widget_h) else {
+            return;
+        };
+
+        let anchor_img = current_transform.widget_to_image(anchor_widget);
+        let new_zoom = clamp_zoom_factor(
+            state.zoom_factor * factor,
+            widget_w,
+            widget_h,
+            state.image_width,
+            state.image_height,
+        );
+        if (new_zoom - state.zoom_factor).abs() <= f64::EPSILON {
+            return;
         }
+        state.zoom_factor = new_zoom;
+
+        let new_scale = ViewTransform::from_center(
+            widget_w,
+            widget_h,
+            state.image_width,
+            state.image_height,
+            state.zoom_factor,
+            state.center_img,
+        )
+        .scale();
+        let widget_center =
+            ViewTransform::contain(widget_w, widget_h, state.image_width, state.image_height)
+                .widget_center;
+        state.center_img = recenter_for_anchor(widget_center, new_scale, anchor_widget, anchor_img);
+        state.center_img = ViewTransform::clamp_center(
+            widget_w,
+            widget_h,
+            state.image_width,
+            state.image_height,
+            new_scale,
+            state.center_img,
+        );
+
+        drop(state);
+        self.queue_draw();
+        self.update_cursor();
     }
 
-    pub fn widget(&self) -> gtk::Widget {
-        self.root.clone().upcast()
+    fn widget_center(&self) -> Point {
+        Point {
+            x: (self.width() as f64) * 0.5,
+            y: (self.height() as f64) * 0.5,
+        }
     }
 
-    pub fn set_texture(&self, texture: gtk::gdk::Texture) {
-        let mut s = self.state.borrow_mut();
-        s.image_width = texture.width() as f64;
-        s.image_height = texture.height() as f64;
-        drop(s);
-        self.picture.set_paintable(Some(&texture));
-        self.drawing.queue_draw();
+    fn last_cursor_widget(&self) -> Point {
+        self.imp()
+            .state
+            .borrow()
+            .last_cursor_widget
+            .unwrap_or_else(|| self.widget_center())
     }
 
-    pub fn set_ocr_result(&self, result: Option<OcrResult>) {
-        let mut s = self.state.borrow_mut();
-        s.ocr = result;
-        s.selected.clear();
-        s.selecting = false;
-        self.drawing.queue_draw();
+    fn update_cursor(&self) {
+        let state = self.imp().state.borrow();
+        let can_pan = can_pan_at_view(
+            self.width() as f64,
+            self.height() as f64,
+            state.image_width,
+            state.image_height,
+            state.zoom_factor,
+        );
+        if state.panning && can_pan {
+            self.set_cursor_from_name(Some("grabbing"));
+        } else if can_pan {
+            self.set_cursor_from_name(Some("grab"));
+        } else {
+            self.set_cursor_from_name(None);
+        }
     }
+}
 
-    pub fn set_ocr_busy(&self, busy: bool) {
-        self.spinner.set_spinning(busy);
-        self.spinner.set_visible(busy);
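+/// Builds the current view transform, first clamping `state.zoom_factor` and
+/// `state.center_img` into their valid ranges (the state is mutated in place).
+/// Returns `None` when the widget or image has a non-positive dimension.
+fn transform_for_widget(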
+    state: &mut imp::CanvasState,
+    widget_w: f64,
+    widget_h: f64,
+) -> Option<ViewTransform> {
+    if widget_w <= 0.0 || widget_h <= 0.0 || state.image_width <= 0.0 || state.image_height <= 0.0 {
+        return None;
     }
 
-    pub fn clear_selection(&self) {
-        let mut s = self.state.borrow_mut();
-        s.selected.clear();
-        s.selecting = false;
-        self.drawing.queue_draw();
+    let max_zoom =
+        max_zoom_factor_for_dims(widget_w, widget_h, state.image_width, state.image_height);
+    if state.zoom_factor > max_zoom {
+        state.zoom_factor = max_zoom;
+    } else if state.zoom_factor < MIN_ZOOM_FACTOR {
+        state.zoom_factor = MIN_ZOOM_FACTOR;
     }
 
-    pub fn selected_text(&self) -> String {
-        let s = self.state.borrow();
-        let Some(ocr) = &s.ocr else {
-            return String::new();
-        };
+    let mut transform = ViewTransform::from_center(
+        widget_w,
+        widget_h,
+        state.image_width,
+        state.image_height,
+        state.zoom_factor,
+        state.center_img,
+    );
+    let clamped = ViewTransform::clamp_center(
+        widget_w,
+        widget_h,
+        state.image_width,
+        state.image_height,
+        transform.scale(),
+        state.center_img,
+    );
+    if point_changed(clamped, state.center_img) {
+        state.center_img = clamped;
+        transform = ViewTransform::from_center(
+            widget_w,
+            widget_h,
+            state.image_width,
+            state.image_height,
+            state.zoom_factor,
+            state.center_img,
+        );
+    }
 
-        let words = s
-            .selected
-            .iter()
-            .filter_map(|&idx| ocr.words.get(idx))
-            .collect::<Vec<_>>();
+    Some(transform)
+}
 
-        select::selected_text(words)
+fn point_changed(a: Point, b: Point) -> bool {
+    (a.x - b.x).abs() > f64::EPSILON || (a.y - b.y).abs() > f64::EPSILON
+}
+
+fn natural_size_for_measure(
+    orientation: gtk::Orientation,
+    for_size: i32,
+    image_w: f64,
+    image_h: f64,
+) -> i32 {
+    if image_w <= 0.0 || image_h <= 0.0 {
+        return 1;
+    }
+
+    if for_size > 0 {
+        match orientation {
+            gtk::Orientation::Horizontal => size_to_i32((for_size as f64) * (image_w / image_h)),
+            gtk::Orientation::Vertical => size_to_i32((for_size as f64) * (image_h / image_w)),
+            _ => 1,
+        }
+    } else {
+        match orientation {
+            gtk::Orientation::Horizontal => size_to_i32(image_w),
+            gtk::Orientation::Vertical => size_to_i32(image_h),
+            _ => 1,
+        }
+    }
+}
+
+fn size_to_i32(value: f64) -> i32 {
+    value.round().clamp(1.0, i32::MAX as f64) as i32
+}
+
+fn clamp_zoom_factor(zoom: f64, widget_w: f64, widget_h: f64, image_w: f64, image_h: f64) -> f64 {
+    let max_zoom = max_zoom_factor_for_dims(widget_w, widget_h, image_w, image_h);
+    zoom.clamp(MIN_ZOOM_FACTOR, max_zoom)
+}
+
+fn contain_scale_for_dims(widget_w: f64, widget_h: f64, image_w: f64, image_h: f64) -> Option<f64> {
+    if widget_w <= 0.0 || widget_h <= 0.0 || image_w <= 0.0 || image_h <= 0.0 {
+        return None;
     }
+    Some(ViewTransform::contain(widget_w, widget_h, image_w, image_h).contain_scale)
 }
 
-fn compute_contain_transform(
+fn effective_scale_for_dims(
     widget_w: f64,
     widget_h: f64,
     image_w: f64,
     image_h: f64,
-) -> (f64, f64, f64) {
-    // contain
-    let scale = (widget_w / image_w).min(widget_h / image_h).max(0.0001);
-    let draw_w = image_w * scale;
-    let draw_h = image_h * scale;
-    let ox = (widget_w - draw_w) / 2.0;
-    let oy = (widget_h - draw_h) / 2.0;
-    (scale, ox, oy)
+    zoom_factor: f64,
+) -> Option<f64> {
+    contain_scale_for_dims(widget_w, widget_h, image_w, image_h).map(|s| s * zoom_factor)
+}
+
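+/// Returns the largest allowed zoom factor: at least `BASE_MAX_ZOOM_FACTOR`, raised
+/// when needed so that `contain_scale * max_zoom >= 1` (a 1:1 pixel view stays reachable).
+fn max_zoom_factor_for_dims(widget_w: f64, widget_h: f64, image_w: f64, image_h: f64) -> f64 {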
+    let contain_scale =
+        contain_scale_for_dims(widget_w, widget_h, image_w, image_h).unwrap_or(MIN_ZOOM_FACTOR);
+    let max_zoom = BASE_MAX_ZOOM_FACTOR.max(1.0 / contain_scale);
+    debug_assert!(contain_scale * max_zoom >= 1.0 - 1e-12);
+    max_zoom
 }
 
-fn widget_rect_to_image_rect(sel: Rect, scale: f64, ox: f64, oy: f64) -> Rect {
-    Rect {
-        x: (sel.x - ox) / scale,
-        y: (sel.y - oy) / scale,
-        w: sel.w / scale,
-        h: sel.h / scale,
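+/// Returns `true` when the scaled image exceeds the widget on at least one axis
+/// by more than `PAN_DIM_EPS`, i.e. when there is something to pan to.
+fn can_pan_at_view(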
+    widget_w: f64,
+    widget_h: f64,
+    image_w: f64,
+    image_h: f64,
+    zoom_factor: f64,
+) -> bool {
+    let Some(scale) = effective_scale_for_dims(widget_w, widget_h, image_w, image_h, zoom_factor)
+    else {
+        return false;
+    };
+    image_w * scale > widget_w + PAN_DIM_EPS || image_h * scale > widget_h + PAN_DIM_EPS
+}
+
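+/// Returns the image-space center that places `anchor_img` at `anchor_widget`
+/// for the given `scale`, so the anchored point stays put while zooming.
+fn recenter_for_anchor(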
+    widget_center: Point,
+    scale: f64,
+    anchor_widget: Point,
+    anchor_img: Point,
+) -> Point {
+    Point {
+        x: anchor_img.x - (anchor_widget.x - widget_center.x) / scale,
+        y: anchor_img.y - (anchor_widget.y - widget_center.y) / scale,
+    }
+}
+
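+/// Picks nearest-neighbor filtering for near-integer upscales (`scale > 1`) so
+/// pixels stay crisp, and trilinear filtering for every other scale.
+fn scaling_filter_for_scale(scale: f64) -> gtk::gsk::ScalingFilter {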
+    let is_near_integer = scale > 1.0 && (scale - scale.round()).abs() <= INTEGER_SCALE_EPS;
+    if is_near_integer {
+        gtk::gsk::ScalingFilter::Nearest
+    } else {
+        gtk::gsk::ScalingFilter::Trilinear
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use super::{max_zoom_factor_for_dims, natural_size_for_measure, size_to_i32};
+    use gtk4::Orientation;
+    use quickview_core::geometry::ViewTransform;
+
+    #[test]
+    fn size_to_i32_rounds_and_clamps() {
+        assert_eq!(size_to_i32(-100.0), 1);
+        assert_eq!(size_to_i32(0.0), 1);
+        assert_eq!(size_to_i32(0.49), 1);
+        assert_eq!(size_to_i32(0.50), 1);
+        assert_eq!(size_to_i32(1.4), 1);
+        assert_eq!(size_to_i32(1.5), 2);
+        assert_eq!(size_to_i32((i32::MAX as f64) + 12345.0), i32::MAX);
+    }
+
+    #[test]
+    fn measure_preserves_aspect_ratio_when_constrained() {
+        // 2:1 image
+        let image_w = 400.0;
+        let image_h = 200.0;
+
+        let h = natural_size_for_measure(Orientation::Horizontal, 100, image_w, image_h);
+        assert_eq!(h, 200);
+
+        let v = natural_size_for_measure(Orientation::Vertical, 100, image_w, image_h);
+        assert_eq!(v, 50);
+    }
+
+    #[test]
+    fn measure_uses_image_dimensions_when_unconstrained() {
+        let image_w = 123.2;
+        let image_h = 456.6;
+
+        let h = natural_size_for_measure(Orientation::Horizontal, -1, image_w, image_h);
+        assert_eq!(h, 123);
+
+        let v = natural_size_for_measure(Orientation::Vertical, -1, image_w, image_h);
+        assert_eq!(v, 457);
+    }
+
+    #[test]
+    fn dynamic_max_zoom_allows_absolute_scale_one_for_tiny_contain() {
+        let widget_w = 320.0;
+        let widget_h = 240.0;
+        let image_w = 12000.0;
+        let image_h = 8000.0;
+
+        let contain_scale =
+            ViewTransform::contain(widget_w, widget_h, image_w, image_h).contain_scale;
+        let max_zoom = max_zoom_factor_for_dims(widget_w, widget_h, image_w, image_h);
+        let max_absolute_scale = contain_scale * max_zoom;
+
+        assert!(max_zoom > 20.0);
+        assert!(max_absolute_scale >= 1.0);
     }
 }
diff --git a/crates/quickview-ui/src/windows/full_viewer.rs b/crates/quickview-ui/src/windows/full_viewer.rs
index d2790dc..7e8fece 100644
--- a/crates/quickview-ui/src/windows/full_viewer.rs
+++ b/crates/quickview-ui/src/windows/full_viewer.rs
@@ -27,6 +27,7 @@ pub fn present(app: &adw::Application, opts: &LaunchOptions) {
     // Key handling: arrows navigate, Ctrl+C copies.
     {
         let viewer = viewer.clone();
+        let overlay = viewer.overlay();
         let window_clone = window.clone();
         let controller = gtk::EventControllerKey::new();
         controller.connect_key_pressed(move |_, key, _, state| {
@@ -38,6 +39,22 @@ pub fn present(app: &adw::Application, opts: &LaunchOptions) {
                 return glib::Propagation::Stop;
             }
 
+            if key == gtk::gdk::Key::plus
+                || key == gtk::gdk::Key::equal
+                || key == gtk::gdk::Key::KP_Add
+            {
+                overlay.zoom_in();
+                return glib::Propagation::Stop;
+            }
+            if key == gtk::gdk::Key::minus || key == gtk::gdk::Key::KP_Subtract {
+                overlay.zoom_out();
+                return glib::Propagation::Stop;
+            }
+            if key == gtk::gdk::Key::_0 || key == gtk::gdk::Key::Home {
+                overlay.reset_view();
+                return glib::Propagation::Stop;
+            }
+
             if key == gtk::gdk::Key::Left {
                 viewer.prev_image();
                 return glib::Propagation::Stop;
diff --git a/crates/quickview-ui/src/windows/quick_preview.rs b/crates/quickview-ui/src/windows/quick_preview.rs
index 77e63b7..31723ba 100644
--- a/crates/quickview-ui/src/windows/quick_preview.rs
+++ b/crates/quickview-ui/src/windows/quick_preview.rs
@@ -34,6 +34,7 @@ pub fn present(app: &adw::Application, opts: &LaunchOptions) {
     // Key handling: Esc/Space closes, Ctrl+C copies.
     {
         let viewer = viewer.clone();
+        let overlay = viewer.overlay();
         let window_clone = window.clone();
         let controller = gtk::EventControllerKey::new();
         controller.connect_key_pressed(move |_, key, _, state| {
@@ -44,6 +45,22 @@ pub fn present(app: &adw::Application, opts: &LaunchOptions) {
                 return glib::Propagation::Stop;
             }
 
+            if key == gtk::gdk::Key::plus
+                || key == gtk::gdk::Key::equal
+                || key == gtk::gdk::Key::KP_Add
+            {
+                overlay.zoom_in();
+                return glib::Propagation::Stop;
+            }
+            if key == gtk::gdk::Key::minus || key == gtk::gdk::Key::KP_Subtract {
+                overlay.zoom_out();
+                return glib::Propagation::Stop;
+            }
+            if key == gtk::gdk::Key::_0 || key == gtk::gdk::Key::Home {
+                overlay.reset_view();
+                return glib::Propagation::Stop;
+            }
+
             if key == gtk::gdk::Key::Escape || key == gtk::gdk::Key::space {
                 window_clone.close();
                 return glib::Propagation::Stop;
diff --git a/crates/quickview-ui/src/windows/shared.rs b/crates/quickview-ui/src/windows/shared.rs
index 502b275..ca014e9 100644
--- a/crates/quickview-ui/src/windows/shared.rs
+++ b/crates/quickview-ui/src/windows/shared.rs
@@ -52,7 +52,6 @@ impl ViewerController {
         self.overlay.widget()
     }
 
-    #[allow(dead_code)]
     pub fn overlay(&self) -> ImageOverlayWidget {
         self.overlay.clone()
     }
diff --git a/diagrams/architecture.mmd b/diagrams/architecture.mmd
index 56a99ea..b348c05 100644
--- a/diagrams/architecture.mmd
+++ b/diagrams/architecture.mmd
@@ -1,12 +1,11 @@
-```mermaid
 flowchart LR
   subgraph UI[GTK4 / libadwaita UI Process]
-    A[App entry\nCLI + .desktop] --> B{Mode?}
-    B -->|--quick-preview| Q[Quick Preview Window\n(borderless overlay)]
+    A[App entry<br/>CLI + .desktop] --> B{Mode?}
+    B -->|--quick-preview| Q[Quick Preview Window<br/>borderless overlay]
     B -->|default| F[Full Viewer Window]
-    Q --> R[Renderer\n(texture + transforms)]
+    Q --> R[Renderer<br/>texture + transforms]
     F --> R
-    R --> O[OCR Overlay Layer\n(hit-testing + selection)]
+    R --> O[OCR Overlay Layer<br/>hit-testing + selection]
   end
 
   subgraph IMG[Image Pipeline]
@@ -15,17 +14,17 @@ flowchart LR
   end
 
   subgraph OCR[OCR Pipeline]
-    T1[Prepare bitmap\n(optional preprocess)] --> T2[OCR engine]
-    T2 --> T3[Layout output\n(TSV/HOCR)]
-    T3 --> T4[Parsed boxes\nwords/lines + confidence]
+    T1[Prepare bitmap<br/>optional preprocess] --> T2[OCR engine]
+    T2 --> T3[Layout output<br/>TSV/HOCR]
+    T3 --> T4[Parsed boxes<br/>words/lines + confidence]
   end
 
   I1 --> I2 --> I3 --> R
   I3 --> T1 --> T2 --> T3 --> T4 --> O
 
   subgraph Cache[Cache]
-    C1[(In-memory cache)]:::cache
-    C2[(Optional persistent cache)]:::cache
+    C1[In-memory cache]:::cache
+    C2[Optional persistent cache]:::cache
   end
 
   T4 --> C1
@@ -33,4 +32,3 @@ flowchart LR
   T4 --> C2
 
   classDef cache fill:#f2f2f2,stroke:#bbb,color:#111;
-```
diff --git a/diagrams/state_machine.mmd b/diagrams/state_machine.mmd
index 4b0d14c..1c50a62 100644
--- a/diagrams/state_machine.mmd
+++ b/diagrams/state_machine.mmd
@@ -1,4 +1,3 @@
-```mermaid
 stateDiagram-v2
   [*] --> Idle
   Idle --> LoadingImage: Open(path/stdin)
@@ -11,4 +10,3 @@ stateDiagram-v2
   OcrRunning --> Closed: Space/Esc (cancel/ignore)
   OcrReady --> Closed: Space/Esc
   Closed --> [*]
-```
diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md
index 770c051..d419b80 100644
--- a/docs/ARCHITECTURE.md
+++ b/docs/ARCHITECTURE.md
@@ -15,12 +15,12 @@ This architecture aims to satisfy the spec goals:
 ```mermaid
 flowchart LR
   subgraph UI[GTK4 / libadwaita UI Process]
-    A[App entry\nCLI + .desktop] --> B{Mode?}
-    B -->|--quick-preview| Q[Quick Preview Window\n(borderless overlay)]
+    A[App entry<br/>CLI + .desktop] --> B{Mode?}
+    B -->|--quick-preview| Q[Quick Preview Window<br/>borderless overlay]
     B -->|default| F[Full Viewer Window]
-    Q --> R[Renderer\n(texture + transforms)]
+    Q --> R[Renderer<br/>texture + transforms]
     F --> R
-    R --> O[OCR Overlay Layer\n(hit-testing + selection)]
+    R --> O[OCR Overlay Layer<br/>hit-testing + selection]
   end
 
   subgraph IMG[Image Pipeline]
@@ -29,17 +29,17 @@ flowchart LR
   end
 
   subgraph OCR[OCR Pipeline]
-    T1[Prepare bitmap\n(optional preprocess)] --> T2[OCR engine]
-    T2 --> T3[Layout output\n(TSV/HOCR)]
-    T3 --> T4[Parsed boxes\nwords/lines + confidence]
+    T1[Prepare bitmap<br/>optional preprocess] --> T2[OCR engine]
+    T2 --> T3[Layout output<br/>TSV/HOCR]
+    T3 --> T4[Parsed boxes<br/>words/lines + confidence]
   end
 
   I1 --> I2 --> I3 --> R
   I3 --> T1 --> T2 --> T3 --> T4 --> O
 
   subgraph Cache[Cache]
-    C1[(In-memory cache)]:::cache
-    C2[(Optional persistent cache)]:::cache
+    C1[In-memory cache]:::cache
+    C2[Optional persistent cache]:::cache
   end
 
   T4 --> C1
@@ -163,21 +163,36 @@ Store OCR results as:
 - optional: paragraph/block grouping for better selection behavior (future)
 
 ### 7.2 Hit testing
-Selection requires mapping pointer coordinates → OCR boxes. Best practice:
-- build a spatial index (e.g., grid index or R-tree) over word bounding boxes in image coordinates
-- at drag-select, query overlapping boxes, then order them by reading order (line then x)
+Selection requires mapping pointer coordinates → OCR boxes.
+
+Implemented in `crates/quickview-core/src/ocr/index.rs` as `OcrWordIndex`, a uniform-grid spatial index (256px cells) over word bounding boxes in image coordinates. The index is built once when OCR results arrive and queried on every drag-select update via `query_intersecting()`. If no index is available, hit-testing falls back to a linear scan.
 
 ### 7.3 Transform math
-Maintain a view transform `T`:
-- scale (zoom)
-- translation (pan)
-- fit-to-window baseline transform
+Implemented in `crates/quickview-core/src/geometry.rs` as `ViewTransform`.
+
+**Canonical state** (stored per-widget, resize-stable):
+- `zoom_factor: f64` — 1.0 = contain-fit
+- `center_img: Point` — image-space point at widget center
+
+**Deriving the transform each frame** (`ViewTransform::from_center`):
+- `contain()` returns a `ContainResult { contain_scale, widget_center }`
+- `scale = contain_scale * zoom_factor`
+- `offset = widget_center - center_img * scale`
+- Constructor rejects non-finite or non-positive scale values with a `ViewTransformError`
+- Fields are private; accessed via `.scale()`, `.offset_x()`, `.offset_y()` getters
 
 Convert bounding boxes for render:
-- `bbox_widget = T(bbox_image)`
+- `bbox_widget = T(bbox_image)` via `image_rect_to_widget()`
+
+Hit-testing and selection do the inverse:
+- `p_image = T⁻¹(p_widget)` via `widget_to_image()`
+- `sel_image = T⁻¹(sel_widget)` via `widget_rect_to_image()` — converts a drag-selection rectangle to image coordinates for OCR word intersection testing
+
+**Clamping**: `clamp_center()` keeps the image covering the viewport when zoomed in, or forces `center_img` to image center when the scaled image fits within the widget. Clamped values are written back to state to keep it canonical.
+
+**Zoom anchoring**: anchor-preserving math ensures the image point under the cursor (or pinch center) stays fixed after zoom. See `recenter_for_anchor()` in `image_overlay.rs`.
 
-Hit-testing does the inverse:
-- `p_image = T^-1(p_widget)`
+**Rendering**: `ZoomableCanvas` (custom `gtk::Widget` subclass in `image_overlay.rs`) uses the GSK/Snapshot pipeline — `snapshot.append_scaled_texture()` for GPU-accelerated image rendering, `snapshot.append_cairo()` only for lightweight overlay primitives (selection rect, OCR highlights).
 
 ### 7.4 Copy semantics
 When copying selection:
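Review note: the transform math that section 7.3 of this hunk describes (`scale = contain_scale * zoom_factor`, `offset = widget_center - center_img * scale`, plus the inverse mapping and zoom anchoring) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the contents of `geometry.rs`: the struct layout, the `Option` return in place of `ViewTransformError`, and the helper name `center_keeping_anchor` are all hypothetical.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
struct Point {
    x: f64,
    y: f64,
}

#[derive(Clone, Copy, Debug, PartialEq)]
struct ViewTransform {
    scale: f64,
    offset_x: f64,
    offset_y: f64,
}

impl ViewTransform {
    /// Derive the per-frame transform from the canonical state
    /// (`zoom_factor`, `center_img`). Returns `None` for a non-finite or
    /// non-positive scale; the real constructor reports an error type instead.
    fn from_center(
        widget_w: f64,
        widget_h: f64,
        image_w: f64,
        image_h: f64,
        zoom_factor: f64,
        center_img: Point,
    ) -> Option<Self> {
        let contain_scale = (widget_w / image_w).min(widget_h / image_h);
        let scale = contain_scale * zoom_factor;
        if !scale.is_finite() || scale <= 0.0 {
            return None;
        }
        Some(Self {
            scale,
            offset_x: widget_w / 2.0 - center_img.x * scale,
            offset_y: widget_h / 2.0 - center_img.y * scale,
        })
    }

    /// Forward mapping T: image coordinates to widget coordinates.
    fn image_to_widget(&self, p: Point) -> Point {
        Point {
            x: p.x * self.scale + self.offset_x,
            y: p.y * self.scale + self.offset_y,
        }
    }

    /// Inverse mapping: widget coordinates to image coordinates.
    fn widget_to_image(&self, p: Point) -> Point {
        Point {
            x: (p.x - self.offset_x) / self.scale,
            y: (p.y - self.offset_y) / self.scale,
        }
    }
}

/// Pick the new `center_img` so that `anchor_img` stays under `anchor_widget`
/// once the absolute scale becomes `new_scale`. Solving
/// `anchor_widget = widget_center + (anchor_img - center) * new_scale`
/// for `center` gives the expression below.
fn center_keeping_anchor(
    anchor_widget: Point,
    anchor_img: Point,
    widget_w: f64,
    widget_h: f64,
    new_scale: f64,
) -> Point {
    Point {
        x: anchor_img.x - (anchor_widget.x - widget_w / 2.0) / new_scale,
        y: anchor_img.y - (anchor_widget.y - widget_h / 2.0) / new_scale,
    }
}
```

Storing only `zoom_factor` and `center_img` and rebuilding the transform each frame is what makes the state resize-stable: a widget resize changes `contain_scale` and the widget center, but the image point shown at the center stays the same.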
diff --git a/docs/DECISIONS.md b/docs/DECISIONS.md
index 4068951..70ad51f 100644
--- a/docs/DECISIONS.md
+++ b/docs/DECISIONS.md
@@ -135,7 +135,24 @@ Recommendation: implement TSV first; add hOCR later as debug/export.
 
 ---
 
-### 8) Threading + cancellation
+### 8) Image rendering: custom Widget subclass + GSK/Snapshot pipeline
+
+**Why**
+- `gtk::Picture` does its own internal contain-fit with no way to inject a zoom/pan transform.
+- The GSK/Snapshot pipeline (`append_scaled_texture`) keeps the `GdkTexture` on the GPU — no `Texture::download()` to a Cairo surface, no CPU-bound rendering on the main thread.
+- Cairo is only used for lightweight overlay primitives (selection rect, OCR highlights) via `snapshot.append_cairo()`.
+
+**Tradeoffs**
+- Requires a custom `gtk::Widget` subclass (`ZoomableCanvas`) with `glib::subclass` boilerplate.
+- Requires GTK >= 4.10 for `append_scaled_texture` (the `v4_10` feature gate).
+
+**Alternatives considered**
+- **`gtk::Picture` + Cairo overlay**: simpler but `Texture::download()` blocks the main thread and wastes memory holding both GPU and CPU copies.
+- **`snapshot.save()/translate()/scale()/append_texture()`**: works without `v4_10` but offers no control over the scaling filter.
+
+---
+
+### 9) Threading + cancellation
 
 **Recommended**
 - OCR runs in a worker thread (or separate process) and returns results over a channel.
diff --git a/docs/DEPENDENCIES.md b/docs/DEPENDENCIES.md
index 509c608..1600f31 100644
--- a/docs/DEPENDENCIES.md
+++ b/docs/DEPENDENCIES.md
@@ -5,7 +5,7 @@ This file is a quick reference for system and Rust dependencies.
 ## System dependencies (Arch)
 
 Required:
-- gtk4
+- gtk4 (>= 4.10 — required for `append_scaled_texture` used by the zoom/pan renderer)
 - libadwaita
 - tesseract
 - tesseract language pack(s) (at least English: `tesseract-data-eng`)
@@ -17,7 +17,7 @@ Optional:
 
 ## Rust crates (workspace)
 
-- `gtk4` (GTK4 bindings)
+- `gtk4` (GTK4 bindings, `v4_10` feature enabled)
 - `libadwaita` (Adwaita widgets)
 - `gtk4-layer-shell` (Layer Shell integration)
 - `clap` (CLI)
diff --git a/docs/DEVELOPMENT.md b/docs/DEVELOPMENT.md
index fe2c3ed..e227770 100644
--- a/docs/DEVELOPMENT.md
+++ b/docs/DEVELOPMENT.md
@@ -38,7 +38,8 @@ sudo pacman -S --needed tesseract tesseract-data-eng
 - Keep all OCR and I/O off the GTK main thread.
 - Use `async-channel` + `glib::MainContext::spawn_local()` to send results back to the UI.
 - Prefer small widgets with clear responsibilities:
-  - `ImageOverlayWidget` draws the image and highlights
+  - `ImageOverlayWidget` wraps the overlay + spinner; delegates to `ZoomableCanvas`
+  - `ZoomableCanvas` (custom `gtk::Widget` subclass) handles image rendering via GSK/Snapshot, zoom/pan state, selection gestures, and OCR highlight overlay
   - `ViewerController` manages OCR dispatch and yields `OcrResult`
 
 ## Useful tasks
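Review note: the selection gestures that `ZoomableCanvas` handles depend on the uniform-grid hit-testing described for `OcrWordIndex` in `docs/ARCHITECTURE.md` section 7.2. The sketch below shows the general technique only (bucket each word box into every 256px cell it touches, then test just the boxes in the cells a selection overlaps). The `Rect` and `WordGrid` types and the `usize` word IDs are hypothetical, and the real `OcrWordIndex` may be organized differently. Coordinates are assumed to be finite image-space values.

```rust
use std::collections::{BTreeSet, HashMap};

#[derive(Clone, Copy, Debug, PartialEq)]
struct Rect {
    x: f64,
    y: f64,
    w: f64,
    h: f64,
}

impl Rect {
    /// Strict overlap test; rectangles that only share an edge do not match.
    fn intersects(&self, other: &Rect) -> bool {
        self.x < other.x + other.w
            && other.x < self.x + self.w
            && self.y < other.y + other.h
            && other.y < self.y + self.h
    }
}

// Cell size taken from the architecture doc (256px cells).
const CELL: f64 = 256.0;

/// Inclusive range of grid cells touched by a rectangle: (x0, y0, x1, y1).
fn cell_range(r: &Rect) -> (i64, i64, i64, i64) {
    (
        (r.x / CELL).floor() as i64,
        (r.y / CELL).floor() as i64,
        ((r.x + r.w) / CELL).floor() as i64,
        ((r.y + r.h) / CELL).floor() as i64,
    )
}

struct WordGrid {
    boxes: Vec<Rect>,
    cells: HashMap<(i64, i64), Vec<usize>>,
}

impl WordGrid {
    /// Build once per OCR result: register each box in every cell it touches.
    fn build(boxes: Vec<Rect>) -> Self {
        let mut cells: HashMap<(i64, i64), Vec<usize>> = HashMap::new();
        for (id, b) in boxes.iter().enumerate() {
            let (x0, y0, x1, y1) = cell_range(b);
            for cy in y0..=y1 {
                for cx in x0..=x1 {
                    cells.entry((cx, cy)).or_default().push(id);
                }
            }
        }
        Self { boxes, cells }
    }

    /// IDs of boxes overlapping `sel`, deduplicated and in ascending order.
    fn query_intersecting(&self, sel: &Rect) -> Vec<usize> {
        let (x0, y0, x1, y1) = cell_range(sel);
        let mut hits = BTreeSet::new();
        for cy in y0..=y1 {
            for cx in x0..=x1 {
                if let Some(ids) = self.cells.get(&(cx, cy)) {
                    for &id in ids {
                        if self.boxes[id].intersects(sel) {
                            hits.insert(id);
                        }
                    }
                }
            }
        }
        hits.into_iter().collect()
    }
}
```

A box that straddles a cell boundary is stored in more than one cell, which is why the query collects IDs into a set before returning them.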
diff --git a/docs/PHASED_PLAN.md b/docs/PHASED_PLAN.md
index 286e5f6..e8f2d7a 100644
--- a/docs/PHASED_PLAN.md
+++ b/docs/PHASED_PLAN.md
@@ -8,7 +8,7 @@ If priorities change, you can reshuffle phases, but try to keep the “render fi
 
 ---
 
-## Phase 0 — Repo + tooling foundation
+## Phase 0 — Repo + tooling foundation ✅
 
 **Deliverables**
 - Repository structure (`crates/`, `docs/`, `adrs/`)
@@ -17,105 +17,105 @@ If priorities change, you can reshuffle phases, but try to keep the “render fi
 - Packaging skeletons (Arch PKGBUILD stub + Flatpak manifest stub)
 
 **Definition of done**
-- `build` succeeds on Arch in a clean environment
-- `run` launches an empty window without warnings
-- `docs/` renders in your preferred markdown viewer
+- `build` succeeds on Arch in a clean environment ✅
+- `run` launches an empty window without warnings ✅
+- `docs/` renders in your preferred markdown viewer ✅
 
 ---
 
-## Phase 1 — Full Viewer: open + display image
+## Phase 1 — Full Viewer: open + display image ✅
 
 **Core tasks**
 - Implement CLI parsing:
-  - `quickview <path>`
-  - `quickview --help`
+  - `quickview <path>` ✅
+  - `quickview --help` ✅
 - Load and display image:
-  - decode to a texture
-  - show in a viewer widget
-- Fit-to-window baseline
+  - decode to a texture ✅
+  - show in a viewer widget ✅
+- Fit-to-window baseline ✅
 - Keyboard shortcuts:
-  - `Esc` closes window
-  - `+/-` zoom (or Ctrl+scroll)
-- Basic UI shell with libadwaita (headerbar, etc.)
+  - `Esc` closes window ✅
+  - `+/-` zoom (or Ctrl+scroll) ✅
+- Basic UI shell with libadwaita (headerbar, etc.) ✅
 
 **Definition of done**
-- Opening a PNG/JPG renders correctly
-- No UI freezes during decode (decode is async or sufficiently fast)
-- *(Zoom and pan deferred — see Phase 5 or later)*
+- Opening a PNG/JPG renders correctly ✅
+- No UI freezes during decode (decode is async or sufficiently fast) ✅
+- Zoom and pan: Ctrl+scroll, pinch-to-zoom, +/- keys, middle-drag pan ✅
 
 ---
 
-## Phase 2 — Directory navigation + info panel
+## Phase 2 — Directory navigation + info panel (partially done)
 
 **Core tasks**
-- Identify “image set” as all supported images in the same directory
-- Maintain a sorted list and current index
+- Identify “image set” as all supported images in the same directory ✅
+- Maintain a sorted list and current index ✅
 - Add navigation:
-  - Left/Right arrows to prev/next
+  - Left/Right arrows to prev/next ✅
 - Add info panel:
   - filename
   - dimensions
   - file size
 
 **Definition of done**
-- Prev/next navigation is correct and stable
+- Prev/next navigation is correct and stable ✅
 - Info updates immediately when switching images
 
 ---
 
-## Phase 3 — Quick Preview mode (borderless overlay)
+## Phase 3 — Quick Preview mode (borderless overlay) ✅
 
 **Core tasks**
 - Add a `--quick-preview` mode:
-  - borderless
-  - centered
-  - dismiss on Space/Esc
-- Implement “always-on-top” behavior
+  - borderless ✅
+  - centered ✅
+  - dismiss on Space/Esc ✅
+- Implement “always-on-top” behavior ✅
 - If available, integrate Layer Shell (wlroots-friendly overlay):
-  - runtime detect if Layer Shell is supported
-  - use overlay layer with appropriate keyboard focus policy
+  - runtime detect if Layer Shell is supported ✅
+  - use overlay layer with appropriate keyboard focus policy ✅
 
 **Definition of done**
-- `quickview --quick-preview <image>` shows a borderless preview and closes instantly on Space/Esc
-- Works on at least one wlroots compositor
+- `quickview --quick-preview <image>` shows a borderless preview and closes instantly on Space/Esc ✅
+- Works on at least one wlroots compositor ✅
 
 ---
 
-## Phase 4 — OCR pipeline integration (async)
+## Phase 4 — OCR pipeline integration (async) ✅
 
 **Core tasks**
-- Add OCR backend abstraction (interface/trait)
+- Add OCR backend abstraction (interface/trait) ✅
 - Implement default Tesseract backend:
-  - run OCR asynchronously
-  - produce word-level boxes + text
-- Add a non-blocking “OCR in progress” indicator
-- Ensure cancellation / ignoring late results when user navigates away
+  - run OCR asynchronously ✅
+  - produce word-level boxes + text ✅
+- Add a non-blocking “OCR in progress” indicator ✅
+- Ensure cancellation / ignoring late results when user navigates away ✅
 
 **Definition of done**
-- OCR starts after image display
-- OCR completion adds internal OCR result state (even before selection UI exists)
-- App stays responsive during OCR
+- OCR starts after image display ✅
+- OCR completion adds internal OCR result state (even before selection UI exists) ✅
+- App stays responsive during OCR ✅
 
 ---
 
-## Phase 5 — OCR overlay + text selection UX
+## Phase 5 — OCR overlay + text selection UX (partially done)
 
 **Core tasks**
-- Render OCR overlay (invisible by default or lightly highlighted on hover)
+- Render OCR overlay (invisible by default or lightly highlighted on hover) ✅
 - Implement drag-selection:
-  - compute selection rectangle in image coordinates
-  - highlight matched words
+  - compute selection rectangle in image coordinates ✅
+  - highlight matched words ✅
 - Implement copy:
-  - Ctrl+C copies selected text
+  - Ctrl+C copies selected text ✅
   - context menu action “Copy”
 
 **Definition of done**
-- User can reliably select and copy text from an image
-- Selection stays aligned under zoom/pan
+- User can reliably select and copy text from an image ✅
+- Selection stays aligned under zoom/pan ✅
 
 ---
 
-## Phase 6 — Integration polish
+## Phase 6 — Integration polish (not started)
 
 **Core tasks**
 - `.desktop` integration:
@@ -130,7 +130,7 @@ If priorities change, you can reshuffle phases, but try to keep the “render fi
 
 ---
 
-## Phase 7 — Hardening + performance
+## Phase 7 — Hardening + performance (not started)
 
 **Core tasks**
 - Add cache (in-memory first)
@@ -148,7 +148,7 @@ If priorities change, you can reshuffle phases, but try to keep the “render fi
 
 ---
 
-## Phase 8 — Nice-to-haves / future roadmap
+## Phase 8 — Nice-to-haves / future roadmap (not started)
 
 - Persistent OCR cache (SQLite)
 - Better layout/reading-order reconstruction
diff --git a/packaging/arch/PKGBUILD b/packaging/arch/PKGBUILD
index c7ce8a1..70b38ff 100644
--- a/packaging/arch/PKGBUILD
+++ b/packaging/arch/PKGBUILD
@@ -9,7 +9,7 @@ url="https://github.com/Green2Grey2/QuickView"
 license=("MIT")
 
 depends=(
-  "gtk4"
+  "gtk4>=4.10"
   "libadwaita"
   "tesseract"
   "gtk4-layer-shell"
diff --git a/scripts/bootstrap_arch.sh b/scripts/bootstrap_arch.sh
index 13092f7..018d3bb 100755
--- a/scripts/bootstrap_arch.sh
+++ b/scripts/bootstrap_arch.sh
@@ -7,6 +7,12 @@ sudo pacman -S --needed \
   tesseract tesseract-data-eng \
   gtk4-layer-shell
 
+# Enforce minimum GTK version required by the UI code (Snapshot/GSK APIs).
+if ! pkg-config --atleast-version=4.10 gtk4; then
+  echo "Error: GTK4 >= 4.10 is required. Found: $(pkg-config --modversion gtk4)" >&2
+  exit 1
+fi
+
 # Optional:
 # sudo pacman -S --needed wl-clipboard
 # sudo pacman -S --needed glycin glycin-gtk4
diff --git a/templates/PKGBUILD.stub b/templates/PKGBUILD.stub
index b8b191b..2a03ebb 100644
--- a/templates/PKGBUILD.stub
+++ b/templates/PKGBUILD.stub
@@ -7,7 +7,7 @@ pkgdesc="Wayland image viewer with OCR text selection"
 arch=("x86_64")
 url="https://github.com/Green2Grey2/QuickView"
 license=("MIT")
-depends=("gtk4" "libadwaita" "tesseract")
+depends=("gtk4>=4.10" "libadwaita" "tesseract")
 # Optional recommended deps:
 # depends+=("gtk4-layer-shell")
 # depends+=("glycin")  # if packaged; otherwise vendor/build