diff --git a/blog/2026-03-11-DocWire-up-to-date-2026.md b/blog/2026-03-11-DocWire-up-to-date-2026.md new file mode 100644 index 0000000..14d7493 --- /dev/null +++ b/blog/2026-03-11-DocWire-up-to-date-2026.md @@ -0,0 +1,109 @@ +--- +title: 'DocWire: A 2025 Retrospective' +authors: krzysztof +tags: [news, update] +--- + +For a while, this blog has been quiet — but DocWire hasn’t. + +Behind the scenes, 2025 became a defining year for the project. What started as a document-processing toolkit gradually transformed into something far more ambitious: a modular, message-driven framework capable of powering modern data pipelines and AI-driven workflows. + +This post is a short retrospective of that transformation — and a look at the engineering decisions that made it possible. + + + + +## A New Core: Message-Driven Architecture + +The most fundamental shift happened deep inside the SDK. + +Earlier versions relied on tightly coupled data representations, which made extending the system increasingly difficult. Adding new data types often required changes across multiple components. + +In 2025, DocWire introduced a message-driven core — a decoupled model where processing stages communicate through generic messages flowing across pipelines. + +This shift unlocked several key benefits: +- clearer data flow across processing stages +- independent, composable components +- easier extensibility without breaking existing pipelines +- the ability to expose pipelines as standalone microservices + +Rather than a collection of utilities, DocWire started to behave like a coherent processing platform. + + + +## AI Integration: Local and Cloud by Design + +The new architecture enabled deeper integration with language models and semantic processing. + +DocWire pipelines can now leverage both: +- local models for offline and privacy-sensitive scenarios +- cloud-based models for advanced conversational or generative workflows + +This hybrid approach gives developers flexibility to choose the right execution context for each task — from fully offline semantic search to cloud-augmented analysis pipelines. + +Even lightweight local models can provide multilingual embeddings and semantic understanding directly within applications, without network dependencies. + + + +## High-Fidelity Document Understanding + +Alongside architectural changes, the core parsing capabilities evolved significantly. + +Throughout 2025, DocWire introduced: +- a redesigned HTML parsing pipeline +- a new PDF engine with improved stability +- image extraction from complex document formats +- position-aware layout reconstruction + +This progression culminated in what we describe as high-fidelity document understanding: the ability to capture not only textual content but also the structural and spatial context of documents. + +Such layout awareness is essential for accurate indexing, semantic retrieval, and downstream AI processing. + + + +## Developer Experience and Production Readiness + +A major focus of the year was making DocWire not only powerful but also pleasant to build with. + +Key improvements included: +- a zero-overhead diagnostic system for development-time insights +- structured JSON logging for easier observability +- richer error reporting with contextual metadata +- safer assertions and defensive programming utilities +- modernization of the build system and CI workflows + +These changes may be less visible than new features, but they are critical for building reliable production systems. + + + +## Watch: 2025 Engineering Recap + +To illustrate this transformation visually, we prepared a short overview of the architectural and functional changes introduced throughout the year. + +[![Video title](https://img.youtube.com/vi/vBgrIh04R-I/hqdefault.jpg)](https://youtu.be/vBgrIh04R-I) + + + +## What This Means + +Looking back, 2025 was less about incremental improvement and more about redefinition. + +DocWire is no longer just a library integrated into applications. The foundations now exist for treating it as an application backbone — a processing layer capable of running independently as part of larger data infrastructures. + +This evolution opens the door to new usage patterns: +- pipeline-based document ingestion +- service-oriented deployment models +- local AI processing workflows +- backend systems centered around structured data extraction + +In short, the toolkit matured into a framework. + + + +## What Comes Next + +This retrospective sets the stage for what follows. + +In the next post, we’ll introduce a concept that helps describe this evolution at a higher level: DocWire as the invisible engine behind secure data pipelines. + +For now, if you’ve been following the project — or discovering it for the first time — 2025 represents a turning point worth exploring. diff --git a/blog/authors.yml b/blog/authors.yml index 9aad571..de3dfad 100644 --- a/blog/authors.yml +++ b/blog/authors.yml @@ -2,3 +2,8 @@ ferid: name: Ferid Obeidat title: Author image_url: /img/Ferid-headshot.png +krzysztof: + name: Krzysztof Nowicki + title: Author + image_url: /img/krzysztof.png + diff --git a/static/img/krzysztof.png b/static/img/krzysztof.png new file mode 100644 index 0000000..1ddf984 Binary files /dev/null and b/static/img/krzysztof.png differ