<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Krzysztof Nowicki</title>
    <description>The latest articles on Forem by Krzysztof Nowicki (@novitzmann).</description>
    <link>https://forem.com/novitzmann</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1540598%2Fdaaa546a-f7c8-47f8-ab73-7836fc2bdcfa.png</url>
      <title>Forem: Krzysztof Nowicki</title>
      <link>https://forem.com/novitzmann</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/novitzmann"/>
    <language>en</language>
    <item>
      <title>We just hit 100 GitHub stars ⭐ — and we’d love to meet some of you</title>
      <dc:creator>Krzysztof Nowicki</dc:creator>
      <pubDate>Tue, 17 Mar 2026 18:12:33 +0000</pubDate>
      <link>https://forem.com/novitzmann/we-just-hit-100-github-stars-and-wed-love-to-meet-some-of-you-1j7o</link>
      <guid>https://forem.com/novitzmann/we-just-hit-100-github-stars-and-wed-love-to-meet-some-of-you-1j7o</guid>
      <description>&lt;p&gt;We’ve just crossed 100 stars on GitHub for DocWire.&lt;/p&gt;

&lt;p&gt;Not a huge number — but for a niche, modern C++ data processing SDK, it actually means a lot to us. &lt;/p&gt;

&lt;p&gt;DocWire is built for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;extracting data from complex file formats&lt;/li&gt;
&lt;li&gt;building offline / secure data pipelines&lt;/li&gt;
&lt;li&gt;integrating with AI / LLM workflows (RAG, preprocessing, etc.)&lt;/li&gt;
&lt;li&gt;all in modern C++20&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We’re not a “viral” project... (YET!)&lt;br&gt;
Most of our users are solving very specific, often enterprise-level problems.&lt;/p&gt;

&lt;p&gt;That’s why every single star matters.&lt;/p&gt;

&lt;h3&gt;
  
  
  A small request
&lt;/h3&gt;

&lt;p&gt;If you starred the repo and you’re here on Dev.to, I’d genuinely love to connect.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What are you working on?&lt;/li&gt;
&lt;li&gt;What made you star the project?&lt;/li&gt;
&lt;li&gt;Are you dealing with document/data extraction problems?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Even a short comment or message would be great.&lt;/p&gt;

&lt;h3&gt;
  
  
  What’s next
&lt;/h3&gt;

&lt;p&gt;We’re actively developing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;new parsers (including EPUB)&lt;/li&gt;
&lt;li&gt;better structured data extraction&lt;/li&gt;
&lt;li&gt;easier "get up and go"&lt;/li&gt;
&lt;li&gt;tighter integration with AI pipelines&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Thanks again for the support. We’re curious to meet the people behind those stars ⭐ (here’s the proof: &lt;a href="https://github.com/docwire/docwire" rel="noopener noreferrer"&gt;https://github.com/docwire/docwire&lt;/a&gt;)&lt;/p&gt;

</description>
      <category>ai</category>
      <category>cpp</category>
      <category>opensource</category>
      <category>github</category>
    </item>
    <item>
      <title>First 2026 DocWire SDK Release: Modern C++20 XML Parsing, Safety Policies, and Parser Resilience</title>
      <dc:creator>Krzysztof Nowicki</dc:creator>
      <pubDate>Thu, 22 Jan 2026 14:07:29 +0000</pubDate>
      <link>https://forem.com/novitzmann/first-2026-docwire-sdk-release-modern-c20-xml-parsing-safety-policies-and-parser-resilience-120i</link>
      <guid>https://forem.com/novitzmann/first-2026-docwire-sdk-release-modern-c20-xml-parsing-safety-policies-and-parser-resilience-120i</guid>
      <description>&lt;p&gt;We’ve released a new version of DocWire SDK focused on &lt;strong&gt;core architecture, safety, and robustness&lt;/strong&gt;, rather than user-facing features.&lt;/p&gt;

&lt;p&gt;This update introduces a modern C++20 foundation for XML parsing, type-safe conversions, and configurable safety guarantees, alongside multiple improvements in parser resilience and memory management.&lt;/p&gt;

&lt;h3&gt;
  
  
  Highlights of this release
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Modern C++20 XML parsing API&lt;/strong&gt;&lt;br&gt;
A new forward-only, single-pass XML reader based on C++20 ranges and views replaces the legacy XmlStream implementation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Configurable safety policies&lt;/strong&gt;&lt;br&gt;
Developers can choose between strict checking (exceptions on violations) and relaxed, zero-overhead execution via the &lt;code&gt;checked&lt;/code&gt;, &lt;code&gt;not_null&lt;/code&gt;, and &lt;code&gt;enforce&lt;/code&gt; utilities.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Type-safe data conversion framework&lt;/strong&gt;&lt;br&gt;
New &lt;code&gt;convert::try_to&lt;/code&gt; and &lt;code&gt;convert::to&lt;/code&gt; APIs replace ad-hoc string conversions and support custom formats (e.g. date parsing).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Standardized date/time handling&lt;/strong&gt;&lt;br&gt;
All date and time logic now uses &lt;code&gt;std::chrono::sys_seconds&lt;/code&gt; instead of &lt;code&gt;struct tm&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Partial failure resilience&lt;/strong&gt;&lt;br&gt;
Parsers can continue processing even when some sub-items fail, while still detecting total failures.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;New core utilities&lt;/strong&gt;&lt;br&gt;
Named parameters, non-null enforcement, ranged numeric types, and debug-only assertions.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In addition, the release includes parser-specific robustness fixes (HTML, PST, PDF), logging refinements, updated documentation, and expanded test coverage.&lt;/p&gt;

&lt;h3&gt;
  
  
  Links
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;GitHub &amp;amp; release notes: &lt;a href="https://github.com/docwire/docwire" rel="noopener noreferrer"&gt;https://github.com/docwire/docwire&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Documentation: &lt;a href="https://www.docwire.io" rel="noopener noreferrer"&gt;https://www.docwire.io&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;SourceForge: &lt;a href="https://sourceforge.net/projects/docwire/" rel="noopener noreferrer"&gt;https://sourceforge.net/projects/docwire/&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Feedback from developers working with document processing or backend systems in modern C++ is welcome.&lt;/p&gt;

&lt;p&gt;--The DocWire Team--&lt;/p&gt;

</description>
      <category>cpp</category>
      <category>softwareengineering</category>
      <category>backend</category>
      <category>systemsprogramming</category>
    </item>
    <item>
      <title>DocWire SDK in 2025 – Architecture, AI Pipelines, and Document Processing in Modern C++</title>
      <dc:creator>Krzysztof Nowicki</dc:creator>
      <pubDate>Wed, 07 Jan 2026 11:37:51 +0000</pubDate>
      <link>https://forem.com/novitzmann/docwire-sdk-in-2025-architecture-ai-pipelines-and-document-processing-in-modern-c-gan</link>
      <guid>https://forem.com/novitzmann/docwire-sdk-in-2025-architecture-ai-pipelines-and-document-processing-in-modern-c-gan</guid>
      <description>&lt;p&gt;In 2025, most of the work on DocWire SDK focused on &lt;strong&gt;architecture, correctness, and long-term maintainability&lt;/strong&gt;, rather than surface-level features.&lt;/p&gt;

&lt;p&gt;To document this work, we published a technical recap video summarizing the most important engineering changes introduced during the year.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DocWire SDK – 2025 Technical Summary&lt;/strong&gt;&lt;br&gt;


  &lt;iframe src="https://www.youtube.com/embed/vBgrIh04R-I"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

&lt;h3&gt;
  
  
  What changed in 2025
&lt;/h3&gt;

&lt;p&gt;The video covers, among others:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;migration from a &lt;code&gt;std::variant&lt;/code&gt;-based data model to a &lt;strong&gt;polymorphic, message-driven pipeline architecture&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;exposing DocWire pipelines as &lt;strong&gt;HTTP/HTTPS microservices&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;integration of &lt;strong&gt;local, offline AI embeddings&lt;/strong&gt; (multilingual E5 models)&lt;/li&gt;
&lt;li&gt;expanded &lt;strong&gt;OpenAI support&lt;/strong&gt; (GPT-4o, GPT-5, embeddings, transcription, TTS)&lt;/li&gt;
&lt;li&gt;replacement of PoDoFo with &lt;strong&gt;Google PDFium&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;high-precision &lt;strong&gt;OCR and PDF positional metadata&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;image-aware PDF processing and OCR&lt;/li&gt;
&lt;li&gt;modern HTML parsing and robust charset conversion&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;zero-cost logging&lt;/strong&gt;, structured error diagnostics, and CI/CD modernization&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The intent of this video is purely technical: architecture, APIs, performance, and engineering decisions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Project links
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/docwire/docwire" rel="noopener noreferrer"&gt;https://github.com/docwire/docwire&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Website: &lt;a href="https://www.docwire.io" rel="noopener noreferrer"&gt;https://www.docwire.io&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're working on document processing, backend systems, or AI-assisted pipelines in modern C++, feedback and discussion are welcome.&lt;/p&gt;

</description>
      <category>cpp</category>
      <category>softwareengineering</category>
      <category>backend</category>
      <category>ai</category>
    </item>
    <item>
      <title>Offline RAG in Modern C++: Secure Semantic Pipelines Without the Cloud</title>
      <dc:creator>Krzysztof Nowicki</dc:creator>
      <pubDate>Wed, 10 Dec 2025 20:50:16 +0000</pubDate>
      <link>https://forem.com/novitzmann/offline-rag-in-modern-c-secure-semantic-pipelines-without-the-cloud-4cn</link>
      <guid>https://forem.com/novitzmann/offline-rag-in-modern-c-secure-semantic-pipelines-without-the-cloud-4cn</guid>
      <description>&lt;p&gt;When you're dealing with confidential data — PII, medical records, trade secrets, or internal research — sending it to a third-party API for summarization or RAG preparation is a complete non-starter.&lt;/p&gt;

&lt;p&gt;But that doesn’t mean you have to give up LLM power. With modern C++, you can build a universal, format-agnostic, fully offline data pipeline in just a few lines.&lt;/p&gt;

&lt;p&gt;Below is how we (&lt;strong&gt;DocWire&lt;/strong&gt;) generate embeddings for a PDF and a Word document, compare them for semantic similarity, and keep all data strictly on your machine — no cloud, no external API calls, no vendor lock-in.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Define a secure offline pipeline&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;auto pipeline = content_type::detector{}
              | office_formats_parser{}
              | local_ai::embed(local_ai::embed::e5_passage_prefix);

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This single chain handles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;format detection (PDF, DOCX, etc.)&lt;/li&gt;
&lt;li&gt;file parsing&lt;/li&gt;
&lt;li&gt;local embedding generation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All offline.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Process confidential documents locally&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;auto report_vec =
    std::filesystem::path("secret_plans.pdf") | pipeline;

auto policy_vec =
    std::filesystem::path("compliance_rules.docx") | pipeline;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No cloud calls. No data ever leaves your system.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Compare semantic similarity&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ensure(cosine_similarity(report_vec, policy_vec) &amp;gt; 0.85);

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
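&lt;p&gt;For readers less familiar with the metric, this assumes that &lt;code&gt;cosine_similarity&lt;/code&gt; follows the standard definition. Under that definition it returns the cosine of the angle between the two embedding vectors. The result is a value in [-1, 1], and scores near 1 mean the documents point in nearly the same semantic direction. The 0.85 threshold is simply the value this example checks against. A sensible cut-off depends on the embedding model and the data.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;\operatorname{cosine\_similarity}(\mathbf{a}, \mathbf{b})
  = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert_2 \, \lVert \mathbf{b} \rVert_2}
  = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2} \, \sqrt{\sum_{i=1}^{n} b_i^2}}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;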



&lt;p&gt;You now have a local-only RAG building block:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;embeddings&lt;/li&gt;
&lt;li&gt;comparisons&lt;/li&gt;
&lt;li&gt;chunking&lt;/li&gt;
&lt;li&gt;offline pipelines&lt;/li&gt;
&lt;li&gt;zero dependency on OpenAI / Google / AWS&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Perfect for environments where data security is not optional.&lt;/p&gt;

&lt;p&gt;Your turn: how do you handle secure, local-only RAG?&lt;br&gt;
Different ecosystems approach this very differently. How would you design a cloud-free embedder + parser + similarity pipeline in Python, Rust, Go, Java, C#, or JavaScript?&lt;/p&gt;

&lt;p&gt;Drop your snippet or architectural idea below.&lt;/p&gt;

</description>
      <category>cpp</category>
      <category>security</category>
      <category>programming</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Modern C++ vs. The World: How Would Your Language Parse a Word Document?</title>
      <dc:creator>Krzysztof Nowicki</dc:creator>
      <pubDate>Wed, 03 Dec 2025 17:07:27 +0000</pubDate>
      <link>https://forem.com/novitzmann/modern-c-vs-the-world-how-would-your-language-parse-a-word-document-h88</link>
      <guid>https://forem.com/novitzmann/modern-c-vs-the-world-how-would-your-language-parse-a-word-document-h88</guid>
      <description>&lt;p&gt;Modern C++ isn’t dying. It’s eating file formats for breakfast.&lt;br&gt;
Here’s what MS Word document parsing looks like today:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;std::filesystem::path("data_processing_definition.doc")
    | content_type::detector{}
    | office_formats_parser{}
    | PlainTextExporter()
    | out_stream;

ensure(out_stream.str()) ==
    "Data processing refers to the activities performed on raw data...";
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No COM, no Windows-only hacks, no XML archaeology: just a clean, composable pipeline in modern C++.&lt;br&gt;
So now I’m genuinely curious: if this is what parsing looks like in modern C++, what does it look like in your favorite language?&lt;br&gt;
Drop your snippet below.&lt;/p&gt;

</description>
      <category>cpp</category>
      <category>programming</category>
      <category>showdev</category>
      <category>beginners</category>
    </item>
    <item>
      <title>DocWire SDK New Release: Turning Pipelines into Secure HTTP Services</title>
      <dc:creator>Krzysztof Nowicki</dc:creator>
      <pubDate>Thu, 02 Oct 2025 12:30:51 +0000</pubDate>
      <link>https://forem.com/novitzmann/docwire-sdk-new-release-turning-pipelines-into-secure-http-services-4i5g</link>
      <guid>https://forem.com/novitzmann/docwire-sdk-new-release-turning-pipelines-into-secure-http-services-4i5g</guid>
      <description>&lt;h1&gt;
  
  
  DocWire SDK New Release: Turning Pipelines into Secure HTTP Services
&lt;/h1&gt;

&lt;p&gt;We’re excited to announce a major new release of the &lt;strong&gt;DocWire SDK&lt;/strong&gt; — and this one opens a completely new way to use DocWire in your applications.&lt;/p&gt;

&lt;p&gt;At its core, DocWire has always been about high-performance &lt;strong&gt;document and data processing in C++&lt;/strong&gt;. In this release, we’ve gone a step further:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We replaced the internal &lt;code&gt;std::variant&lt;/code&gt;-based Tag system with a new &lt;strong&gt;polymorphic &lt;code&gt;message_ptr&lt;/code&gt;&lt;/strong&gt; architecture, simplifying the data flow and making it easier to extend.&lt;/li&gt;
&lt;li&gt;And more importantly for many developers: we’ve introduced a &lt;strong&gt;built-in HTTP/HTTPS server&lt;/strong&gt; that lets you expose DocWire pipelines as secure microservices.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Why the new HTTP server matters
&lt;/h2&gt;

&lt;p&gt;Traditionally, to use DocWire you had to embed it directly into your C++ application. That works great — but many modern systems are moving toward &lt;strong&gt;service-oriented and cloud-native architectures&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;With the new &lt;code&gt;http::server&lt;/code&gt; class, you can now:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Expose any pipeline as a REST-like API&lt;/strong&gt;: for example, mount a PDF-to-JSON parser at &lt;code&gt;/parse/pdf&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Call DocWire from any language&lt;/strong&gt;: Python, Java, Go, or even curl — if it can talk HTTP, it can use DocWire.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scale horizontally&lt;/strong&gt;: deploy pipelines behind a load balancer, containerize them with Docker/Kubernetes, and scale on demand.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Secure endpoints with TLS&lt;/strong&gt;: essential for finance, government, and healthcare workloads.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This turns DocWire from a &lt;strong&gt;C++ library&lt;/strong&gt; into a &lt;strong&gt;full-fledged data processing service framework&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  A simple example
&lt;/h2&gt;

&lt;p&gt;Let’s say you want to parse incoming PDF documents and expose the output as JSON through an API. With the new server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="n"&gt;docwire&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;server&lt;/span&gt; &lt;span class="n"&gt;server&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="n"&gt;server&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add_pipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"/parse/pdf"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;my_pdf_to_json_pipeline&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;server&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;listen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;8080&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;true&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// true enables TLS&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s it — now any client can &lt;code&gt;POST&lt;/code&gt; a file to &lt;code&gt;/parse/pdf&lt;/code&gt; and get structured JSON back.&lt;/p&gt;




&lt;h2&gt;
  
  
  What else is new in this release
&lt;/h2&gt;

&lt;p&gt;Besides the HTTP server and the new polymorphic message system, we’ve included a number of improvements and fixes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;HTTP Post Content-Type&lt;/strong&gt;: automatically set based on the input MIME type, making API integrations easier.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enhanced vcpkg parsing&lt;/strong&gt;: now supports more complex platform-specific feature definitions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reorganized element headers&lt;/strong&gt;: document, mail, and AI element types are now separated for clarity.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Refactored XML options&lt;/strong&gt;: dedicated &lt;code&gt;XmlStream::no_blanks&lt;/code&gt; struct improves readability.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unified transformer API&lt;/strong&gt;: updated to the new message-based flow.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MSVC fixes&lt;/strong&gt;: resolved &lt;code&gt;min&lt;/code&gt; macro conflicts, added &lt;code&gt;/bigobj&lt;/code&gt; for heavy templates.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;macOS fixes&lt;/strong&gt;: addressed RTTI visibility issues.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dependencies&lt;/strong&gt;: replaced &lt;code&gt;curlpp&lt;/code&gt; with &lt;code&gt;cpp-httplib&lt;/code&gt; and OpenSSL.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CI update&lt;/strong&gt;: removed the deprecated &lt;code&gt;macos-13&lt;/code&gt; runner.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Docs &amp;amp; tests&lt;/strong&gt;: full migration to &lt;code&gt;message_ptr&lt;/code&gt;, new end-to-end HTTP server integration tests.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Why this matters for developers
&lt;/h2&gt;

&lt;p&gt;This release is not just about features — it’s about flexibility.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If you’re building &lt;strong&gt;monolithic C++ applications&lt;/strong&gt;, you get a cleaner, more extensible architecture.&lt;/li&gt;
&lt;li&gt;If you’re moving to &lt;strong&gt;microservices and distributed systems&lt;/strong&gt;, DocWire now fits directly into that model.&lt;/li&gt;
&lt;li&gt;If you need &lt;strong&gt;cross-language integration&lt;/strong&gt;, the HTTP server makes DocWire accessible anywhere.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We believe this is a big step toward making DocWire SDK not just a C++ library, but a &lt;strong&gt;platform for data processing at scale&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Get started
&lt;/h2&gt;

&lt;p&gt;The release is available now on GitHub: &lt;a href="https://github.com/docwire/docwire/releases/tag/2025.09.25" rel="noopener noreferrer"&gt;https://github.com/docwire/docwire/releases/tag/2025.09.25&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As always, we welcome feedback, questions, and contributions from the community.&lt;/p&gt;

&lt;p&gt;The DocWire Team&lt;/p&gt;

</description>
      <category>cpp</category>
      <category>opensource</category>
      <category>cloudnative</category>
      <category>http</category>
    </item>
    <item>
      <title>DocWire SDK 2025.08.13 Released – GPT-5 Now Fully Integrated, New Default Model</title>
      <dc:creator>Krzysztof Nowicki</dc:creator>
      <pubDate>Wed, 20 Aug 2025 14:10:46 +0000</pubDate>
      <link>https://forem.com/novitzmann/docwire-sdk-202508xx-released-gpt-5-now-fully-integrated-new-default-model-3l3b</link>
      <guid>https://forem.com/novitzmann/docwire-sdk-202508xx-released-gpt-5-now-fully-integrated-new-default-model-3l3b</guid>
      <description>&lt;p&gt;The 2025.08.13 release brings a major milestone to DocWire SDK: full support for &lt;strong&gt;OpenAI's newly released GPT-5 family of models&lt;/strong&gt;, including &lt;code&gt;gpt_5&lt;/code&gt;, &lt;code&gt;gpt_5_mini&lt;/code&gt;, &lt;code&gt;gpt_5_nano&lt;/code&gt;, and &lt;code&gt;gpt_5_chat_latest&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;To reflect this evolution, &lt;code&gt;gpt_5&lt;/code&gt; is now the &lt;strong&gt;default model&lt;/strong&gt; for all OpenAI-related operations in DocWire — giving developers direct access to state-of-the-art AI with no additional configuration.&lt;/p&gt;

&lt;p&gt;This version also brings minor but important code quality improvements and updated documentation.&lt;/p&gt;

&lt;p&gt;Full release notes: &lt;a href="https://github.com/docwire/docwire/releases/tag/2025.08.13" rel="noopener noreferrer"&gt;https://github.com/docwire/docwire/releases/tag/2025.08.13&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Features
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1 · GPT-5 Model Family Support
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Full integration of next-generation models:

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;gpt_5&lt;/code&gt;, &lt;code&gt;gpt_5_mini&lt;/code&gt;, &lt;code&gt;gpt_5_nano&lt;/code&gt;, &lt;code&gt;gpt_5_chat_latest&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Research-focused models: &lt;code&gt;o3_deep_research&lt;/code&gt;, &lt;code&gt;o4_mini_deep_research&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  2 · Default Model Upgraded
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;All OpenAI operations now default to &lt;code&gt;gpt_5&lt;/code&gt;, replacing the previous default model.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Improvements
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Modern libxml2 Compatibility&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Deprecated &lt;code&gt;xmlGetGlobalState()&lt;/code&gt; removed from XML parser initialization.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Better Error Handling&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Explicit exceptions now thrown for unknown or unsupported OpenAI models.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Code Quality Upgrades&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;explicit&lt;/code&gt; added to AI-related single-argument constructors in &lt;code&gt;AnalyzeData&lt;/code&gt;, &lt;code&gt;ExtractEntities&lt;/code&gt;, and &lt;code&gt;Summarize&lt;/code&gt; elements to avoid implicit conversions.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;




&lt;h2&gt;
  
  
  Documentation
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Updated README&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Reflects the new default model and the complete list of available OpenAI models
&lt;/li&gt;
&lt;li&gt;CLI and code examples now demonstrate how to choose non-default models&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;




&lt;h2&gt;
  
  
  Get Started
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/docwire/docwire" rel="noopener noreferrer"&gt;https://github.com/docwire/docwire&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Release: &lt;a href="https://github.com/docwire/docwire/releases/tag/2025.08.13" rel="noopener noreferrer"&gt;https://github.com/docwire/docwire/releases/tag/2025.08.13&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We welcome feedback, examples, and issues as always.&lt;/p&gt;

&lt;p&gt;— The DocWire Team&lt;/p&gt;

</description>
      <category>cpp</category>
      <category>opensource</category>
      <category>gpt5</category>
      <category>ai</category>
    </item>
    <item>
      <title>DocWire SDK 2025.08.05 Released – Local AI Embeddings, SentencePiece, Cosine Similarity</title>
      <dc:creator>Krzysztof Nowicki</dc:creator>
      <pubDate>Thu, 07 Aug 2025 14:02:53 +0000</pubDate>
      <link>https://forem.com/novitzmann/docwire-sdk-20250805-released-local-ai-embeddings-sentencepiece-cosine-similarity-5ed3</link>
      <guid>https://forem.com/novitzmann/docwire-sdk-20250805-released-local-ai-embeddings-sentencepiece-cosine-similarity-5ed3</guid>
      <description>&lt;p&gt;The 2025.08.05 release brings a major milestone to DocWire SDK: fully local, offline AI-powered text embeddings. With the integration of the multilingual-e5-small model, DocWire now supports multilingual vectorization for advanced NLP tasks—completely offline.&lt;/p&gt;

&lt;p&gt;It also modernizes dependencies by switching from OpenNMT-Tokenizer to Google’s SentencePiece, and includes numerous build and CI improvements for better MSVC and Valgrind support.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Full release notes:&lt;/strong&gt; &lt;a href="https://github.com/docwire/docwire/releases/tag/2025.08.05" rel="noopener noreferrer"&gt;https://github.com/docwire/docwire/releases/tag/2025.08.05&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Highlights
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1 · Local AI Embeddings
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Introduces &lt;code&gt;local_ai::embed&lt;/code&gt; for generating multilingual embeddings using &lt;code&gt;multilingual-e5-small&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Powers advanced use cases like semantic search, retrieval-augmented generation (RAG), and document clustering
&lt;/li&gt;
&lt;li&gt;CLI-ready via &lt;code&gt;--local-ai-embed&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2 · Cosine Similarity Utility
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Built-in cosine similarity function for comparing document/query vectors&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3 · Tokenizer API (SentencePiece)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Public &lt;code&gt;local_ai::tokenizer&lt;/code&gt; based on Google’s SentencePiece
&lt;/li&gt;
&lt;li&gt;Supports encoding text into token IDs with &lt;code&gt;T5Tokenizer&lt;/code&gt; and &lt;code&gt;XLMRobertaTokenizer&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Improvements
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Unified model runner (&lt;code&gt;local_ai::model_runner&lt;/code&gt;) now supports both encoder-only and sequence-to-sequence models&lt;/li&gt;
&lt;li&gt;Advanced pooling and L2 normalization for E5-compatible output&lt;/li&gt;
&lt;li&gt;New simplified constructor in &lt;code&gt;model_chain_element&lt;/code&gt; with default model&lt;/li&gt;
&lt;li&gt;CLI extended with support for embedding workflows&lt;/li&gt;
&lt;/ul&gt;
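&lt;p&gt;For context, E5-style embeddings are commonly produced in two steps. First, the encoder's token vectors are averaged over the non-padding positions. Second, the result is scaled to unit length. The formula below is a sketch based on the published E5 usage, not a description of DocWire's internals. The exact pooling variant DocWire applies is an assumption here. It uses token vectors h&lt;sub&gt;t&lt;/sub&gt; and attention mask values m&lt;sub&gt;t&lt;/sub&gt;, which are 1 for real tokens and 0 for padding:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;\bar{\mathbf{h}} = \frac{\sum_{t=1}^{T} m_t \, \mathbf{h}_t}{\sum_{t=1}^{T} m_t},
\qquad
\mathbf{e} = \frac{\bar{\mathbf{h}}}{\lVert \bar{\mathbf{h}} \rVert_2}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because every normalized vector e has unit length, the cosine similarity of two such embeddings reduces to their plain dot product.&lt;/p&gt;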




&lt;h2&gt;
  
  
  Refactors
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Replaced OpenNMT-Tokenizer with a modern SentencePiece integration for improved maintainability and quality&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Fixes
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;MSVC: AddressSanitizer (ASan) issues resolved using specific macro definitions
&lt;/li&gt;
&lt;li&gt;CI:

&lt;ul&gt;
&lt;li&gt;Increased Valgrind timeouts
&lt;/li&gt;
&lt;li&gt;Skipped heavy tests under Callgrind
&lt;/li&gt;
&lt;li&gt;Abseil leak suppressions added for cleaner reports&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;




&lt;h2&gt;
  
  
  Documentation &amp;amp; Tests
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;New end-to-end embedding example (README): document + queries + cosine similarity
&lt;/li&gt;
&lt;li&gt;Unit tests for &lt;code&gt;local_ai::tokenizer&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Embedding example is compiled and tested in CI&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Get Started
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/docwire/docwire" rel="noopener noreferrer"&gt;https://github.com/docwire/docwire&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Release: &lt;a href="https://github.com/docwire/docwire/releases/tag/2025.08.05" rel="noopener noreferrer"&gt;https://github.com/docwire/docwire/releases/tag/2025.08.05&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;SourceForge: &lt;a href="https://sourceforge.net/projects/docwire/files/2025.08.05/" rel="noopener noreferrer"&gt;https://sourceforge.net/projects/docwire/files/2025.08.05/&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This update cements DocWire as a serious offline-ready NLP SDK for C++ developers building hybrid pipelines.&lt;/p&gt;

&lt;p&gt;— The DocWire Team&lt;/p&gt;

</description>
      <category>cpp</category>
      <category>ai</category>
      <category>nlp</category>
      <category>opensource</category>
    </item>
    <item>
      <title>DocWire SDK 2025.07.14 Released – OpenAI Embeddings, MIME Type Logic, HTTP Refactor</title>
      <dc:creator>Krzysztof Nowicki</dc:creator>
      <pubDate>Wed, 16 Jul 2025 10:42:26 +0000</pubDate>
      <link>https://forem.com/novitzmann/docwire-sdk-20250714-released-openai-embeddings-mime-type-logic-http-refactor-405p</link>
      <guid>https://forem.com/novitzmann/docwire-sdk-20250714-released-openai-embeddings-mime-type-logic-http-refactor-405p</guid>
      <description>&lt;p&gt;We’re back with a powerful new capability in DocWire SDK. Version &lt;strong&gt;2025.07.14&lt;/strong&gt; adds support for OpenAI embeddings, modularizes the HTTP client, and improves MIME-based content handling across AI components.&lt;/p&gt;

&lt;p&gt;Full release notes: &lt;a href="https://github.com/docwire/docwire/releases/tag/2025.07.14" rel="noopener noreferrer"&gt;https://github.com/docwire/docwire/releases/tag/2025.07.14&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  New Features
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1 · OpenAI Embeddings Support
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;New &lt;code&gt;openai::embed&lt;/code&gt; chain element enables developers to generate semantic embeddings from text using OpenAI’s embedding models:

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;text-embedding-3-small&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;text-embedding-3-large&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;text-embedding-ada-002&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Includes CLI support (&lt;code&gt;--openai-embed&lt;/code&gt;, &lt;code&gt;--openai-embed-model&lt;/code&gt;) and full documentation with examples.&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  2 · MIME to Extension Utility
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;New &lt;code&gt;content_type::by_file_extension::to_extension&lt;/code&gt; utility maps MIME types back to standard file extensions.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Improvements
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;More Robust AI Input Handling&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
AI components now detect input types by MIME type (e.g., &lt;code&gt;text/plain&lt;/code&gt;, &lt;code&gt;image/png&lt;/code&gt;) instead of relying solely on file extensions. This makes content classification more accurate and reduces misdetected inputs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Smarter HTTP Uploads&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
The &lt;code&gt;http::Post&lt;/code&gt; chain element can now infer file extensions from MIME types automatically, reducing friction in stream-based uploads.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Refactor
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;HTTP Client Modularization&lt;/strong&gt; – &lt;code&gt;http::Post&lt;/code&gt; now lives in a dedicated &lt;code&gt;docwire_http&lt;/code&gt; library, improving modularity and simplifying future expansion.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Fixes
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Windows Build Fix&lt;/strong&gt; – Resolved a vcpkg build failure on Windows by explicitly linking &lt;code&gt;libxml2&lt;/code&gt;.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;README&lt;/strong&gt; – Fixed a formatting issue in the documentation.&lt;/li&gt;
&lt;/ul&gt;




&lt;blockquote&gt;
&lt;p&gt;A whisper of meaning, in numbers cast&lt;br&gt;&lt;br&gt;
New dimensions open, built to last&lt;br&gt;&lt;br&gt;
For search and reason, a powerful key&lt;br&gt;&lt;br&gt;
Unlocking knowledge for all to see&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Try It Now
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/docwire/docwire" rel="noopener noreferrer"&gt;https://github.com/docwire/docwire&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Release: &lt;a href="https://github.com/docwire/docwire/releases/tag/2025.07.14" rel="noopener noreferrer"&gt;https://github.com/docwire/docwire/releases/tag/2025.07.14&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;SourceForge: &lt;a href="https://sourceforge.net/projects/docwire/files/2025.07.14/" rel="noopener noreferrer"&gt;https://sourceforge.net/projects/docwire/files/2025.07.14/&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This release brings DocWire one step closer to being the go-to C++ SDK for hybrid AI-powered backends and document pipelines.&lt;/p&gt;

&lt;p&gt;We welcome your feedback, issues, or questions.&lt;/p&gt;

&lt;p&gt;— The DocWire Team&lt;/p&gt;

</description>
      <category>cpp</category>
      <category>opensource</category>
      <category>ai</category>
      <category>devtools</category>
    </item>
    <item>
      <title>We are SourceForge RisingStars</title>
      <dc:creator>Krzysztof Nowicki</dc:creator>
      <pubDate>Wed, 09 Jul 2025 07:01:33 +0000</pubDate>
      <link>https://forem.com/novitzmann/we-are-sourceforge-risingstars-518d</link>
      <guid>https://forem.com/novitzmann/we-are-sourceforge-risingstars-518d</guid>
      <description>&lt;p&gt;DocWire SDK has received the Rising Star Award from SourceForge.&lt;/p&gt;

&lt;p&gt;The award recognizes open-source projects that show strong user traction and momentum. With over 500,000 projects on SourceForge, we’re excited to be among those recognized.&lt;/p&gt;

&lt;p&gt;If you haven’t heard of us:&lt;br&gt;
DocWire is a modern C++ SDK for structured and unstructured data extraction, parsing, and transformation. It’s used in production by consulting firms, in AI tools, and in secure environments, and we’re just getting started.&lt;/p&gt;

&lt;p&gt;We are looking for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Developers who care about modern, clean C++&lt;/li&gt;
&lt;li&gt;Contributors who want to join an ambitious open-source project&lt;/li&gt;
&lt;li&gt;Partners who need powerful data tooling or want to integrate with us&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If that’s you, let’s talk.&lt;/p&gt;

&lt;p&gt;Link to the project: &lt;a href="https://github.com/docwire/docwire" rel="noopener noreferrer"&gt;https://github.com/docwire/docwire&lt;/a&gt;&lt;br&gt;
SourceForge: &lt;a href="https://sourceforge.net/projects/docwire/" rel="noopener noreferrer"&gt;https://sourceforge.net/projects/docwire/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Feedback, issues, and PRs are always welcome.&lt;/p&gt;

&lt;p&gt;Thanks to SourceForge for the support and visibility.&lt;/p&gt;

&lt;p&gt;-The DocWire Team-&lt;/p&gt;

</description>
      <category>cpp</category>
      <category>opensource</category>
      <category>contributorswanted</category>
      <category>startup</category>
    </item>
    <item>
      <title>DocWire SDK 2025.06.29 Released – New GPT-4o Support, Cleaner Builds, Smarter Prompts</title>
      <dc:creator>Krzysztof Nowicki</dc:creator>
      <pubDate>Wed, 02 Jul 2025 22:43:15 +0000</pubDate>
      <link>https://forem.com/novitzmann/docwire-sdk-20250629-released-new-gpt-4o-support-cleaner-builds-smarter-prompts-13oh</link>
      <guid>https://forem.com/novitzmann/docwire-sdk-20250629-released-new-gpt-4o-support-cleaner-builds-smarter-prompts-13oh</guid>
      <description>&lt;p&gt;A fresh DocWire SDK update is here. Version &lt;strong&gt;2025.06.29&lt;/strong&gt; modernises our OpenAI integration, streamlines dependencies, and refines prompt engineering for more accurate AI-powered features.&lt;/p&gt;

&lt;p&gt;Full release notes: &lt;a href="https://github.com/docwire/docwire/releases/tag/2025.06.29" rel="noopener noreferrer"&gt;https://github.com/docwire/docwire/releases/tag/2025.06.29&lt;/a&gt;  &lt;/p&gt;




&lt;h2&gt;
  
  
  Key Features
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1 · Expanded OpenAI Model Support
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Added support for &lt;code&gt;gpt-4o&lt;/code&gt;, &lt;code&gt;gpt-4o-mini&lt;/code&gt;, &lt;code&gt;gpt-4.1&lt;/code&gt;, and the &lt;code&gt;o3&lt;/code&gt; family.
&lt;/li&gt;
&lt;li&gt;All AI-powered components now default to current-generation models.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2 · Granular Model Selection for Transcription and TTS
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Transcription: choose among &lt;code&gt;gpt-4o-transcribe&lt;/code&gt;, &lt;code&gt;gpt-4o-mini-transcribe&lt;/code&gt;, or &lt;code&gt;whisper-1&lt;/code&gt;.
&lt;/li&gt;
&lt;li&gt;TTS: new &lt;code&gt;gpt-4o-mini-tts&lt;/code&gt; becomes the default for higher-quality voice synthesis.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3 · Dependency Modernisation
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Replaced the custom &lt;code&gt;unzip&lt;/code&gt; vcpkg port with the standard &lt;code&gt;minizip&lt;/code&gt; port, simplifying the build and improving maintainability.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Improvements
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Enhanced Prompts&lt;/strong&gt; – Classify and Find now use stronger system prompts for more precise, consistently formatted results.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Updated Default Model&lt;/strong&gt; – General operations default to &lt;code&gt;gpt-4o&lt;/code&gt; for better performance and cost efficiency.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Robust Example Tests&lt;/strong&gt; – Documentation examples now use fuzzy string matching, avoiding false negatives from minor AI wording changes.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Refactor Highlights
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;API Clean-up&lt;/strong&gt; – Deprecated OpenAI models (such as &lt;code&gt;gpt-3.5-turbo&lt;/code&gt; and &lt;code&gt;gpt-4-turbo-preview&lt;/code&gt;) have been removed.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Transcription Component&lt;/strong&gt; – Refactored to support model selection, keeping the interface future-proof.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Get Started
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;GitHub repo – &lt;a href="https://github.com/docwire/docwire" rel="noopener noreferrer"&gt;https://github.com/docwire/docwire&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Latest release – &lt;a href="https://github.com/docwire/docwire/releases/tag/2025.06.29" rel="noopener noreferrer"&gt;https://github.com/docwire/docwire/releases/tag/2025.06.29&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;SourceForge – &lt;a href="https://sourceforge.net/projects/docwire/files/2025.06.29/" rel="noopener noreferrer"&gt;https://sourceforge.net/projects/docwire/files/2025.06.29/&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We welcome feedback, issues, and PRs.  &lt;/p&gt;

&lt;p&gt;— The DocWire Team&lt;/p&gt;

</description>
      <category>openai</category>
      <category>cpp</category>
      <category>opensource</category>
      <category>chatgpt</category>
    </item>
    <item>
      <title>DocWire SDK 2025.06.19 Released – Major OCR &amp; PDF Layout Upgrades, Archive Refactor, CI Improvements</title>
      <dc:creator>Krzysztof Nowicki</dc:creator>
      <pubDate>Wed, 25 Jun 2025 10:52:36 +0000</pubDate>
      <link>https://forem.com/novitzmann/docwire-sdk-20250619-released-major-ocr-pdf-layout-upgrades-archive-refactor-ci-improvements-k85</link>
      <guid>https://forem.com/novitzmann/docwire-sdk-20250619-released-major-ocr-pdf-layout-upgrades-archive-refactor-ci-improvements-k85</guid>
      <description>&lt;p&gt;We’re back with a substantial update to &lt;strong&gt;DocWire SDK&lt;/strong&gt;, the modern C++ library for structured document parsing, data extraction and secure, high-performance back-end workflows.&lt;br&gt;&lt;br&gt;
Version &lt;strong&gt;2025.06.19&lt;/strong&gt; focuses on sharper OCR, more faithful PDF layout reconstruction and a brand-new archive module, alongside testing and CI upgrades.&lt;/p&gt;

&lt;p&gt;Full release notes: &lt;a href="https://github.com/docwire/docwire/releases/tag/2025.06.19" rel="noopener noreferrer"&gt;https://github.com/docwire/docwire/releases/tag/2025.06.19&lt;/a&gt;  &lt;/p&gt;

&lt;h2&gt;
  
  
  What’s New
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1 · OCR Enhancements
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Structured output with positional metadata&lt;/strong&gt; – &lt;code&gt;OCRParser&lt;/code&gt; now returns position and size (x, y, width, height) along with line, paragraph and section grouping.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Configurable confidence filter&lt;/strong&gt; (0–100) to ignore low-confidence words.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2 · Higher-Fidelity PDF Parsing
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Refactored &lt;code&gt;PDFParser&lt;/code&gt; to sort elements by position, yielding more accurate text flow and layout reconstruction.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3 · Modern Archive Handling
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;New &lt;strong&gt;&lt;code&gt;docwire_archives&lt;/code&gt;&lt;/strong&gt; library for modular, maintainable and faster archive processing.
&lt;/li&gt;
&lt;li&gt;Archive detection is now MIME-based.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  4 · Expanded Format Support
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Automatic detection for &lt;strong&gt;ASP&lt;/strong&gt; and &lt;strong&gt;ASP.NET&lt;/strong&gt; documents.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Developer-Centric Improvements
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Plain-text exporter&lt;/strong&gt; handles page breaks more clearly.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CI pipeline&lt;/strong&gt; moves to &lt;code&gt;windows-2025&lt;/code&gt; runners; ASAN re-enabled on Windows.
&lt;/li&gt;
&lt;li&gt;Broader automated test coverage (OCR, HTTP, CLI).
&lt;/li&gt;
&lt;li&gt;Windows build fix: defining &lt;code&gt;NOMINMAX&lt;/code&gt; stops &lt;code&gt;windows.h&lt;/code&gt; from defining the &lt;code&gt;min&lt;/code&gt;/&lt;code&gt;max&lt;/code&gt; macros that conflicted with PDFium.
&lt;/li&gt;
&lt;li&gt;Spacing and line-break corrections in PDF and OCR outputs.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Documentation
&lt;/h2&gt;

&lt;p&gt;API docs and module dependency notes are fully up to date.&lt;/p&gt;




&lt;blockquote&gt;
&lt;p&gt;OCR’s vision, sharp and newly bright&lt;br&gt;&lt;br&gt;
PDF layouts, now a clearer sight&lt;br&gt;&lt;br&gt;
Archives rebuilt, with structure firm and new&lt;br&gt;&lt;br&gt;
DocWire advances, steady, strong, and true&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Try It Now
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;GitHub repo – &lt;a href="https://github.com/docwire/docwire" rel="noopener noreferrer"&gt;https://github.com/docwire/docwire&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Latest release – &lt;a href="https://github.com/docwire/docwire/releases/tag/2025.06.19" rel="noopener noreferrer"&gt;https://github.com/docwire/docwire/releases/tag/2025.06.19&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;SourceForge – &lt;a href="https://sourceforge.net/projects/docwire/files/2025.06.19/" rel="noopener noreferrer"&gt;https://sourceforge.net/projects/docwire/files/2025.06.19/&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We welcome feedback, issues and PRs.&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Next up:&lt;/strong&gt; deeper LLM integration and vcpkg support.&lt;/p&gt;

&lt;p&gt;— The DocWire Team&lt;/p&gt;

</description>
      <category>cpp</category>
      <category>opensource</category>
      <category>news</category>
      <category>dataprocessing</category>
    </item>
  </channel>
</rss>
