<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Paul Fresquet</title>
    <description>The latest articles on Forem by Paul Fresquet (@pfresquet).</description>
    <link>https://forem.com/pfresquet</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3622226%2F9d7458d8-02b1-4fac-ad0f-aab3253e95e6.jpg</url>
      <title>Forem: Paul Fresquet</title>
      <link>https://forem.com/pfresquet</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/pfresquet"/>
    <language>en</language>
    <item>
      <title>How I built a hybrid LAN/WAN file sync engine without VPN (and why on-demand sync still matters)</title>
      <dc:creator>Paul Fresquet</dc:creator>
      <pubDate>Fri, 21 Nov 2025 14:56:49 +0000</pubDate>
      <link>https://forem.com/pfresquet/how-i-built-a-hybrid-lanwan-file-sync-engine-without-vpn-and-why-on-demand-sync-still-matters-4iib</link>
      <guid>https://forem.com/pfresquet/how-i-built-a-hybrid-lanwan-file-sync-engine-without-vpn-and-why-on-demand-sync-still-matters-4iib</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;🎥 Video demo:&lt;/strong&gt;  &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/QuIkdnhAtjY"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;




&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;A few years ago, I was working on a system that required synchronizing &lt;strong&gt;very large datasets&lt;/strong&gt; — sometimes close to &lt;strong&gt;1 TB&lt;/strong&gt; — across several servers belonging to different companies.&lt;/p&gt;

&lt;p&gt;Some servers were in the same building, others were remote, some were behind locked-down firewalls, and in many cases I had:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;no VPN&lt;/strong&gt;,
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;no direct link&lt;/strong&gt;,
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;no control over the remote infra&lt;/strong&gt;,
&lt;/li&gt;
&lt;li&gt;and machines that didn’t even know each other existed.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To move initial datasets, I relied on traditional transfer tools.&lt;br&gt;&lt;br&gt;
But the real problem appeared after that first copy:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;How do you &lt;em&gt;verify&lt;/em&gt; that datasets across multiple locations are fully identical, and resynchronize only the missing deltas — especially after an interrupted or incomplete transfer?&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Double-checking terabytes manually wasn’t an option.&lt;br&gt;&lt;br&gt;
Running massive checksums remotely was slow and error-prone.&lt;br&gt;&lt;br&gt;
And multi-endpoint scenarios (A ↔ B ↔ C) made it worse still: every added endpoint means more pairs to compare.&lt;/p&gt;

&lt;p&gt;This pain eventually led me to prototype a custom sync engine…&lt;br&gt;&lt;br&gt;
and that prototype turned into &lt;strong&gt;ByteSync&lt;/strong&gt;, an open-source, on-demand file synchronization tool.&lt;/p&gt;

&lt;p&gt;This article is the story of that journey — the architecture, the challenges, the strange bugs, and the “aha” moments.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why on-demand sync still matters
&lt;/h2&gt;

&lt;p&gt;Continuous sync tools are amazing — Syncthing is a work of art.&lt;/p&gt;

&lt;p&gt;But continuous sync wasn’t compatible with the environments I worked in.&lt;/p&gt;

&lt;p&gt;When you synchronize across companies or infrastructures you don't manage, you often have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;strict maintenance windows&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;servers that are &lt;strong&gt;offline&lt;/strong&gt; most of the time
&lt;/li&gt;
&lt;li&gt;compliance rules against background daemons
&lt;/li&gt;
&lt;li&gt;sensitive data that must move only at specific times
&lt;/li&gt;
&lt;li&gt;endpoints that can’t stay permanently connected&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So syncing needed to happen &lt;strong&gt;only when everyone explicitly agreed on the time slot&lt;/strong&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;On-demand sync wasn’t a preference — it was a requirement.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It let me run comparisons, verify integrity, and apply deltas exactly when it was permitted.&lt;/p&gt;

&lt;p&gt;This shaped almost every architectural decision that came later.&lt;/p&gt;
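
&lt;p&gt;To make "run comparisons, verify integrity" concrete, here is a minimal Python sketch of a manifest comparison. It is an illustration only and not ByteSync's actual code (ByteSync is a .NET application). The names &lt;code&gt;manifest&lt;/code&gt; and &lt;code&gt;compare&lt;/code&gt;, the in-memory file dictionaries, and the choice of SHA-256 are all assumptions made for the example.&lt;/p&gt;

```python
import hashlib


def manifest(files):
    """Build a manifest: path -> SHA-256 hex digest.

    `files` is a hypothetical in-memory stand-in for a folder
    (dict of path -> bytes) so the sketch stays self-contained.
    """
    return {path: hashlib.sha256(data).hexdigest() for path, data in files.items()}


def compare(source, target):
    """Compare two manifests and report what the target still needs."""
    missing = sorted(p for p in source if p not in target)
    different = sorted(p for p in source if p in target and source[p] != target[p])
    extra = sorted(p for p in target if p not in source)
    return {"missing": missing, "different": different, "extra": extra}


# Site B received a truncated b.bin and never got c.bin (interrupted transfer).
site_a = {"a.bin": b"1111", "b.bin": b"2222", "c.bin": b"3333"}
site_b = {"a.bin": b"1111", "b.bin": b"22"}

report = compare(manifest(site_a), manifest(site_b))
print(report)
```

&lt;p&gt;Only the manifests need to travel between sites, not the data itself, so each site can hash its own files locally during the agreed time slot and exchange a small summary.&lt;/p&gt;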




&lt;h2&gt;
  
  
  Challenge #1 — Picking a delta algorithm that works everywhere
&lt;/h2&gt;

&lt;p&gt;I wanted block-level deltas.&lt;br&gt;&lt;br&gt;
Full file transfers would defeat the purpose of multi-site sync.&lt;/p&gt;

&lt;p&gt;Naturally, rsync came to mind.&lt;/p&gt;

&lt;p&gt;Then I discovered &lt;strong&gt;FastRsyncNet&lt;/strong&gt;, a .NET library based on rsync’s signature and delta algorithm.&lt;br&gt;&lt;br&gt;
It gave me:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;rolling checksums
&lt;/li&gt;
&lt;li&gt;block signatures
&lt;/li&gt;
&lt;li&gt;efficient delta construction
&lt;/li&gt;
&lt;li&gt;rsync-like behaviour, but portable inside a modern .NET app
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;ByteSync became technically very close to rsync’s internal diffing engine, with higher-level orchestration on top.&lt;/p&gt;
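
&lt;p&gt;For readers who have not seen the rsync approach before, the following Python sketch shows the three ingredients listed above: a rolling weak checksum, block signatures, and delta construction. It is a simplified illustration, not FastRsyncNet's or ByteSync's implementation. The function names, the tiny 4-byte block size, and the use of SHA-256 as the strong hash are assumptions chosen to keep the example readable.&lt;/p&gt;

```python
import hashlib

BLOCK = 4  # deliberately tiny so the example is easy to trace


def weak(data):
    """Adler-style weak checksum (a, b) of one block."""
    a = sum(data) % 65536
    b = sum((len(data) - i) * x for i, x in enumerate(data)) % 65536
    return a, b


def roll(a, b, out_byte, in_byte, size):
    """Slide the window one byte to the right in constant time."""
    a = (a - out_byte + in_byte) % 65536
    b = (b - size * out_byte + a) % 65536
    return a, b


def signature(old):
    """Weak checksum -> [(strong hash, block index)] for each block of the old file."""
    sig = {}
    for start in range(0, len(old), BLOCK):
        block = old[start:start + BLOCK]
        entry = (hashlib.sha256(block).digest(), start // BLOCK)
        sig.setdefault(weak(block), []).append(entry)
    return sig


def delta(sig, new):
    """Describe `new` as ('copy', block_index) and ('data', bytes) instructions."""
    ops, literal, pos, chk = [], bytearray(), 0, None
    while len(new) - pos >= BLOCK:
        window = new[pos:pos + BLOCK]
        if chk is None:
            chk = weak(window)
        match = None
        for strong, idx in sig.get(chk, []):
            # The strong hash confirms a weak-checksum hit.
            if strong == hashlib.sha256(window).digest():
                match = idx
                break
        if match is not None:
            if literal:
                ops.append(("data", bytes(literal)))
                literal = bytearray()
            ops.append(("copy", match))
            pos += BLOCK
            chk = None  # recompute for the next, non-adjacent window
        else:
            literal.append(new[pos])
            if len(new) - pos > BLOCK:
                chk = roll(chk[0], chk[1], new[pos], new[pos + BLOCK], BLOCK)
            pos += 1
    literal.extend(new[pos:])  # tail shorter than one block
    if literal:
        ops.append(("data", bytes(literal)))
    return ops


def patch(old, ops):
    """Rebuild the new file from the old file plus the delta."""
    out = bytearray()
    for kind, arg in ops:
        if kind == "copy":
            out.extend(old[arg * BLOCK:(arg + 1) * BLOCK])
        else:
            out.extend(arg)
    return bytes(out)


old = b"abcdefghijkl"
new = b"abcdXefghijkl"  # one byte inserted in the middle
ops = delta(signature(old), new)
print(ops)
```

&lt;p&gt;In this example the delta contains a single literal byte and three block references, so only the inserted byte has to be transferred. The rolling update is what lets the sender test every byte offset without rehashing a whole block each time.&lt;/p&gt;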

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcsncqwz290ubtva1gkwa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcsncqwz290ubtva1gkwa.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Challenge #2 — Merging LAN and WAN connections into a single model
&lt;/h2&gt;

&lt;p&gt;This was the hardest architectural problem.&lt;/p&gt;

&lt;p&gt;I had prior experience with &lt;strong&gt;SignalR&lt;/strong&gt;, so I used it for the real-time communication layer.&lt;br&gt;&lt;br&gt;
Alongside it, three Azure services initially formed the relay for remote sync operations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Azure Functions
&lt;/li&gt;
&lt;li&gt;Azure Cache for Redis
&lt;/li&gt;
&lt;li&gt;Azure Blob Storage
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But ByteSync originally had &lt;strong&gt;two separate modes&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Local mode (LAN only)
&lt;/li&gt;
&lt;li&gt;Cloud mode (WAN only)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I couldn’t find a clean way to bridge them.&lt;br&gt;&lt;br&gt;
Users had to pick a mode upfront, which didn’t match real-world workflows.&lt;/p&gt;

&lt;p&gt;The breakthrough came with the concept of &lt;strong&gt;DataNodes&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;A DataNode abstracts a sync participant — local or remote — so the orchestration doesn’t care about the distance between nodes.&lt;/p&gt;

&lt;p&gt;This allowed:&lt;/p&gt;

&lt;p&gt;✔ direct LAN connections when devices can see each other&lt;br&gt;&lt;br&gt;
✔ encrypted relayed connections when they can’t&lt;br&gt;&lt;br&gt;
✔ both in the &lt;em&gt;same&lt;/em&gt; sync session  &lt;/p&gt;

&lt;p&gt;Suddenly, ByteSync had hybrid sessions.&lt;/p&gt;

&lt;p&gt;And it changed everything.&lt;/p&gt;
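
&lt;p&gt;As a rough illustration of why this abstraction helps, here is a small Python sketch in which the orchestration only sees nodes and a transport is picked per pair. This is not ByteSync's real DataNode model. The &lt;code&gt;lan_id&lt;/code&gt; field, the &lt;code&gt;pick_transport&lt;/code&gt; and &lt;code&gt;plan_session&lt;/code&gt; functions, and the transport labels are hypothetical names invented for the example.&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataNode:
    """A sync participant; the orchestration never asks where it lives."""
    name: str
    # Hypothetical field: identifier of the local network the node is on,
    # or None when the node can only be reached through the relay.
    lan_id: Optional[str]


def pick_transport(a, b):
    """Use a direct LAN link when both nodes share a network, else the relay."""
    if a.lan_id is not None and a.lan_id == b.lan_id:
        return "direct-lan"
    return "encrypted-relay"


def plan_session(nodes):
    """Choose a transport for every pair of nodes in one session."""
    plan = {}
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            plan[(a.name, b.name)] = pick_transport(a, b)
    return plan


nodes = [
    DataNode("A", "office-1"),
    DataNode("B", "office-1"),
    DataNode("C", None),  # remote server behind a firewall
]
plan = plan_session(nodes)
print(plan)
```

&lt;p&gt;Here A and B talk directly while C is reached through the relay, all within one session plan. The comparison and delta logic can stay the same regardless of which transport a pair ends up using.&lt;/p&gt;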

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi7prarqicyq7ps8xp4dy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi7prarqicyq7ps8xp4dy.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Challenge #3 — Azure Blob Storage and the “egress bill from hell”
&lt;/h2&gt;

&lt;p&gt;Originally, remote exchanges used Azure Blob Storage.&lt;/p&gt;

&lt;p&gt;It worked.&lt;br&gt;&lt;br&gt;
But then I ran the cost estimate.&lt;/p&gt;

&lt;p&gt;And… no.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Azure egress fees were far too high for multi-site sync.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Not just high — &lt;strong&gt;non-viable&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That pushed me to migrate the relay layer to &lt;strong&gt;Cloudflare R2&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;no egress fees
&lt;/li&gt;
&lt;li&gt;great performance
&lt;/li&gt;
&lt;li&gt;straightforward API
&lt;/li&gt;
&lt;li&gt;predictable costs
&lt;/li&gt;
&lt;li&gt;perfect for temporary encrypted blobs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Switching to R2 turned out to be one of the best decisions of the project.&lt;/p&gt;




&lt;h2&gt;
  
  
  Challenge #4 — The first fully working remote sync (the “aha” moment)
&lt;/h2&gt;

&lt;p&gt;I remember the first time a full remote sync completed successfully.&lt;br&gt;&lt;br&gt;
It wasn’t just correct — it was &lt;em&gt;fast&lt;/em&gt;, considering the conditions.&lt;/p&gt;

&lt;p&gt;Encrypted cloud relay.&lt;br&gt;&lt;br&gt;
Rolling checksums.&lt;br&gt;&lt;br&gt;
Delta blocks.&lt;br&gt;&lt;br&gt;
Multi-endpoint convergence.&lt;/p&gt;

&lt;p&gt;Everything clicked that day.&lt;/p&gt;

&lt;p&gt;That was my “OK yes — this is worth continuing” moment.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa5ikumb75920j8350lbn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa5ikumb75920j8350lbn.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Challenge #5 — The strangest bug I’ve ever hit
&lt;/h2&gt;

&lt;p&gt;This one cost me three days of my life.&lt;/p&gt;

&lt;p&gt;Every time I added a file or folder on my machine…&lt;br&gt;&lt;br&gt;
&lt;strong&gt;the connection dropped.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every.&lt;br&gt;&lt;br&gt;
Single.&lt;br&gt;&lt;br&gt;
Time.&lt;/p&gt;

&lt;p&gt;I debugged:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;SignalR
&lt;/li&gt;
&lt;li&gt;Azure Functions
&lt;/li&gt;
&lt;li&gt;caching
&lt;/li&gt;
&lt;li&gt;threading
&lt;/li&gt;
&lt;li&gt;reconnection logic
&lt;/li&gt;
&lt;li&gt;manifests
&lt;/li&gt;
&lt;li&gt;cancellation tokens
&lt;/li&gt;
&lt;li&gt;TCP vs WebSockets
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Nothing made sense.&lt;/p&gt;

&lt;p&gt;The culprit?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Opening Windows Explorer caused a 20–30 second network hang… but only on my machine.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Not on the servers.&lt;br&gt;&lt;br&gt;
Not in production.&lt;br&gt;&lt;br&gt;
Just… my Windows environment being haunted.&lt;/p&gt;

&lt;p&gt;Once I understood it, everything made sense — and nothing made sense at the same time.&lt;/p&gt;




&lt;h2&gt;
  
  
  Challenge #6 — Validating real-world use cases
&lt;/h2&gt;

&lt;p&gt;The architecture later proved itself in scenarios like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;multi-site synchronization&lt;/strong&gt; across organizations
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;multi-folder split comparisons&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;integrity verification&lt;/strong&gt; after partial transfers
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;deduplication across several endpoints&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;syncing nodes that were sometimes LAN, sometimes WAN, sometimes both
&lt;/li&gt;
&lt;li&gt;combining several independent datasets into a unified comparison&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The more complex the scenario, the more the architecture made sense.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9rb5k256cl0i8br6e6dw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9rb5k256cl0i8br6e6dw.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;I didn’t initially plan to build a synchronization tool.&lt;br&gt;&lt;br&gt;
I just needed a way to reliably synchronize large datasets across machines that couldn’t talk to each other directly.&lt;/p&gt;

&lt;p&gt;But challenge after challenge, the project grew into something more robust and more general than I expected.&lt;/p&gt;

&lt;p&gt;This article isn’t meant as a product pitch — just an honest breakdown of the problems I faced and how I solved them.&lt;/p&gt;

&lt;p&gt;If you're curious about the tool behind these experiments, ByteSync is open-source:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/POW-Software/ByteSync" rel="noopener noreferrer"&gt;https://github.com/POW-Software/ByteSync&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Website: &lt;a href="https://www.bytesyncapp.com" rel="noopener noreferrer"&gt;https://www.bytesyncapp.com&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Feedback is always welcome.&lt;/p&gt;

&lt;p&gt;Thanks for reading.&lt;/p&gt;

</description>
      <category>datasynchronization</category>
      <category>security</category>
      <category>cloud</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
