<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: AJAYA SHRESTHA</title>
    <description>The latest articles on Forem by AJAYA SHRESTHA (@azayshrestha).</description>
    <link>https://forem.com/azayshrestha</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1480003%2Fa7e8fb8f-86e9-4962-a3c1-8d1b4b855de6.jpg</url>
      <title>Forem: AJAYA SHRESTHA</title>
      <link>https://forem.com/azayshrestha</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/azayshrestha"/>
    <language>en</language>
    <item>
      <title>I Built a Privacy-First Image Workflow App That Runs Entirely in the Browser</title>
      <dc:creator>AJAYA SHRESTHA</dc:creator>
      <pubDate>Wed, 11 Mar 2026 17:55:43 +0000</pubDate>
      <link>https://forem.com/azayshrestha/i-built-a-privacy-first-image-workflow-app-that-runs-entirely-in-the-browser-4e2g</link>
      <guid>https://forem.com/azayshrestha/i-built-a-privacy-first-image-workflow-app-that-runs-entirely-in-the-browser-4e2g</guid>
      <description>&lt;p&gt;When people work with images online, they usually do the same few things again and again.&lt;/p&gt;

&lt;p&gt;They compress images to reduce file size.&lt;br&gt;
They convert one format to another.&lt;br&gt;
They remove metadata.&lt;br&gt;
They crop, resize, or add a watermark before sharing.&lt;/p&gt;

&lt;p&gt;The problem is that many tools only do one of these jobs. And a lot of them ask users to upload their files to a server first.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;That never felt ideal to me.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Images can be personal. They can contain private information. They can be client files, screenshots, IDs, product images, or internal work. Uploading them just to make a quick edit or reduce file size adds an extra step, and for many people, it also adds trust concerns.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;So I built &lt;a href="https://zeropng.com/" rel="noopener noreferrer"&gt;ZeroPNG&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://zeropng.com/" rel="noopener noreferrer"&gt;ZeroPNG&lt;/a&gt; is a privacy-first image workflow app that runs entirely in the browser. That means users can compress, convert, crop, resize, remove metadata, and watermark images without sending those files to a remote server.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why I built it
&lt;/h3&gt;

&lt;p&gt;I wanted something simple.&lt;/p&gt;

&lt;p&gt;Not a heavy editor.&lt;br&gt;
Not a complicated dashboard.&lt;br&gt;
Not a tool that makes people upload files for every small change.&lt;/p&gt;

&lt;p&gt;I wanted one place where someone could take an image and quickly get it ready for use.&lt;/p&gt;

&lt;p&gt;Sometimes that means making the file smaller for a website.&lt;br&gt;
Sometimes it means converting HEIC to JPG.&lt;br&gt;
Sometimes it means removing EXIF data before sharing.&lt;br&gt;
Sometimes it means resizing an image for a blog post or adding a watermark before posting it online.&lt;/p&gt;

&lt;p&gt;These are common tasks, but the workflow is often messy. People jump between different tools just to finish one job.&lt;/p&gt;

&lt;p&gt;I thought that could be better.&lt;/p&gt;

&lt;h3&gt;
  
  
  What &lt;a href="https://zeropng.com/" rel="noopener noreferrer"&gt;ZeroPNG&lt;/a&gt; does
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://zeropng.com/" rel="noopener noreferrer"&gt;ZeroPNG&lt;/a&gt; is not just an image compressor.&lt;br&gt;
It is built to help with the full image workflow in a simple way.&lt;br&gt;
With it, users can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;compress images&lt;/li&gt;
&lt;li&gt;convert image formats&lt;/li&gt;
&lt;li&gt;handle HEIC files more easily&lt;/li&gt;
&lt;li&gt;remove EXIF and metadata&lt;/li&gt;
&lt;li&gt;crop and resize images&lt;/li&gt;
&lt;li&gt;add watermarks&lt;/li&gt;
&lt;li&gt;process files directly in the browser&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal is to make image preparation fast, private, and easy to understand.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why privacy matters here
&lt;/h3&gt;

&lt;p&gt;A lot of people do not think about privacy when using image tools.&lt;br&gt;
But images often carry more than just pixels.&lt;/p&gt;

&lt;p&gt;They can contain metadata like device details, timestamps, and location information. They can also include sensitive visual content that users may not want to upload anywhere.&lt;/p&gt;

&lt;p&gt;For many simple image tasks, there is no real reason to send files away to a server.&lt;/p&gt;

&lt;p&gt;That is why I wanted &lt;a href="https://zeropng.com/" rel="noopener noreferrer"&gt;ZeroPNG&lt;/a&gt; to work locally in the browser. Users should be able to work with their files on their own device and stay in control of that process.&lt;/p&gt;

&lt;p&gt;For me, privacy is not just a feature. It is part of the product idea.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why I chose the browser
&lt;/h3&gt;

&lt;p&gt;The browser is much more powerful than many people think.&lt;/p&gt;

&lt;p&gt;Today, it is possible to build fast tools that feel almost like desktop apps. For image workflows, that opens up a lot of interesting possibilities.&lt;/p&gt;

&lt;p&gt;Running directly in the browser has a few big advantages:&lt;/p&gt;

&lt;p&gt;First, it makes the experience faster for many tasks. Users do not need to wait for uploads before they get started.&lt;/p&gt;

&lt;p&gt;Second, it keeps the workflow simple. Open the site, drop the file, and use the tool.&lt;/p&gt;

&lt;p&gt;Third, it supports the privacy-first idea. The work happens on the user’s device instead of depending on a remote image-processing pipeline.&lt;/p&gt;

&lt;p&gt;I really liked that direction.&lt;/p&gt;

&lt;h3&gt;
  
  
  The kind of users I had in mind
&lt;/h3&gt;

&lt;p&gt;While building &lt;a href="https://zeropng.com/" rel="noopener noreferrer"&gt;ZeroPNG&lt;/a&gt;, I kept thinking about real everyday users.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Developers who want lighter assets for websites.&lt;/li&gt;
&lt;li&gt;Bloggers who need properly sized images.&lt;/li&gt;
&lt;li&gt;Designers who want to clean up files quickly.&lt;/li&gt;
&lt;li&gt;Marketers preparing visuals for campaigns.&lt;/li&gt;
&lt;li&gt;Store owners resizing and watermarking product images.&lt;/li&gt;
&lt;li&gt;Anyone who wants a quick tool without a complicated workflow.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The product is simple on purpose. I wanted it to be useful even for people who are not technical.&lt;/p&gt;

&lt;h3&gt;
  
  
  What I learned while building it
&lt;/h3&gt;

&lt;p&gt;One thing I learned is that people do not want ten different image tools.&lt;/p&gt;

&lt;p&gt;They want one tool that helps them finish the job.&lt;/p&gt;

&lt;p&gt;Another thing I learned is that privacy itself can be part of the value. People appreciate convenience, but they also appreciate knowing their files stay with them.&lt;/p&gt;

&lt;p&gt;And maybe the biggest lesson is this: simple tools are harder to build than they look.&lt;/p&gt;

&lt;p&gt;Making something feel easy takes a lot of thought. You have to decide what to include, what to leave out, and how to keep the experience clean without making it limited.&lt;/p&gt;

&lt;p&gt;That balance mattered a lot while building &lt;a href="https://zeropng.com/" rel="noopener noreferrer"&gt;ZeroPNG&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  What I want &lt;a href="https://zeropng.com/" rel="noopener noreferrer"&gt;ZeroPNG&lt;/a&gt; to be
&lt;/h3&gt;

&lt;p&gt;I want &lt;a href="https://zeropng.com/" rel="noopener noreferrer"&gt;ZeroPNG&lt;/a&gt; to be the kind of tool people open when they need to do something with an image quickly and move on.&lt;/p&gt;

&lt;p&gt;No confusion.&lt;br&gt;
No unnecessary steps.&lt;br&gt;
No heavy setup.&lt;br&gt;
No upload-first workflow for basic tasks.&lt;/p&gt;

&lt;p&gt;Just a simple image workflow app that respects the user’s time and privacy.&lt;/p&gt;

&lt;p&gt;There are many image tools online already, so I did not build &lt;a href="https://zeropng.com/" rel="noopener noreferrer"&gt;ZeroPNG&lt;/a&gt; just to make “another tool.”&lt;/p&gt;

&lt;p&gt;I built it because I wanted a more practical and private way to handle everyday image tasks.&lt;/p&gt;

&lt;p&gt;If you have ever needed to compress, convert, crop, resize, clean, or protect an image and thought the process should be simpler, that is exactly the problem I wanted to solve.&lt;/p&gt;

&lt;p&gt;If you check it out, I’d genuinely love to know what feels useful, what feels missing, and what could be improved.&lt;/p&gt;

</description>
      <category>javascript</category>
      <category>webdev</category>
      <category>programming</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Developers: Your Image Optimizer Might Be Logging Your Assets</title>
      <dc:creator>AJAYA SHRESTHA</dc:creator>
      <pubDate>Mon, 02 Mar 2026 02:53:19 +0000</pubDate>
      <link>https://forem.com/azayshrestha/developers-your-image-optimizer-might-be-logging-your-assets-26cg</link>
      <guid>https://forem.com/azayshrestha/developers-your-image-optimizer-might-be-logging-your-assets-26cg</guid>
      <description>&lt;p&gt;If you build for the web, you probably compress images almost every day.&lt;br&gt;
Screenshots. UI mockups. Marketing banners. Product photos. Internal dashboards.&lt;br&gt;
You drag. You drop. You download the smaller file. Done.&lt;br&gt;
But here’s the uncomfortable question:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where did that image go while it was being compressed?&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The Part Most Developers Don’t Think About
&lt;/h3&gt;

&lt;p&gt;Most “free” online image compressors work like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;You upload your image to their server&lt;/li&gt;
&lt;li&gt;Their backend processes it&lt;/li&gt;
&lt;li&gt;They send the optimized version back&lt;/li&gt;
&lt;li&gt;Your file may stay on their infrastructure (temporarily or longer)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Now, to be clear, not every service is doing something malicious. &lt;br&gt;
Many are reputable and transparent.&lt;/p&gt;

&lt;p&gt;But technically speaking:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your file leaves your machine&lt;/li&gt;
&lt;li&gt;It touches someone else’s server&lt;/li&gt;
&lt;li&gt;It may be logged, cached, or stored&lt;/li&gt;
&lt;li&gt;You usually don’t see what happens behind the scenes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're compressing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Client assets&lt;/li&gt;
&lt;li&gt;NDA-bound materials&lt;/li&gt;
&lt;li&gt;Internal dashboards&lt;/li&gt;
&lt;li&gt;Pre-launch product screenshots&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That should make you pause.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why This Actually Matters
&lt;/h3&gt;

&lt;p&gt;As developers, we’re careful about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;API keys&lt;/li&gt;
&lt;li&gt;Environment variables&lt;/li&gt;
&lt;li&gt;Production databases&lt;/li&gt;
&lt;li&gt;Auth tokens&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But we casually upload images that might contain:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Proprietary UI&lt;/li&gt;
&lt;li&gt;Customer data&lt;/li&gt;
&lt;li&gt;Financial dashboards&lt;/li&gt;
&lt;li&gt;Unreleased features&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It’s inconsistent.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;We protect code, but not always assets.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  The Better Approach: Client-Side Compression
&lt;/h3&gt;

&lt;p&gt;There’s a safer alternative: compress images directly in the browser.&lt;br&gt;
Instead of this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Browser → Remote Server → Back to Browser&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;You get this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Browser → Process → Download&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The file never leaves your device.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No server upload.&lt;/li&gt;
&lt;li&gt;No storage risk.&lt;/li&gt;
&lt;li&gt;No backend logging.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Just local processing.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Client-Side Processing Is Different
&lt;/h3&gt;

&lt;p&gt;Modern browsers are powerful. With JavaScript and WebAssembly, image compression can run entirely on your machine.&lt;/p&gt;

&lt;p&gt;That means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No data transfer to external servers&lt;/li&gt;
&lt;li&gt;No retention policy concerns&lt;/li&gt;
&lt;li&gt;No compliance ambiguity&lt;/li&gt;
&lt;li&gt;No “we delete files after 24 hours” disclaimers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It’s simple: if it never leaves your device, it can’t be stored elsewhere.&lt;/p&gt;
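To make that concrete, here is a minimal sketch of how an image can be resized and re-encoded entirely in the browser with standard Canvas APIs. This is illustrative only, not ZeroPNG's actual implementation; the names `fitWithin`, `compressInBrowser`, `maxSide`, and `quality` are made up for the example.

```javascript
// Minimal sketch of client-side image compression (illustrative, not ZeroPNG's code).
// Everything here runs in the browser; no network request is ever made.

// Pure helper: scale (width, height) to fit inside maxSide, preserving aspect ratio.
function fitWithin(width, height, maxSide) {
  const scale = Math.min(1, maxSide / Math.max(width, height));
  return { width: Math.round(width * scale), height: Math.round(height * scale) };
}

// Re-encode a File locally. Drawing onto a canvas and exporting produces a
// fresh encoding on the user's device, with no upload involved.
async function compressInBrowser(file, { maxSide = 2048, quality = 0.8 } = {}) {
  const bitmap = await createImageBitmap(file);
  const { width, height } = fitWithin(bitmap.width, bitmap.height, maxSide);

  const canvas = document.createElement("canvas");
  canvas.width = width;
  canvas.height = height;
  canvas.getContext("2d").drawImage(bitmap, 0, 0, width, height);

  // toBlob performs the actual JPEG encoding, entirely in the browser tab.
  return new Promise((resolve) =>
    canvas.toBlob(resolve, "image/jpeg", quality)
  );
}
```

As a side effect, because `drawImage` and `toBlob` create a fresh encoding, the output carries none of the original file's embedded metadata.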

&lt;h3&gt;
  
  
  This Is Why &lt;a href="https://zeropng.com/" rel="noopener noreferrer"&gt;ZeroPNG&lt;/a&gt; Exists
&lt;/h3&gt;

&lt;p&gt;I built &lt;a href="https://zeropng.com/" rel="noopener noreferrer"&gt;zeropng.com&lt;/a&gt; with one core idea:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Your images should stay yours.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://zeropng.com/" rel="noopener noreferrer"&gt;ZeroPNG&lt;/a&gt; compresses PNG images directly in the browser.&lt;br&gt;
Nothing gets uploaded. Nothing gets saved remotely.&lt;br&gt;
It’s fast. It’s simple. And it doesn’t require you to trust a server.&lt;br&gt;
Because sometimes the best privacy policy is architecture.&lt;/p&gt;

&lt;h3&gt;
  
  
  Should You Stop Using Other Tools?
&lt;/h3&gt;

&lt;p&gt;Not necessarily. But you should:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Check whether your current tool uploads files&lt;/li&gt;
&lt;li&gt;Read their retention policy&lt;/li&gt;
&lt;li&gt;Understand where your assets go&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As developers, we talk a lot about privacy, security, and ownership. Image optimization shouldn’t be the blind spot.&lt;/p&gt;

&lt;p&gt;Next time you drag an image into an online compressor, ask yourself:&lt;br&gt;
&lt;strong&gt;Would I upload my production database to a random server just because it’s “free”?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If the answer is &lt;strong&gt;No&lt;/strong&gt;, then maybe your images deserve the same caution.&lt;/p&gt;

&lt;p&gt;Try &lt;a href="https://zeropng.com/" rel="noopener noreferrer"&gt;zeropng.com&lt;/a&gt;, and let me know how I can make it better.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>programming</category>
      <category>javascript</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Your Image Compressor Has Seen Every Photo You've Ever "Compressed for Free"</title>
      <dc:creator>AJAYA SHRESTHA</dc:creator>
      <pubDate>Fri, 27 Feb 2026 18:09:34 +0000</pubDate>
      <link>https://forem.com/azayshrestha/your-image-compressor-has-seen-every-photo-youve-ever-compressed-for-free-14m6</link>
      <guid>https://forem.com/azayshrestha/your-image-compressor-has-seen-every-photo-youve-ever-compressed-for-free-14m6</guid>
      <description>&lt;p&gt;You've done it hundreds of times without thinking about it.&lt;br&gt;
Your photo is too large to email. Your website is loading slowly because the images are too big. Your client needs the file under a certain size. So you open a browser tab, type "free image compressor," drag your photo in, and get a smaller version back in seconds.&lt;br&gt;
Simple. Free. Done.&lt;br&gt;
Except there's one part of that transaction you probably never noticed.&lt;br&gt;
Your photo left your computer.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Actually Happens When You "Compress" a Photo Online
&lt;/h3&gt;

&lt;p&gt;When you drag an image into TinyPNG, Compress.io, or most other free online tools, here's the real sequence of events:&lt;br&gt;
Your photo travels across the internet to a server somewhere. That server, owned by a company you've probably never heard of, running software you can't inspect, processes your image. Then it sends the smaller version back to you.&lt;br&gt;
The whole thing takes two or three seconds. It feels instant. It feels local. It feels like the tool is just doing something clever on your screen.&lt;br&gt;
It isn't. Your photo made a round trip to a datacenter and back.&lt;br&gt;
For a photo of your lunch, that's probably fine.&lt;br&gt;
But think for a moment about what you've compressed over the years.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Photos You Forgot You Uploaded
&lt;/h3&gt;

&lt;p&gt;Client work you were under NDA not to share. Passport scans. Photos of your home, your car, your children. Screenshots that happened to contain your email, your account number, your address. Medical images. Legal documents you photographed on your phone. Confidential presentations. Unreleased product designs.&lt;br&gt;
Every one of those went to someone else's server before it came back to you.&lt;br&gt;
Most of the time, nothing bad happens. These companies aren't villains. But three things are true simultaneously:&lt;br&gt;
&lt;strong&gt;You didn't know it was happening.&lt;/strong&gt; The tools don't say "your file will now travel to our servers." They just do it.&lt;br&gt;
&lt;strong&gt;You agreed to it.&lt;/strong&gt; Buried in the terms of service, the ones nobody reads, is language describing exactly this. You consented without knowing you consented.&lt;br&gt;
&lt;strong&gt;You had no alternative.&lt;/strong&gt; Until recently, there genuinely wasn't another way. Compressing an image required a server to do the heavy lifting. Your browser wasn't capable.&lt;br&gt;
That last part changed. Quietly, without announcement, browsers became powerful enough to handle image compression entirely on their own.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Tool That Stays Silent
&lt;/h3&gt;

&lt;p&gt;I built &lt;a href="https://zeropng.com/" rel="noopener noreferrer"&gt;zeropng.com&lt;/a&gt; because I needed a compressor I could use on client files without worrying.&lt;br&gt;
The experience looks identical to TinyPNG. You drag photos in. You get smaller photos back. There's a quality slider, format options, a download button.&lt;br&gt;
The difference is invisible unless you know where to look.&lt;br&gt;
Open the browser's developer tools. Go to the Network tab, the section that shows everything your browser sends and receives over the internet. Compress a photo on &lt;a href="https://zeropng.com/" rel="noopener noreferrer"&gt;zeropng.com&lt;/a&gt;. Watch the Network tab.&lt;br&gt;
Nothing moves.&lt;br&gt;
No upload. No server request. No data leaving your machine. The compression happens entirely inside your browser tab, using technology that's been quietly built into every modern browser for years. Your photo goes in, a smaller photo comes out, and the whole process happens in the same place you're sitting.&lt;br&gt;
You can test this yourself in thirty seconds. That silence is the entire point.&lt;/p&gt;

&lt;h3&gt;
  
  
  Who Actually Needs This
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Freelancers and designers&lt;/strong&gt; who work under NDAs. When a client says "don't share our unreleased work," they mean it, including with the servers behind your compression tool.&lt;br&gt;
&lt;strong&gt;Small business owners&lt;/strong&gt; who photograph products, receipts, documents. These files contain more sensitive information than most people realize.&lt;br&gt;
&lt;strong&gt;Anyone in healthcare.&lt;/strong&gt; Patient photos, scans, and medical documentation carry legal protections that most free online tools don't comply with. A tool that never receives your files can't violate those protections.&lt;br&gt;
&lt;strong&gt;Parents&lt;/strong&gt; who share photos of their children. Location data is embedded in smartphone photos by default. Most people don't know this. That data survives compression unless the tool explicitly removes it, which &lt;a href="https://zeropng.com/" rel="noopener noreferrer"&gt;zeropng.com&lt;/a&gt; does automatically, because re-encoding through the browser strips the original metadata.&lt;br&gt;
&lt;strong&gt;Anyone who's ever thought "I probably shouldn't run this through an online tool"&lt;/strong&gt;, and then done it anyway because there was no other option.&lt;br&gt;
Now there is.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Question Worth Asking About Every "Free" Tool
&lt;/h3&gt;

&lt;p&gt;Nothing is actually free. When a tool costs you nothing, the question worth asking is: what is the business model?&lt;br&gt;
For image compressors, the answer has historically been volume, data, and advertising. They need your files to pass through their servers so they can show ads around the experience, gather analytics, and in some cases use uploaded content to improve their own models. All of this is usually disclosed somewhere in the terms, and almost never noticed.&lt;br&gt;
A tool that never receives your files has none of those revenue streams. Which means it has to find a different model or, in the case of &lt;a href="https://zeropng.com/" rel="noopener noreferrer"&gt;zeropng.com&lt;/a&gt;, simply be free because it costs almost nothing to run. There's no server to maintain. No storage. No bandwidth bill for processing millions of images. Hosting a single HTML file on Cloudflare costs essentially zero.&lt;br&gt;
The privacy isn't an added feature. It's a consequence of the architecture. The tool can't collect your data because it never touches your data.&lt;/p&gt;

&lt;h3&gt;
  
  
  It Also Works Without the Internet
&lt;/h3&gt;

&lt;p&gt;This is the part that surprises people most.&lt;br&gt;
After the page loads, &lt;a href="https://zeropng.com/" rel="noopener noreferrer"&gt;zeropng.com&lt;/a&gt; works completely offline. You can open it, disconnect your WiFi, and compress photos. Everything runs inside the browser tab. There's nothing to connect to.&lt;br&gt;
This makes it useful in places where you might not expect a web tool to work: on a plane, somewhere with an unreliable connection, or on a device with restricted network access.&lt;br&gt;
The page loads once. After that, it's yours.&lt;/p&gt;

&lt;h3&gt;
  
  
  One Habit Worth Changing
&lt;/h3&gt;

&lt;p&gt;Next time you're about to drag a file into an online tool, any online tool, not just image compressors, pause for three seconds and ask: does this file need to leave my computer to get this done?&lt;br&gt;
For most things, the honest answer is no. Browser technology in 2025 is quietly capable of things that used to require servers: PDF processing, format conversion, document editing, video trimming. Tools that run locally are increasingly available for all of these, built by people who got frustrated with the same problem.&lt;br&gt;
For images, the answer has been no for a while. The tool just hadn't been built with a decent interface.&lt;br&gt;
Now it has.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://zeropng.com/" rel="noopener noreferrer"&gt;zeropng.com&lt;/a&gt; - free, no account, works offline, and the network tab stays silent.&lt;br&gt;
Your photos stay on your computer. That's the whole idea.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>productivity</category>
      <category>programming</category>
      <category>javascript</category>
    </item>
    <item>
      <title>IVFFlat Indexing in pgvector</title>
      <dc:creator>AJAYA SHRESTHA</dc:creator>
      <pubDate>Wed, 17 Dec 2025 03:36:27 +0000</pubDate>
      <link>https://forem.com/azayshrestha/ivfflat-indexing-in-pgvector-2cj0</link>
      <guid>https://forem.com/azayshrestha/ivfflat-indexing-in-pgvector-2cj0</guid>
<description>&lt;p&gt;&lt;strong&gt;Vector databases&lt;/strong&gt; and &lt;strong&gt;AI-powered&lt;/strong&gt; applications continue to grow rapidly, and &lt;strong&gt;PostgreSQL&lt;/strong&gt; has joined the movement with pgvector, a powerful extension that adds vector similarity search directly to Postgres. One of its most widely used indexing strategies is &lt;strong&gt;IVFFlat&lt;/strong&gt;, an approximate nearest neighbor (ANN) index that dramatically speeds up similarity queries on large vector datasets.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is IVFFlat in pgvector?
&lt;/h3&gt;

&lt;p&gt;IVFFlat (Inverted File with Flat Vectors) is an Approximate Nearest Neighbor (ANN) index. Unlike a brute-force scan, which compares a query vector against every vector in the table, IVFFlat partitions vectors into multiple “lists” (or clusters). During a query, only the most relevant lists are searched.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Benefits&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Faster similarity search on large datasets&lt;/li&gt;
&lt;li&gt;Approximate, but accuracy is tunable&lt;/li&gt;
&lt;li&gt;Great fit for high-dimensional embeddings&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  How IVFFlat Works
&lt;/h3&gt;

&lt;p&gt;IVFFlat uses a centroid-based clustering approach:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Training Step:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Vectors are clustered into lists using k-means.&lt;/li&gt;
&lt;li&gt;Each list represents a centroid.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. Index Structure&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each vector is assigned to the closest centroid/list.&lt;/li&gt;
&lt;li&gt;The index stores lists of vectors (inverted lists).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;3. Query Execution&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Query vector is compared to all centroids.&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;probes&lt;/strong&gt; most similar lists are selected.&lt;/li&gt;
&lt;li&gt;Only vectors within those lists are compared.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What do we control?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;lists&lt;/strong&gt; - number of clusters&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;probes&lt;/strong&gt; - number of clusters searched during a query&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Increasing the number of probes increases accuracy but reduces speed.&lt;/p&gt;
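As a toy illustration of the mechanics above (plain JavaScript, not pgvector's actual implementation), the inverted lists and the probes-limited search can be sketched like this; `buildLists` and `ivfflatSearch` are names invented for the example, and real IVFFlat learns its centroids with k-means rather than taking them as given:

```javascript
// Euclidean distance between two equal-length vectors.
function dist(a, b) {
  return Math.hypot(...a.map((x, i) => x - b[i]));
}

// Index build: assign each vector to its nearest centroid (the "inverted lists").
function buildLists(vectors, centroids) {
  const lists = centroids.map(() => []);
  for (const v of vectors) {
    let best = 0;
    for (let i = 1; i < centroids.length; i++) {
      if (dist(v, centroids[i]) < dist(v, centroids[best])) best = i;
    }
    lists[best].push(v);
  }
  return lists;
}

// Query: rank centroids by distance, then scan only the `probes` nearest lists.
// Vectors sitting in unprobed lists are never compared -- that is the speedup,
// and also why the result is approximate when probes < lists.length.
function ivfflatSearch(query, centroids, lists, probes) {
  const ranked = centroids
    .map((c, i) => [dist(query, c), i])
    .sort((a, b) => a[0] - b[0])
    .slice(0, probes);
  let best = null;
  for (const [, i] of ranked) {
    for (const v of lists[i]) {
      if (best === null || dist(query, v) < dist(query, best)) best = v;
    }
  }
  return best;
}
```

With `probes` equal to the number of lists, this degenerates to a full scan and returns the exact nearest neighbor, which mirrors the accuracy/speed trade-off described above.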

&lt;h3&gt;
  
  
  Implementing IVFFlat Indexing in pgvector
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. Enable pgvector (the extension must already be installed on the server)&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE EXTENSION IF NOT EXISTS vector;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;2. Create a table with vector embeddings&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE TABLE documents (
    id bigserial PRIMARY KEY,
    embedding vector(768)
);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;3. Create IVFFlat Index&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE INDEX vector_ivfflat_idx
ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 1000);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The index must be created after inserting enough rows (for better k-means training).&lt;/li&gt;
&lt;li&gt;Try to have at least 1,000 rows per list.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;4. Querying with IVFFlat&lt;/strong&gt;&lt;br&gt;
Example cosine similarity search:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SET ivfflat.probes = 20;

SELECT id
FROM documents
ORDER BY embedding &amp;lt;=&amp;gt; '[0.5, 0.3, …]'
LIMIT 10;

-- Set globally (requires a configuration reload):
ALTER SYSTEM SET ivfflat.probes = 20;
SELECT pg_reload_conf();
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Tuning probes in IVFFlat&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Probes control the number of IVF lists scanned during a query.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lower probes&lt;/strong&gt; - faster but less accurate, because fewer clusters are searched.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Higher probes&lt;/strong&gt; - better accuracy, but more clusters scanned means slower performance.&lt;/li&gt;
&lt;li&gt;Choosing the optimal value depends on how much you prioritize speed vs recall.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Recommended Ranges
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Low probes (1–10)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✔ Fastest search performance&lt;/li&gt;
&lt;li&gt;✔ Best for real-time or high-throughput workloads&lt;/li&gt;
&lt;li&gt;✘ Lower accuracy and recall&lt;/li&gt;
&lt;li&gt;✘ Might miss similar vectors if clusters are coarse&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Medium probes (~10% of total lists)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✔ Balanced between speed and accuracy&lt;/li&gt;
&lt;li&gt;✔ Suitable for most production workloads&lt;/li&gt;
&lt;li&gt;✔ Good recall without major performance sacrifice&lt;/li&gt;
&lt;li&gt;✘ Slightly slower than low probe settings&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;High probes (50–100% of lists)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✔ Near-exact search results (high recall)&lt;/li&gt;
&lt;li&gt;✔ Good for quality-sensitive workloads (e.g., search relevance)&lt;/li&gt;
&lt;li&gt;✘ Much slower due to scanning many lists&lt;/li&gt;
&lt;li&gt;✘ Reduces the performance benefit of ANN indexing&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Maintenance Tasks: REINDEX, ANALYZE, VACUUM
&lt;/h3&gt;

&lt;p&gt;IVFFlat indexes must be maintained correctly to keep search performance stable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. ANALYZE: Improve Query Planning&lt;/strong&gt;&lt;br&gt;
PostgreSQL needs fresh statistics to choose the best plan. Run ANALYZE after large batches of insertions or schedule autovacuum/analyze.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ANALYZE documents;

-- Check the last ANALYZE time for your table
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_all_tables
WHERE relname = 'documents';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;2. REINDEX: Required After Massive Data Changes&lt;/strong&gt;&lt;br&gt;
If many vectors are inserted or deleted, list centroids can drift and degrade performance.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;REINDEX INDEX vector_ivfflat_idx;

-- Keep the table available during the rebuild (PostgreSQL 12+)
REINDEX INDEX CONCURRENTLY vector_ivfflat_idx;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;When to REINDEX:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;After inserting millions of new rows&lt;/li&gt;
&lt;li&gt;After deleting a large portion of data&lt;/li&gt;
&lt;li&gt;If recall noticeably decreases&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;3. VACUUM: Keep Storage Clean&lt;/strong&gt;&lt;br&gt;
Vector columns don’t produce unusual bloat, but regular VACUUM helps maintain table and index health.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;VACUUM (VERBOSE, ANALYZE) documents;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;Enable autovacuum for continuous maintenance.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;IVFFlat is a powerful ANN indexing method available in pgvector, offering a balance of performance, memory efficiency, and simplicity.&lt;br&gt;
With proper configuration and maintenance, IVFFlat can deliver high-performance vector search right inside PostgreSQL, no external database required.&lt;/p&gt;

</description>
      <category>database</category>
      <category>vectordatabase</category>
      <category>ai</category>
      <category>postgres</category>
    </item>
    <item>
      <title>Hardening SSH on Ubuntu: Custom Admin User and Locking Down Access</title>
      <dc:creator>AJAYA SHRESTHA</dc:creator>
      <pubDate>Wed, 13 Aug 2025 06:30:46 +0000</pubDate>
      <link>https://forem.com/azayshrestha/hardening-ssh-on-ubuntu-custom-admin-user-and-locking-down-access-251e</link>
      <guid>https://forem.com/azayshrestha/hardening-ssh-on-ubuntu-custom-admin-user-and-locking-down-access-251e</guid>
      <description>&lt;p&gt;When you first launch an Ubuntu server, cloud providers often give you a default Ubuntu user with SSH open on port 22. It’s convenient, but also predictable, and predictable accounts are prime targets for automated attacks.&lt;/p&gt;

&lt;p&gt;In this blog, we'll:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create a new admin user.&lt;/li&gt;
&lt;li&gt;Switch SSH to a non-default port.&lt;/li&gt;
&lt;li&gt;Enforce key-based login only.&lt;/li&gt;
&lt;li&gt;Restrict access to specific users.&lt;/li&gt;
&lt;li&gt;Configure the firewall.&lt;/li&gt;
&lt;li&gt;Delete the default user.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  1. Create a New Admin User
&lt;/h3&gt;

&lt;p&gt;We’ll replace the generic &lt;code&gt;ubuntu&lt;/code&gt; account with our own, here called &lt;code&gt;app&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Create the user
sudo adduser app

# Add to the sudo (admin) group
sudo usermod -aG sudo app
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Copy your SSH public key into this account so you can log in without a password:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo mkdir -p /home/app/.ssh
sudo cp /home/ubuntu/.ssh/authorized_keys /home/app/.ssh/
sudo chown -R app:app /home/app/.ssh
sudo chmod 700 /home/app/.ssh
sudo chmod 600 /home/app/.ssh/authorized_keys
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Change the SSH Port
&lt;/h3&gt;

&lt;p&gt;Most brute-force bots scan port 22. Moving SSH to a higher port won’t stop determined attackers, but it will reduce random noise in your logs.&lt;br&gt;
Edit the SSH config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo nano /etc/ssh/sshd_config

# find port and set
Port 2222
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
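
&lt;p&gt;One caveat: on recent Ubuntu releases (22.10 and later), SSH may be socket-activated, in which case the &lt;code&gt;Port&lt;/code&gt; setting in &lt;code&gt;sshd_config&lt;/code&gt; can be ignored. If the new port doesn't take effect after a restart, disabling socket activation is one option:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Only needed if ssh.socket is active on your system
sudo systemctl disable --now ssh.socket
sudo systemctl enable --now ssh.service
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;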



&lt;h3&gt;
  
  
  3. Harden SSH Settings
&lt;/h3&gt;

&lt;p&gt;While still editing /etc/ssh/sshd_config, add or modify these lines:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;PermitRootLogin no
MaxAuthTries 3
MaxSessions 2
TCPKeepAlive no
PasswordAuthentication no
ClientAliveInterval 3000
ClientAliveCountMax 0
AllowUsers app
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What these do:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;PermitRootLogin no&lt;/strong&gt; - root login is forbidden.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MaxAuthTries 3&lt;/strong&gt; - after 3 failed attempts, the connection drops.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MaxSessions 2&lt;/strong&gt; - limits simultaneous open SSH sessions per connection.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;TCPKeepAlive no&lt;/strong&gt; - avoids lingering TCP connections.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PasswordAuthentication no&lt;/strong&gt; - passwords disabled; only SSH keys work.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ClientAliveInterval / ClientAliveCountMax&lt;/strong&gt; - unresponsive sessions get disconnected after ~50 minutes. Note that on newer OpenSSH releases, &lt;code&gt;ClientAliveCountMax 0&lt;/code&gt; disables termination entirely, so you may prefer a value of 1.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AllowUsers app&lt;/strong&gt; - only the app account can log in.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  4. Install and Update the Firewall
&lt;/h3&gt;

&lt;p&gt;First, install UFW if it’s not already present:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo apt update
sudo apt install -y ufw

# Set a default-deny policy and allow outgoing connections:
sudo ufw default deny incoming
sudo ufw default allow outgoing
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Update Firewall Rules&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Allow new ssh port &amp;amp; remove old
sudo ufw allow 2222/tcp
sudo ufw delete allow 22/tcp

# Allow HTTP and HTTPS traffic
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
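
&lt;p&gt;Instead of a plain allow, UFW can also rate-limit the SSH port, temporarily blocking an IP that opens six or more connections within 30 seconds:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Optional: replace the plain allow with a rate-limited rule
sudo ufw limit 2222/tcp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;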



&lt;p&gt;Enable the firewall:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo ufw enable
sudo ufw status verbose
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Restart and test. Validate the config first, and keep your current session open until you’ve confirmed the new login works:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo sshd -t &amp;amp;&amp;amp; sudo systemctl restart ssh

# From another terminal:
ssh -p 2222 app@your-server-ip
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  5. Retire the Default ubuntu User
&lt;/h3&gt;

&lt;p&gt;Once the new account is confirmed working:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo deluser --remove-home ubuntu
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;(Alternatively, just lock it: &lt;code&gt;sudo usermod --lock ubuntu&lt;/code&gt;)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Now Your Server:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Runs SSH on port 2222 with key-only login.&lt;/li&gt;
&lt;li&gt;Only accepts logins from app.&lt;/li&gt;
&lt;li&gt;Blocks root login.&lt;/li&gt;
&lt;li&gt;Limits brute-force attempts.&lt;/li&gt;
&lt;li&gt;Has a firewall allowing only SSH (2222), HTTP (80), and HTTPS (443).&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ubuntu</category>
      <category>devops</category>
      <category>aws</category>
      <category>cloud</category>
    </item>
    <item>
      <title>Django Caching Strategies: QuerySet vs ID List</title>
      <dc:creator>AJAYA SHRESTHA</dc:creator>
      <pubDate>Mon, 04 Aug 2025 09:19:38 +0000</pubDate>
      <link>https://forem.com/azayshrestha/django-caching-strategies-queryset-vs-id-list-5a10</link>
      <guid>https://forem.com/azayshrestha/django-caching-strategies-queryset-vs-id-list-5a10</guid>
      <description>&lt;h3&gt;
  
  
  Balancing Performance and Flexibility
&lt;/h3&gt;

&lt;p&gt;As Django developers, we're constantly looking for ways to optimize our applications and reduce database load. Caching is one of the most powerful tools, but implementing it effectively requires careful consideration. Today, we'll examine two common caching patterns for database queries and determine which approach delivers better results.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Challenge: Caching Published Books
&lt;/h3&gt;

&lt;p&gt;Imagine we have a &lt;code&gt;Book&lt;/code&gt; model with an &lt;code&gt;is_published&lt;/code&gt; field, and we frequently need to retrieve all published books. To avoid hitting the database repeatedly, we want to cache this queryset. Let's explore two implementation approaches:&lt;/p&gt;

&lt;h3&gt;
  
  
  Approach 1: Caching the Queryset Directly
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;qs = cache.get('published_books_qs')
if qs is None:  # 'is None' so a cached empty result isn't treated as a miss
    qs = Book.objects.filter(is_published=True)
    cache.set('published_books_qs', qs)
return qs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This approach seems straightforward - we attempt to retrieve the queryset from cache, and if it's not there, we query the database and cache the result.&lt;/p&gt;

&lt;h3&gt;
  
  
  Approach 2: Caching IDs and Refetching
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ids = cache.get('published_books_ids')
if ids is None:  # 'is None' so an empty, cached ID list isn't refetched every time
    ids = list(Book.objects.filter(is_published=True).values_list('id', flat=True))
    cache.set('published_books_ids', ids)
return Book.objects.filter(id__in=ids)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, we cache only the IDs of published books and then perform a fresh query to retrieve the full objects.&lt;/p&gt;

&lt;p&gt;After careful analysis, &lt;strong&gt;Approach 2 (caching IDs) is clearly superior&lt;/strong&gt; for most use cases. Let's break down why:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Data Freshness&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;QuerySet Caching:&lt;/strong&gt; When you cache a QuerySet, you're storing the actual objects as they existed at the time of caching. If book details change after caching (like price updates or title corrections), subsequent cache hits will return stale data.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;ID Caching:&lt;/strong&gt; By only caching IDs and performing a fresh query, you always retrieve the most current data from the database. Changes to book details are immediately reflected in your application.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. Memory Efficiency&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;QuerySet Caching:&lt;/strong&gt; Storing entire QuerySets consumes significantly more memory. Each book object contains all its fields, which can be substantial if you have many books or large fields.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;ID Caching:&lt;/strong&gt; A list of IDs is much more memory-efficient. For example, storing 1,000 integer IDs requires far less space than 1,000 complete book objects.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;3. Flexibility&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;QuerySet Caching:&lt;/strong&gt; The cached QuerySet is fixed. You can't easily add additional filters or ordering without breaking the cache or invalidating it.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;ID Caching:&lt;/strong&gt; With cached IDs, you can still apply additional filters, ordering, or select_related/prefetch_related optimizations to the final QuerySet:&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Additional filtering is still possible
Book.objects.filter(id__in=ids).order_by('-publication_date')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
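
&lt;p&gt;One caveat with &lt;code&gt;id__in&lt;/code&gt;: the database won't preserve the order of the cached ID list. If that order matters (say, it encodes a ranking), it can be reimposed with a conditional expression; a sketch, reusing the &lt;code&gt;ids&lt;/code&gt; list from above:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from django.db.models import Case, When

# Preserve the cached ordering when refetching by ID
preserved = Case(*[When(id=pk, then=pos) for pos, pk in enumerate(ids)])
books = Book.objects.filter(id__in=ids).order_by(preserved)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;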



&lt;p&gt;&lt;strong&gt;4. Cache Reliability&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;QuerySet Caching:&lt;/strong&gt; QuerySets contain database connection state and metadata that may become stale. When retrieved from cache, they might execute with outdated context, leading to unexpected behavior or errors.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;ID Caching:&lt;/strong&gt; Simple data structures like lists of IDs are more reliable to cache. They don't contain database connections or complex ORM state that might expire or become invalid.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;5. Performance Considerations&lt;/strong&gt;&lt;br&gt;
While ID caching requires an additional database query to fetch the full objects, this is typically offset by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reduced cache memory usage&lt;/li&gt;
&lt;li&gt;Fewer cache invalidations needed&lt;/li&gt;
&lt;li&gt;The ability to optimize the final query with select_related or prefetch_related&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  When Might QuerySet Caching Work?
&lt;/h3&gt;

&lt;p&gt;There are limited scenarios where caching the entire QuerySet might be acceptable:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Highly static data:&lt;/strong&gt; When the data rarely changes and you can afford stale reads&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Small datasets:&lt;/strong&gt; When you're dealing with a small number of simple objects&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Read-only operations:&lt;/strong&gt; When you're certain you won't need to modify the objects&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Even in these cases, ID caching is often still preferable due to its flexibility and reliability.&lt;/p&gt;

&lt;h3&gt;
  
  
  Best Practices for ID Caching
&lt;/h3&gt;

&lt;p&gt;To implement ID caching effectively:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Set appropriate cache timeouts:&lt;/strong&gt; Balance freshness with performance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Invalidate cache when needed:&lt;/strong&gt; Clear the cached IDs when books are published/unpublished&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Optimize the final query:&lt;/strong&gt; Use select_related or prefetch_related to minimize database hits: &lt;code&gt;Book.objects.filter(id__in=ids).select_related('author').prefetch_related('tags')&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Consider cache versioning:&lt;/strong&gt; Add a version key to your cache to easily invalidate all cached items when needed: &lt;code&gt;cache.get('published_books_ids_v2')&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
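
&lt;p&gt;Putting these practices together, a minimal sketch of the pattern might look like this (the key name, the 5-minute timeout, and the &lt;code&gt;Book&lt;/code&gt; model's relations are illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from django.core.cache import cache

PUBLISHED_IDS_KEY = 'published_books_ids_v1'  # bump the version to invalidate everywhere

def get_published_books():
    ids = cache.get(PUBLISHED_IDS_KEY)
    if ids is None:
        ids = list(Book.objects.filter(is_published=True).values_list('id', flat=True))
        cache.set(PUBLISHED_IDS_KEY, ids, timeout=300)  # 5-minute freshness window
    return Book.objects.filter(id__in=ids).select_related('author')

def invalidate_published_books():
    # Call this from a post_save/post_delete signal, or wherever publish state changes
    cache.delete(PUBLISHED_IDS_KEY)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;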

&lt;p&gt;While caching QuerySets directly may seem convenient, caching IDs and performing fresh queries offers significant advantages in terms of data freshness, memory efficiency, flexibility, and reliability. This pattern is particularly valuable in applications where data changes frequently or consistency is important.&lt;br&gt;
The next time you implement caching in Django, consider adopting the ID caching approach. Your application will be more robust, your cache more efficient, and your users will see more current data.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>python</category>
      <category>django</category>
      <category>redis</category>
    </item>
    <item>
      <title>PostgreSQL Database Tuning Guide</title>
      <dc:creator>AJAYA SHRESTHA</dc:creator>
      <pubDate>Mon, 28 Jul 2025 05:53:09 +0000</pubDate>
      <link>https://forem.com/azayshrestha/postgresql-database-tuning-guide-30kg</link>
      <guid>https://forem.com/azayshrestha/postgresql-database-tuning-guide-30kg</guid>
      <description>&lt;p&gt;PostgreSQL is a powerful, open-source relational database management system renowned for its stability, versatility, and efficiency. Optimizing PostgreSQL can dramatically improve your database performance.&lt;br&gt;
Let's Optimize PostgreSQL for a Server with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;4 Core CPU&lt;/li&gt;
&lt;li&gt;4 GB RAM&lt;/li&gt;
&lt;li&gt;100 GB SSD&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Why Tune PostgreSQL?
&lt;/h3&gt;

&lt;p&gt;The default PostgreSQL configuration is conservative and intended to run safely on almost any hardware. To get maximum performance from your database, you must tailor PostgreSQL's settings to your specific hardware and workload. Correctly tuning your database can significantly improve read/write operations, reduce latency, and improve query performance.&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 1: Edit PostgreSQL Configuration File
&lt;/h3&gt;

&lt;p&gt;Open your PostgreSQL config file. Usually found at:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Adjust according to your installation path
sudo vi /etc/postgresql/16/main/postgresql.conf
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2: Update These Settings
&lt;/h3&gt;

&lt;p&gt;Paste these recommended settings directly into your postgresql.conf file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Recommended PostgreSQL Settings (for 4GB RAM, 4-core, SSD)

# Connections
max_connections = 200

# Memory
shared_buffers = 1GB
effective_cache_size = 3GB
maintenance_work_mem = 256MB
work_mem = 5140kB

# WAL &amp;amp; Checkpoint settings
checkpoint_completion_target = 0.9
wal_buffers = 16MB
min_wal_size = 1GB
max_wal_size = 4GB

# Query Planning
default_statistics_target = 100
random_page_cost = 1.1
effective_io_concurrency = 200

# Parallelism &amp;amp; Workers
max_worker_processes = 4
max_parallel_workers_per_gather = 2
max_parallel_workers = 4
max_parallel_maintenance_workers = 2

# Huge Pages
huge_pages = off
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3: Restart PostgreSQL
&lt;/h3&gt;

&lt;p&gt;Save your changes and restart PostgreSQL with the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo systemctl restart postgresql
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
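
&lt;p&gt;After the restart, it's worth confirming the new values actually took effect. From &lt;code&gt;psql&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Spot-check individual settings
SHOW shared_buffers;
SHOW work_mem;

-- Or inspect several at once
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('max_connections', 'effective_cache_size', 'random_page_cost');
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;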



&lt;h3&gt;
  
  
  Understanding Key Parameters
&lt;/h3&gt;

&lt;p&gt;Here's a breakdown of essential PostgreSQL parameters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;max_connections = 200&lt;/strong&gt;
Limits the number of concurrent database connections; higher values need more RAM. 200 is suitable for medium workloads.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;shared_buffers = 1GB&lt;/strong&gt;
Memory PostgreSQL uses to cache data in RAM. Typically, 25-40% of total RAM is a good rule.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;effective_cache_size = 3GB&lt;/strong&gt;
An estimate of the memory available for disk caching (shared_buffers plus the OS cache); the query planner uses it to judge how likely index scans are to find pages already cached.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;maintenance_work_mem = 256MB&lt;/strong&gt;
Memory for maintenance tasks (vacuuming, indexing). More memory allows these operations to run faster.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;work_mem = 5140kB (~5MB)&lt;/strong&gt;
Memory allocated per sort or hash operation within a query, so one complex query can use several multiples of it. The small value leaves headroom for many concurrent queries; raise it if you expect large joins or sorts and want to avoid spilling to temporary disk files.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;checkpoint_completion_target = 0.9&lt;/strong&gt;
Controls how evenly the checkpoint writes are spread out. 0.9 reduces I/O spikes by spreading writes over more time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;wal_buffers = 16MB&lt;/strong&gt;
Temporary storage for WAL (Write-Ahead Log) data before it's written to disk. A higher value can improve write performance.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;min_wal_size = 1GB and max_wal_size = 4GB&lt;/strong&gt;
Bound the disk space used by WAL files and, indirectly, how often checkpoints occur. Balanced values trade disk usage against checkpoint frequency and help SSD lifespan.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;default_statistics_target = 100&lt;/strong&gt;
Affects the quality of table statistics. Higher values mean more accurate query plans, but slower ANALYZE times.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;random_page_cost = 1.1&lt;/strong&gt;
Tells the planner the cost of reading a random disk page. SSDs handle random access quickly. Lower value tells PostgreSQL to optimize accordingly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;effective_io_concurrency = 200&lt;/strong&gt;
Indicates how many concurrent I/O operations the system can handle. Higher is better for SSDs or fast storage. SSDs manage multiple I/O operations simultaneously.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;max_worker_processes = 4&lt;/strong&gt;
The total number of background worker processes PostgreSQL can run.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;max_parallel_workers_per_gather = 2&lt;/strong&gt;
Max parallel workers for a single parallel query (gather node). Controls parallel query execution.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;max_parallel_workers = 4&lt;/strong&gt;
The total parallel workers that can be running across all queries.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;max_parallel_maintenance_workers = 2&lt;/strong&gt;
Max workers for parallel maintenance tasks like CREATE INDEX.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;huge_pages = off&lt;/strong&gt;
Whether to use huge pages (larger memory pages for better performance). Off by default, useful in high-performance setups. Keep off on small RAM systems (4GB or less).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With the right tuning, PostgreSQL can deliver significantly better performance tailored to your server’s resources and workload. The configuration outlined above is a solid starting point that optimizes memory usage, connection handling, query planning, and parallelism.&lt;/p&gt;

&lt;p&gt;However, database performance tuning is not a one-time task. It’s essential to continuously monitor key performance metrics such as CPU usage, disk I/O, query times, and cache hit ratios, to ensure your settings remain effective as your data grows or your application load changes.&lt;/p&gt;

&lt;p&gt;Be prepared to adjust configurations as needed, based on real-world usage and evolving demands. Regularly revisiting your PostgreSQL settings will help maintain peak performance and application responsiveness over time.&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>database</category>
      <category>webdev</category>
      <category>devops</category>
    </item>
    <item>
      <title>Beyond Cosine Similarity: Multi-Faceted Scoring for Smarter Recommendations</title>
      <dc:creator>AJAYA SHRESTHA</dc:creator>
      <pubDate>Wed, 23 Jul 2025 03:56:19 +0000</pubDate>
      <link>https://forem.com/azayshrestha/beyond-cosine-similarity-multi-faceted-scoring-for-smarter-recommendations-1kkk</link>
      <guid>https://forem.com/azayshrestha/beyond-cosine-similarity-multi-faceted-scoring-for-smarter-recommendations-1kkk</guid>
      <description>&lt;p&gt;Cosine similarity is a widely adopted metric for measuring the similarity between two vectors in high-dimensional space. It's fast, easy to implement, and useful in many applications such as recommendation systems, document clustering, and semantic search.&lt;/p&gt;

&lt;p&gt;But in real-world systems, especially in decision-making domains like recruitment or personalized recommendations, relying solely on cosine similarity can produce misleading results. To build smarter, more effective matching systems, we need to go beyond simple vector similarity and design multi-faceted scoring models that combine semantic understanding with concrete, contextual hiring signals.&lt;/p&gt;

&lt;p&gt;This article walks through how to enhance cosine similarity with a multi-faceted scoring framework, using a job-candidate matching system as a running example.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Cosine Similarity Alone Falls Short
&lt;/h3&gt;

&lt;p&gt;Imagine you’re building a job-matching platform. You compute the similarity between a job description and a candidate profile based on skill embeddings. But here’s the problem:&lt;/p&gt;

&lt;h3&gt;
  
  
  Scenario
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Job Posting&lt;/strong&gt;: Senior Python Developer&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Skills: Python, Django, PostgreSQL, Docker, AWS&lt;/li&gt;
&lt;li&gt;Experience Level: Senior (5+ years)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Candidate A:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Skills: Python, Django, PostgreSQL, Docker, AWS, React&lt;/li&gt;
&lt;li&gt;Experience: 2 years&lt;/li&gt;
&lt;li&gt;Cosine Similarity: 0.95&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Candidate B:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Skills: Python, Django, PostgreSQL, Redis, Kubernetes&lt;/li&gt;
&lt;li&gt;Experience: 6 years&lt;/li&gt;
&lt;li&gt;Cosine Similarity: 0.87&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Despite the lower similarity, Candidate B is a better match. This reveals the limitation of cosine similarity: it captures surface-level similarity in skill vectors but overlooks contextual relevance, such as experience or seniority. That’s where multi-faceted scoring comes in.&lt;/p&gt;

&lt;h3&gt;
  
  
  Multi-Faceted Scoring: A Composite Architecture
&lt;/h3&gt;

&lt;p&gt;To account for domain-specific constraints and improve relevance, we combine multiple scoring dimensions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;final_score = (
    0.7 * cosine_similarity +
    0.2 * skill_overlap_score +
    0.1 * level_match_score
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  1. Cosine Similarity (70%)
&lt;/h3&gt;

&lt;p&gt;Captures semantic relationships between Candidate Profile and Job requirements:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def cosine_similarity(job_vector, candidate_vector):
    # Measures semantic similarity between embeddings
    return np.dot(job_vector, candidate_vector) / (
        np.linalg.norm(job_vector) * np.linalg.norm(candidate_vector) + 1e-10
    )
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Skill Overlap Score (20%)
&lt;/h3&gt;

&lt;p&gt;Captures exact matches in skill sets:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def compute_skill_overlap(job_skills, candidate_skills):
    # Percentage of required skills candidate possesses
    job_set = set(map(str.lower, job_skills))
    candidate_set = set(map(str.lower, candidate_skills))
    return len(job_set &amp;amp; candidate_set) / len(job_set) if job_set else 0.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Level Match Score (10%)
&lt;/h3&gt;

&lt;p&gt;Ensures experience level compatibility:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def compute_level_score(job_level, candidate_level):
    # Asymmetric scoring favors slight over-qualification
    levels = ['Entry Level', 'Mid Level', 'Senior Level', 'Top Level']

    job_idx = levels.index(job_level)
    candidate_idx = levels.index(candidate_level)

    diff = candidate_idx - job_idx

    # Asymmetric penalty: slight over-qualification is better than under-qualification
    # Over-qualified or exact match
    if diff &amp;gt;= 0:
        # Gentle penalty
        return max(0, 1 - (diff * 0.2))  
    else:  # Under-qualified
        # Steeper penalty
        return max(0, 1 - (abs(diff) * 0.3))

# Exact match - 1.0
# Slightly over-qualified - gently penalized at 0.2 per step (e.g., 0.8, 0.6, 0.4)
# Slightly under-qualified - more harshly penalized at 0.3 per step (e.g., 0.7, 0.4, 0.1)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
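
&lt;p&gt;As a quick sanity check of the two rule-based components, here are the values the functions above produce for the scenario from earlier (computed by hand from the code as written):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Candidate B vs. the Senior Python Developer posting
job_skills = ['Python', 'Django', 'PostgreSQL', 'Docker', 'AWS']
candidate_b = ['Python', 'Django', 'PostgreSQL', 'Redis', 'Kubernetes']

compute_skill_overlap(job_skills, candidate_b)       # 3 of 5 required skills - 0.6
compute_level_score('Senior Level', 'Senior Level')  # exact match - 1.0
compute_level_score('Senior Level', 'Entry Level')   # two steps under - 0.4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;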



&lt;h3&gt;
  
  
  Full Implementation: Job Matcher Class
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;class JobMatcher:
    def __init__(self, weights=None):
        # The key must be 'level' to match the lookup in calculate_final_score
        self.weights = weights or {'cosine': 0.7, 'skills': 0.2, 'level': 0.1}

    # Assumes the three helper functions above are attached as methods
    def calculate_final_score(self, job_vector, candidate_vector, job_skills, candidate_skills, job_level, candidate_level):

        cosine_sim = self.cosine_similarity(job_vector, candidate_vector)
        skill_score = self.compute_skill_overlap(job_skills, candidate_skills)
        level_score = self.compute_level_score(job_level, candidate_level)

        final_score = (
            self.weights['cosine'] * cosine_sim +
            self.weights['skills'] * skill_score +
            self.weights['level'] * level_score
        )

        return {
            'final_score': final_score,
            'cosine_similarity': cosine_sim,
            'skill_score': skill_score,
            'level_score': level_score
        }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Advanced Strategies for Customization
&lt;/h3&gt;

&lt;h3&gt;
  
  
  1. Dynamic Weighting by Job Role
&lt;/h3&gt;

&lt;p&gt;Not all roles should be scored equally. Entry-level jobs may rely more on skills, while leadership roles weigh experience more heavily:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def get_scoring_strategy(job_type):
    strategies = {
        'entry_level': {'cosine': 0.8, 'skills': 0.15, 'level': 0.05},
        'mid_level': {'cosine': 0.7, 'skills': 0.2, 'level': 0.1},
        'senior_level': {'cosine': 0.65, 'skills': 0.2, 'level': 0.15},
        'top_level': {'cosine': 0.6, 'skills': 0.2, 'level': 0.2}
    }
    return strategies.get(job_type, {'cosine': 0.7, 'skills': 0.2, 'level': 0.1})
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Contextual Scoring: Beyond Skills
&lt;/h3&gt;

&lt;p&gt;Add context-aware dimensions like location, salary expectation, or availability:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def enhanced_scoring(job_data, candidate_data):
    base = calculate_basic_scores(job_data, candidate_data)

    location_score = calculate_location_match(job_data['location'], candidate_data['location'])
    salary_score = calculate_salary_match(job_data['salary_range'], candidate_data['expected_salary'])
    availability_score = calculate_availability(job_data['start_date'], candidate_data['availability'])

    return {
        **base,
        'location_score': location_score,
        'salary_score': salary_score,
        'availability_score': availability_score,
        'final_score': aggregate_weighted_sum(base, location_score, salary_score, availability_score)
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Benefits of a Multi-Faceted Approach
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Better Real-World Relevance:&lt;/strong&gt; Incorporates context like experience and domain fit.&lt;br&gt;
&lt;strong&gt;Improved Interpretability:&lt;/strong&gt; Each sub-score can be independently analyzed.&lt;br&gt;
&lt;strong&gt;Customizability:&lt;/strong&gt; Weights and dimensions are flexible and data-driven.&lt;br&gt;
&lt;strong&gt;Reduction in False Positives:&lt;/strong&gt; Multiple dimensions reduce reliance on a single vector match.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cosine similarity&lt;/strong&gt; is a solid starting point, but it’s just that: a start. Real-world decision systems demand a deeper, context-rich evaluation framework. A multi-faceted scoring approach enables your system to reflect real business logic and user intent, unlocking more relevant, equitable, and effective recommendations.&lt;/p&gt;

&lt;p&gt;Whether you're building a job matching platform, recommendation engine, or personalized search algorithm, integrating multiple signals, semantic, categorical, and contextual, will give your system the precision and flexibility needed to succeed in production environments.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Start with cosine. Scale with context. Optimize with data.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>vectordatabase</category>
      <category>nlp</category>
    </item>
    <item>
      <title>Optimize Static Delivery: Host Static Assets on AWS EC2 with Nginx &amp; Cloudflare</title>
      <dc:creator>AJAYA SHRESTHA</dc:creator>
      <pubDate>Wed, 09 Jul 2025 09:51:32 +0000</pubDate>
      <link>https://forem.com/azayshrestha/optimize-static-delivery-host-static-assets-on-aws-ec2-with-nginx-cloudflare-1h76</link>
      <guid>https://forem.com/azayshrestha/optimize-static-delivery-host-static-assets-on-aws-ec2-with-nginx-cloudflare-1h76</guid>
      <description>&lt;p&gt;In modern web architecture, speed and scalability are non-negotiable. A CDN (Content Delivery Network) plays a critical role in improving site performance by delivering static assets closer to the end users. Delivering static assets (CSS, JavaScript, images) from a standalone CDN server can dramatically improve your site’s performance and reliability. &lt;br&gt;
In this post, we’ll walk through setting up an AWS EC2 instance, hosting static files, serving them using Nginx, and dramatically improving their delivery speed using Cloudflare as a CDN.&lt;/p&gt;
&lt;h2&gt;
  
  
  Why a Separate CDN Server?
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Isolation of concerns:&lt;/strong&gt; Your web and app servers handle dynamic traffic, while your CDN server exclusively serves static content.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scalability:&lt;/strong&gt; You can scale or snapshot your CDN layer independently.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cache-control:&lt;/strong&gt; Nginx and Cloudflare provide fine-grained caching without requiring changes to Django.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  1. SSH into Your Server
&lt;/h3&gt;

&lt;p&gt;Use your SSH key and the EC2 public IP to connect:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ssh -i path/to/your-key.pem ubuntu@YOUR_EC2_PUBLIC_IP
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Installing Nginx &amp;amp; Preparing &lt;code&gt;~/static&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;Update your package list and install Nginx:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo apt update
sudo apt install -y nginx
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create the static files directory in your home folder:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mkdir -p ~/static
chown -R $USER:www-data ~/static
chmod -R 755 ~/static
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;Now /home/ubuntu/static is ready to receive your collected assets.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  3. Nginx Configuration
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# In your home directory, create a conf folder
mkdir -p ~/conf
cd ~/conf

# Edit your nginx.conf
vim nginx.conf
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Inside ~/conf/nginx.conf, add:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;server {
    listen 80;
    server_name cdn.example.com;

    # Get the real visitor IP from Cloudflare
    real_ip_header CF-Connecting-IP;
    # NOTE: 0.0.0.0/0 trusts this header from any client; in production,
    # list only Cloudflare's published IP ranges in set_real_ip_from.
    set_real_ip_from 0.0.0.0/0;

    # Serve static files from ~/static
    location / {
        root /home/ubuntu;
        try_files /static$uri =404;

        # Cache for 7 days
        expires 7d;
        add_header Cache-Control "public, max-age=604800";

        # No access logs for static files
        access_log off;
    }

    # Let's Encrypt support
    location /.well-known/acme-challenge/ {
        root /home/ubuntu;
    }

    # Health check
    location /health {
        return 200 "OK";
        access_log off;
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then activate it by symlinking into Nginx’s conf.d:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo ln -sf /home/ubuntu/conf/nginx.conf /etc/nginx/conf.d/cdn_nginx.conf
sudo nginx -t
sudo systemctl reload nginx
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4. Pointing cdn.example.com to Your EC2 + Cloudflare
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;In your DNS provider or Cloudflare, create an A record:&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;Name: cdn&lt;/li&gt;
&lt;li&gt;Type: A&lt;/li&gt;
&lt;li&gt;Value: YOUR_EC2_PUBLIC_IP&lt;/li&gt;
&lt;/ul&gt;

&lt;ol start="2"&gt;
&lt;li&gt;In Cloudflare’s dashboard, set Proxy status to Proxied.
Requests to cdn.example.com will now route through Cloudflare’s edge network.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  5. Syncing Your Static Files
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;rsync -av --delete path/to/local/static/ ubuntu@YOUR_EC2_PUBLIC_IP:/home/ubuntu/static/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;-a&lt;/code&gt; (archive mode) preserves permissions, timestamps, and symlinks&lt;br&gt;
&lt;code&gt;-v&lt;/code&gt; prints each transferred file&lt;br&gt;
&lt;code&gt;--delete&lt;/code&gt; removes remote files no longer present locally&lt;br&gt;
Automate this step so every deployment populates your CDN.&lt;/p&gt;
&lt;h3&gt;
  
  
  6. Enabling HTTPS on the CDN Server
&lt;/h3&gt;

&lt;p&gt;For Cloudflare’s Full (strict) SSL mode, install a Let’s Encrypt certificate:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo apt install -y certbot python3-certbot-nginx
sudo certbot --nginx -d cdn.example.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Certbot will:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Configure Nginx to listen on port 443&lt;/li&gt;
&lt;li&gt;Set up auto-renewal&lt;/li&gt;
&lt;li&gt;Redirect HTTP to HTTPS&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  7. Django Configuration
&lt;/h3&gt;

&lt;p&gt;In your production settings (settings.py), set:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;STATIC_URL = "https://cdn.example.com/"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No other Django changes are required. All &lt;code&gt;{% static %}&lt;/code&gt; tags will now reference your CDN host.&lt;/p&gt;
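&lt;p&gt;A common refinement (shown here as a sketch; the &lt;code&gt;DEBUG&lt;/code&gt; toggle and CDN hostname are illustrative) is to fall back to locally served static files in development:&lt;/p&gt;

```python
# settings.py sketch: serve static files locally in development,
# and from the CDN host in production.
DEBUG = False  # in practice, set from an environment variable

if DEBUG:
    STATIC_URL = "/static/"                  # Django's dev server handles this
else:
    STATIC_URL = "https://cdn.example.com/"  # {% static %} tags resolve to the CDN
```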

&lt;h3&gt;
  
  
  8. Verifying Cache &amp;amp; Performance
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Open Developer Tools → Network and reload a page that includes static assets.&lt;/li&gt;
&lt;li&gt;Inspect the response headers for CSS/JS files; you should see something like:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cf-cache-status: HIT
cache-control: max-age=2592000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;In Cloudflare’s dashboard, review Cache Analytics. Aim for a high Hit Ratio.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  9. Advanced Tips
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cache Purge&lt;/strong&gt;: Use Cloudflare’s API or dashboard to purge specific URLs after critical updates.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security&lt;/strong&gt;: Allow HTTP/HTTPS to the origin only from Cloudflare’s published IP ranges, and restrict SSH to trusted IPs.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>cloud</category>
      <category>nginx</category>
      <category>devops</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Automating PostgreSQL Backups with a Shell Script</title>
      <dc:creator>AJAYA SHRESTHA</dc:creator>
      <pubDate>Mon, 23 Jun 2025 10:09:07 +0000</pubDate>
      <link>https://forem.com/azayshrestha/automating-postgresql-backups-with-a-shell-script-4m8h</link>
      <guid>https://forem.com/azayshrestha/automating-postgresql-backups-with-a-shell-script-4m8h</guid>
      <description>&lt;p&gt;Backups serve as a safety net for any application that stores critical data. If you’re running a PostgreSQL database on a Linux server, automating regular backups is essential for disaster recovery and peace of mind.&lt;br&gt;
In this blog, we’ll explore a simple yet powerful shell script that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dumps a PostgreSQL database&lt;/li&gt;
&lt;li&gt;Compresses the backup&lt;/li&gt;
&lt;li&gt;Stores it with a timestamp&lt;/li&gt;
&lt;li&gt;Transfers it to a remote server&lt;/li&gt;
&lt;li&gt;Keeps only the 10 most recent backups&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Why Automate PostgreSQL Backups?
&lt;/h3&gt;

&lt;p&gt;Manually backing up a database is risky. You might forget, or worse, do it incorrectly. Automating the process ensures:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Consistency&lt;/strong&gt;: Backups happen the same way every time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Accountability&lt;/strong&gt;: Timestamped files provide a history of backups.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security&lt;/strong&gt;: Offsite backups reduce data loss risk.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Efficiency&lt;/strong&gt;: Old backups are purged automatically.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Before using this script, make sure:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You have a PostgreSQL database running.&lt;/li&gt;
&lt;li&gt;Your user has sudo access.&lt;/li&gt;
&lt;li&gt;You can scp to a remote server using SSH keys (no password prompts).&lt;/li&gt;
&lt;li&gt;The target backup directory exists on the remote machine (/home/ubuntu/backups/).&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  The Script
&lt;/h3&gt;

&lt;p&gt;Here’s the complete script that automates your PostgreSQL backups:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#!/bin/sh

# Set timestamp using system's local time
timestamp=$(date +%Y-%m-%d_%H-%M-%S)
backup_dir="/home/ubuntu/backups"
backup_file="${backup_dir}/${timestamp}.psql.gz"

# Dump the PostgreSQL database (redirecting in the calling shell keeps the
# dump file owned by the invoking user, so gzip can replace it in /tmp)
sudo -u postgres pg_dump -O db_name &amp;gt; /tmp/back.psql

# Compress the backup
gzip -f /tmp/back.psql

# Ensure backup directory exists
mkdir -p "$backup_dir"

# Move the compressed backup to the backup directory
mv /tmp/back.psql.gz "$backup_file"

# Copy the backup file to the remote server
scp "$backup_file" ubuntu@IP:/home/ubuntu/backups/

# Retain only the 10 most recent backups
if [ -d "$backup_dir" ]; then
    echo "Backup folder exists."

    cd "$backup_dir" || { echo "Failed to cd into $backup_dir"; exit 1; }

    ls -t *.psql.gz | tail -n +11 | xargs -r rm -f
else
    echo "Backup folder does not exist."
fi

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  How to Use This Script
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Replace &lt;code&gt;db_name&lt;/code&gt; with your actual database name.&lt;/li&gt;
&lt;li&gt;Replace &lt;code&gt;IP&lt;/code&gt; in the scp line with your remote server’s IP address or hostname.&lt;/li&gt;
&lt;li&gt;Make the script executable:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;chmod +x backup.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Run it manually or automate it with cron:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;crontab -e

# Example for daily backups at 2 AM:
0 2 * * * /path/to/backup.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Script Breakdown
&lt;/h3&gt;

&lt;h4&gt;
  
  
  1. Timestamping the Backup
&lt;/h4&gt;

&lt;p&gt;Generates a clean, colon-free timestamp using the system's current local time. This helps uniquely name each backup file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;timestamp=$(date +%Y-%m-%d_%H-%M-%S)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  2. Database Dump and Compression
&lt;/h4&gt;

&lt;p&gt;The script uses &lt;code&gt;pg_dump&lt;/code&gt; to export the database and compresses the result using &lt;code&gt;gzip&lt;/code&gt;. The &lt;code&gt;-O&lt;/code&gt; flag omits ownership commands in the SQL dump.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo su postgres -c "pg_dump -O db_name &amp;gt; /tmp/back.psql"
gzip -f /tmp/back.psql
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  3. Local and Remote Storage
&lt;/h4&gt;

&lt;p&gt;Backups are first stored locally with a timestamped filename. Then, they're securely copied to a remote server using &lt;code&gt;scp&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mv /tmp/back.psql.gz "$backup_file"
scp "$backup_file" ubuntu@IP:/home/ubuntu/backups/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  4. Cleaning Up Old Backups
&lt;/h4&gt;

&lt;p&gt;This line ensures only the 10 most recent backups are kept, preventing unnecessary disk usage over time.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ls -t *.psql.gz | tail -n +11 | xargs -r rm -f
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
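&lt;p&gt;Parsing &lt;code&gt;ls&lt;/code&gt; output is fragile if filenames ever contain spaces. Because the timestamp format sorts lexically in time order, a &lt;code&gt;find&lt;/code&gt;-based variant works as well; the directory and file names below are invented for the demonstration:&lt;/p&gt;

```shell
# Keep only the 10 newest backups, sorting by the timestamped filename.
backup_dir="demo_backups"
mkdir -p "$backup_dir"
for i in $(seq -w 1 15); do
    touch "$backup_dir/2025-01-${i}_00-00-00.psql.gz"   # 15 fake backups
done
find "$backup_dir" -name '*.psql.gz' | sort -r | tail -n +11 | xargs -r rm -f
```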



&lt;h3&gt;
  
  
  Enhancing the Script: Cloud Storage Integration (Optional)
&lt;/h3&gt;

&lt;p&gt;While local and remote backups are great, integrating cloud storage can elevate your backup strategy.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Amazon S3 using the AWS CLI
aws s3 cp "$backup_file" s3://your-s3-bucket-name/backups/

# Google Cloud Storage
gsutil cp "$backup_file" gs://your-gcs-bucket/backups/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Backing up your data is not optional; it’s a necessity. With automation in place, you can sleep better knowing your data is safe.&lt;/p&gt;

</description>
      <category>bash</category>
      <category>postgres</category>
      <category>devops</category>
      <category>database</category>
    </item>
    <item>
      <title>NLP: Tokenization to Vectorization</title>
      <dc:creator>AJAYA SHRESTHA</dc:creator>
      <pubDate>Mon, 16 Jun 2025 06:00:26 +0000</pubDate>
      <link>https://forem.com/azayshrestha/nlp-tokenization-to-vectorization-54il</link>
      <guid>https://forem.com/azayshrestha/nlp-tokenization-to-vectorization-54il</guid>
<description>&lt;p&gt;Natural Language Processing (NLP) is a domain that bridges human language and computer intelligence. In this blog, we’ll walk through the crucial steps, from basics like tokenization, stemming, and lemmatization through to vectorization, to understand how text data is transformed into machine-readable formats. Let's break down each foundational technique.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Tokenization
&lt;/h3&gt;

&lt;p&gt;Tokenization is the process of breaking text into smaller units called tokens. These tokens can be words, sentences, or even subwords.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Word Tokenization:&lt;/strong&gt; Splits sentences into words.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Example
Input: "Natural Language Processing"
Tokens: ["Natural", "Language", "Processing"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Sentence Tokenization:&lt;/strong&gt; Divides text into sentences, essential for tasks like summarization.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Example
Input: "NLP is fascinating. It has endless applications!"
Tokens: ["NLP is fascinating.", "It has endless applications!"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
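&lt;p&gt;Both kinds of tokenization can be sketched with Python's standard &lt;code&gt;re&lt;/code&gt; module; libraries like NLTK or spaCy handle the many edge cases (abbreviations, decimals, quotes) that this toy version ignores:&lt;/p&gt;

```python
import re

# Word tokenization: runs of word characters, or single punctuation marks.
word_tokens = re.findall(r"\w+|[^\w\s]", "Natural Language Processing")

# Sentence tokenization: grab text up to and including ., ! or ?
text = "NLP is fascinating. It has endless applications!"
sentence_tokens = [s.strip() for s in re.findall(r"[^.!?]+[.!?]", text)]
```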



&lt;h3&gt;
  
  
  2. Stemming
&lt;/h3&gt;

&lt;p&gt;Stemming reduces words to their root forms by removing suffixes or prefixes. It’s fast but can produce roots that aren’t actual words.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Example:
Words: "running", "runs", "runner"
Stems: "run", "run", "runner"
# Use Case: Information retrieval, indexing.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
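&lt;p&gt;A crude suffix stripper illustrates the idea; real stemmers such as NLTK's &lt;code&gt;PorterStemmer&lt;/code&gt; apply far more careful rules, and this sketch only covers the example words:&lt;/p&gt;

```python
def naive_stem(word):
    """Toy stemmer: strip a common suffix, collapse a doubled consonant."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            word = word[: -len(suffix)]
            # "running" -> "runn" -> "run"
            if len(word) > 2 and word[-1] == word[-2] and word[-1] not in "aeiou":
                word = word[:-1]
            break
    return word
```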



&lt;h3&gt;
  
  
  3. Lemmatization
&lt;/h3&gt;

&lt;p&gt;Lemmatization reduces words to their actual base form (lemma) using vocabulary and morphological analysis. It’s more accurate than stemming.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Example:
Words: "running", "runs", "ran"
Lemma: "run", "run", "run"
# Use Case: Sentiment analysis, chatbots.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
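&lt;p&gt;At its core, a lemmatizer is a vocabulary lookup plus morphological rules. The tiny table below is a stand-in for the full dictionaries that NLTK's WordNet lemmatizer or spaCy actually use:&lt;/p&gt;

```python
# Toy lemma table (illustrative; real lemmatizers cover the whole vocabulary).
LEMMAS = {"running": "run", "runs": "run", "ran": "run"}

def lemmatize(word):
    return LEMMAS.get(word.lower(), word)  # fall back to the word itself
```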



&lt;h3&gt;
  
  
  4. Stop Word Removal
&lt;/h3&gt;

&lt;p&gt;Stop words are common, frequently-used words (like "the", "and", "is") that often carry little semantic meaning and can clutter text analysis.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Example:
Original: "AI is changing the world and transforming industries."
After Removal: "AI changing world transforming industries."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
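&lt;p&gt;Stop word removal is a simple filter against a word list; the set below is a small sample, while NLP libraries ship lists of a hundred or more entries:&lt;/p&gt;

```python
STOP_WORDS = {"a", "an", "and", "is", "the", "of", "to"}

def remove_stop_words(text):
    kept = [w for w in text.split() if w.lower() not in STOP_WORDS]
    return " ".join(kept)
```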



&lt;h3&gt;
  
  
  5. Part-of-Speech (POS) Tagging
&lt;/h3&gt;

&lt;p&gt;POS tagging classifies words based on grammatical categories (e.g., noun, verb, adjective). This enhances NLP tasks by adding grammatical context to text.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Example:
Input: "AI transforms industries."
POS Tags: [('AI', 'NNP'), ('transforms', 'VBZ'), ('industries', 'NNS'), ('.', '.')]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Common POS Tags:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;NN:&lt;/strong&gt; Noun, singular or mass&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;VB:&lt;/strong&gt; Verb, base form&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;JJ:&lt;/strong&gt; Adjective&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RB:&lt;/strong&gt; Adverb&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
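&lt;p&gt;The output format can be illustrated with a lookup-based tagger. This is purely a sketch with a hypothetical tag table for the example sentence; production taggers such as NLTK's averaged perceptron infer tags statistically from context:&lt;/p&gt;

```python
# Hypothetical tag table covering only the example sentence.
TAGS = {"AI": "NNP", "transforms": "VBZ", "industries": "NNS", ".": "."}

def pos_tag(tokens):
    return [(tok, TAGS.get(tok, "NN")) for tok in tokens]  # default to noun
```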

&lt;h3&gt;
  
  
  6. Embeddings (Vectorization)
&lt;/h3&gt;

&lt;p&gt;Embeddings convert words into continuous vectors, capturing semantic meaning and relationships between words.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Common Models:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Word2Vec:&lt;/strong&gt; Learns embeddings based on context.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GloVe:&lt;/strong&gt; Combines local context (Word2Vec approach) and global statistics from large corpora.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;FastText:&lt;/strong&gt; Enhances embedding by considering subwords, helpful with rare words or multilingual contexts.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why Embeddings Matter:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Enables models to interpret semantic relationships (e.g., synonyms, antonyms, analogies).&lt;/li&gt;
&lt;li&gt;Fundamental for deep learning NLP tasks such as text classification, sentiment analysis, and translation.&lt;/li&gt;
&lt;/ul&gt;
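&lt;p&gt;Once words are vectors, similarity becomes geometry. A minimal cosine-similarity check on made-up 3-dimensional vectors (real embeddings have hundreds of dimensions, and these toy values are assumptions for illustration):&lt;/p&gt;

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors, invented for illustration.
king = [0.9, 0.8, 0.1]
queen = [0.85, 0.75, 0.2]
apple = [0.1, 0.2, 0.9]
```

&lt;p&gt;With these vectors, &lt;code&gt;king&lt;/code&gt; sits far closer to &lt;code&gt;queen&lt;/code&gt; than to &lt;code&gt;apple&lt;/code&gt;, which is exactly the property downstream tasks rely on.&lt;/p&gt;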

&lt;p&gt;Mastering foundational NLP techniques like &lt;strong&gt;Tokenization, Stemming and Lemmatization, Stop Word Removal, POS Tagging, and Embeddings&lt;/strong&gt; provides a strong foundation for advanced text analysis. With these basics, you're now prepared to dive deeper into NLP's exciting complexities.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Recommended Next Approaches:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;NER:&lt;/strong&gt; Detect names, places, organizations in text.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dependency Parsing:&lt;/strong&gt; Understand word relationships.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Text Classification:&lt;/strong&gt; Categorize text (e.g., spam, sentiment).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Topic Modeling:&lt;/strong&gt; Uncover hidden themes in documents.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Transformers (e.g., BERT):&lt;/strong&gt; Use advanced models for deep language understanding.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Summarization:&lt;/strong&gt; Create concise versions of long texts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Q&amp;amp;A and Chatbots:&lt;/strong&gt; Build systems that answer questions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Text Generation:&lt;/strong&gt; Generate human-like content automatically.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Build an NLP Pipeline:&lt;/strong&gt; Apply all basics using NLTK, spaCy, or Hugging Face.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>nlp</category>
      <category>ai</category>
    </item>
    <item>
      <title>Upgrading Django with "python -W always manage.py test"</title>
      <dc:creator>AJAYA SHRESTHA</dc:creator>
      <pubDate>Wed, 23 Apr 2025 04:54:13 +0000</pubDate>
      <link>https://forem.com/azayshrestha/upgrading-django-with-python-wa-managepy-test-2l69</link>
      <guid>https://forem.com/azayshrestha/upgrading-django-with-python-wa-managepy-test-2l69</guid>
      <description>&lt;p&gt;Upgrading Django to a newer version is a crucial step in keeping your project secure, performant, and aligned with the latest features and improvements. As with any major upgrade, Django releases often introduce new features, deprecate older ones, or even remove them altogether. This process can potentially break existing code if not done carefully. One of the most effective ways to ensure a smooth upgrade is by using automated testing to catch any compatibility or deprecated feature issues early.&lt;/p&gt;

&lt;p&gt;One key command that can help you in this process is &lt;strong&gt;python -W always manage.py test&lt;/strong&gt;. This command forces Python to always display warnings during test runs, ensuring that you catch any deprecated features or potential compatibility issues in your code. In this blog, we’ll discuss how upgrading Django works, the importance of running tests with the &lt;code&gt;-W always&lt;/code&gt; flag, and best practices to follow when upgrading Django.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Should You Upgrade Django?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. Security Patches&lt;/strong&gt;&lt;br&gt;
Each new version of Django typically includes critical security fixes. By staying updated, you ensure that your project remains secure against known vulnerabilities.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Performance Improvements&lt;/strong&gt;&lt;br&gt;
New versions often come with optimizations that improve the performance of your Django project, such as reduced memory usage or faster queries.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. New Features&lt;/strong&gt;&lt;br&gt;
With every major release, Django introduces new features that make development easier, such as better database handling, new ORM capabilities, or enhanced admin functionalities. Staying updated means you have access to these new features.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Community Support&lt;/strong&gt;&lt;br&gt;
Older versions of Django eventually stop receiving support. Upgrading ensures that your project continues to be supported by the Django community, with access to updates, bug fixes, and security patches.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Compatibility with New Dependencies&lt;/strong&gt;&lt;br&gt;
Third-party packages, libraries, and tools are often updated to work with newer versions of Django. By staying updated, you ensure that your project remains compatible with the broader Django ecosystem.&lt;/p&gt;

&lt;h3&gt;
  
  
  Challenges of Upgrading Django
&lt;/h3&gt;

&lt;p&gt;Upgrading Django is not always straightforward, especially if your project is built on an older version. &lt;br&gt;
Some of the common challenges include:&lt;br&gt;
&lt;strong&gt;1. Deprecation warnings&lt;/strong&gt;&lt;br&gt;
Features that were once valid may no longer be supported in the new version of Django. These deprecations can cause issues if not addressed promptly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Breaking Changes&lt;/strong&gt;&lt;br&gt;
Sometimes, changes in Django’s architecture or features may lead to incompatibilities, breaking parts of your project if the upgrade is not handled carefully.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Third-Party Packages&lt;/strong&gt;&lt;br&gt;
Some packages may not immediately support the latest version of Django, leading to issues or even breaking your project’s functionality.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Role of &lt;code&gt;python -W always manage.py test&lt;/code&gt; in Upgrading Django
&lt;/h3&gt;

&lt;p&gt;The command &lt;code&gt;python -W always manage.py test&lt;/code&gt; is an incredibly helpful tool when upgrading Django. Here's how it plays a role in ensuring a smooth transition:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Catching Deprecation Warnings&lt;/strong&gt;&lt;br&gt;
When upgrading Django, you’ll likely encounter deprecation warnings, especially if your project is using older features. By using the &lt;code&gt;-W always&lt;/code&gt; flag, you ensure that these warnings are not suppressed and are displayed during your tests.&lt;br&gt;
&lt;code&gt;python -W always manage.py test&lt;/code&gt;&lt;br&gt;
This command will run your test suite and display all warnings, including deprecation warnings that indicate features that will be removed in future versions of Django. These warnings are critical when upgrading, as they can help you identify code that needs to be refactored to remain compatible with the new version.&lt;/p&gt;
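&lt;p&gt;What the flag does can be reproduced with Python's &lt;code&gt;warnings&lt;/code&gt; module directly: under the default filter a &lt;code&gt;DeprecationWarning&lt;/code&gt; is reported only once per location, while the &lt;code&gt;always&lt;/code&gt; action reports every occurrence. A small sketch (the deprecated helper is hypothetical):&lt;/p&gt;

```python
import warnings

def old_helper():
    warnings.warn("old_helper is deprecated", DeprecationWarning)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # same behaviour as running with -W always
    old_helper()
    old_helper()
# both calls are recorded, so repeated use of deprecated code stays visible
```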

&lt;p&gt;&lt;strong&gt;2. Ensuring Compatibility with the New Django Version&lt;/strong&gt;&lt;br&gt;
The &lt;code&gt;-W always&lt;/code&gt; flag makes sure that any issues related to compatibility between your project’s code and the new version of Django are highlighted. These could include:&lt;/p&gt;

&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;Changes to Django's ORM.&lt;/li&gt;
&lt;li&gt;Changes to middleware, templates, or views.&lt;/li&gt;
&lt;li&gt;Updated patterns for URL routing, forms, or database migrations.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;By running the tests with the &lt;code&gt;-W always&lt;/code&gt; flag, you can identify these issues early in the upgrade process, minimizing the risk of introducing bugs or compatibility issues into your production environment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Monitoring Third-Party Dependencies&lt;/strong&gt;&lt;br&gt;
As part of the Django upgrade, you may also need to upgrade or modify your third-party dependencies to maintain compatibility with the new version of Django. By running tests with the -W always flag, you can quickly identify issues caused by outdated third-party packages that may not fully support the new version of Django.&lt;br&gt;
If warnings related to third-party libraries appear during testing, you can:&lt;/p&gt;

&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;Check for updates or patches for those libraries.&lt;/li&gt;
&lt;li&gt;Consider replacing unsupported libraries with alternatives.&lt;/li&gt;
&lt;li&gt;Monitor the release notes of your dependencies to stay informed of any changes.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;4. Proactive Debugging&lt;/strong&gt;&lt;br&gt;
Using the &lt;code&gt;-W always&lt;/code&gt; flag proactively highlights potential issues during the testing phase, allowing you to debug and address problems early. By catching warnings and errors early, you can make incremental fixes and adjustments, ensuring that your project is stable and compatible with the new Django version before you deploy.&lt;/p&gt;

&lt;p&gt;Upgrading Django is a necessary but sometimes challenging process. By using the &lt;code&gt;python -W always manage.py test&lt;/code&gt; command, you can identify warnings and potential issues early in the upgrade process, making it easier to address problems before they affect your production environment.&lt;/p&gt;

&lt;p&gt;In addition to using this command, following best practices such as backing up your project, updating dependencies, and testing in a staging environment can help ensure a smooth upgrade. By adopting these strategies, you can take full advantage of the latest features and improvements in Django while keeping your project secure, performant, and compatible with the Django ecosystem.&lt;/p&gt;

</description>
      <category>python</category>
      <category>django</category>
      <category>development</category>
      <category>webdev</category>
    </item>
  </channel>
</rss>
