<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Krishna Kumar</title>
    <description>The latest articles on Forem by Krishna Kumar (@krishna_kumar_8981e04d8a2).</description>
    <link>https://forem.com/krishna_kumar_8981e04d8a2</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3775928%2F44805bd8-83e3-4e52-8a76-e4e983a7d320.png</url>
      <title>Forem: Krishna Kumar</title>
      <link>https://forem.com/krishna_kumar_8981e04d8a2</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/krishna_kumar_8981e04d8a2"/>
    <language>en</language>
    <item>
      <title>Pruning in Deep Learning: Structured vs Unstructured</title>
      <dc:creator>Krishna Kumar</dc:creator>
      <pubDate>Mon, 16 Feb 2026 14:34:34 +0000</pubDate>
      <link>https://forem.com/krishna_kumar_8981e04d8a2/pruning-in-deep-learning-structured-vs-unstructured-4dec</link>
      <guid>https://forem.com/krishna_kumar_8981e04d8a2/pruning-in-deep-learning-structured-vs-unstructured-4dec</guid>
      <description>&lt;p&gt;Deep learning models are becoming larger and more powerful every year. From mobile vision systems to large language models, the number of parameters has exploded. But do we really need all those parameters?&lt;/p&gt;

&lt;p&gt;This is where model pruning comes in.&lt;/p&gt;

&lt;p&gt;Pruning is a model compression technique that removes unnecessary parameters from neural networks while maintaining performance. It helps in reducing model size, improving inference speed, and lowering computational cost.&lt;/p&gt;

&lt;p&gt;In this blog, we’ll explore:&lt;/p&gt;

&lt;p&gt;What pruning is&lt;br&gt;
Why pruning is needed&lt;br&gt;
Structured vs unstructured pruning&lt;br&gt;
Practical trade-offs&lt;/p&gt;

&lt;p&gt;🚀 Why Do We Need Pruning?&lt;/p&gt;

&lt;p&gt;Modern neural networks:&lt;/p&gt;

&lt;p&gt;Require large amounts of memory&lt;br&gt;
Consume significant power&lt;br&gt;
Run inference slowly on edge devices&lt;br&gt;
Are expensive to deploy&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;p&gt;Mobile apps need lightweight models&lt;br&gt;
Embedded systems have limited RAM&lt;br&gt;
Edge AI requires fast inference&lt;/p&gt;

&lt;p&gt;Pruning solves these issues by removing redundant weights.&lt;/p&gt;

&lt;p&gt;🌳 What is Model Pruning?&lt;/p&gt;

&lt;p&gt;Model pruning is the process of removing parameters (weights, neurons, filters, or even layers) from a trained neural network to make it smaller and faster.&lt;/p&gt;

&lt;p&gt;The idea is simple:&lt;/p&gt;

&lt;p&gt;Many weights in a trained neural network contribute very little to the final prediction.&lt;/p&gt;

&lt;p&gt;So we remove them.&lt;/p&gt;

&lt;p&gt;Pruning generally follows this workflow:&lt;/p&gt;

&lt;p&gt;Train the full model&lt;br&gt;
Remove less important weights&lt;br&gt;
Fine-tune the pruned model&lt;/p&gt;
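&lt;p&gt;This train, prune, fine-tune loop can be sketched on a toy linear model. The following is a minimal NumPy illustration; the data, sizes, and 70% pruning ratio are assumptions for the demo, not a recipe:&lt;/p&gt;

```python
import numpy as np

# Toy model: y = X @ w, where only 3 of 10 weights actually matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
w_true = np.zeros(10)
w_true[:3] = [2.0, -1.5, 0.5]
y = X @ w_true

def fit(w, mask, steps=500, lr=0.05):
    """Gradient descent on squared error; the mask keeps pruned weights at zero."""
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)
        w = (w - lr * grad) * mask
    return w

# 1. Train the full model (mask of all ones = nothing pruned yet)
w = fit(np.zeros(10), np.ones(10))

# 2. Remove less important weights: zero out the bottom 70% by magnitude
threshold = np.quantile(np.abs(w), 0.7)
mask = (np.abs(w) > threshold).astype(float)

# 3. Fine-tune the pruned model with the mask held fixed
w_pruned = fit(w * mask, mask)

print(int(mask.sum()), "of", w.size, "weights remain")
```

&lt;p&gt;Here fine-tuning fully recovers accuracy because the surviving weights are exactly the ones that mattered; in deep networks the same loop is applied per layer, often over several prune-and-retrain rounds.&lt;/p&gt;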

&lt;p&gt;&lt;strong&gt;🔹 1. Unstructured Pruning&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;📌 What is Unstructured Pruning?&lt;/p&gt;

&lt;p&gt;Unstructured pruning removes individual weights from the network based on an importance criterion (usually magnitude, so the smallest weights are removed first).&lt;/p&gt;

&lt;p&gt;It creates sparse matrices — meaning many weights become zero.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flquuto6jxddqmcz1bn09.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flquuto6jxddqmcz1bn09.png" alt=" " width="699" height="287"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How It Works&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Calculate magnitude of weights&lt;br&gt;
Remove weights below a threshold&lt;br&gt;
Set them to zero&lt;br&gt;
Fine-tune the model&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Advantages&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Can achieve very high compression rates&lt;br&gt;
Minimal accuracy drop&lt;br&gt;
More flexible&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Disadvantages&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Sparse matrices are not always hardware-friendly&lt;br&gt;
Requires special libraries for speed improvement&lt;br&gt;
Irregular memory access&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If a layer has 1000 weights and 70% are pruned:&lt;br&gt;
Only 300 active weights remain&lt;br&gt;
But the structure of the layer stays the same&lt;/p&gt;
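&lt;p&gt;That arithmetic can be checked directly. Below is a minimal NumPy sketch; the 25 x 40 weight matrix is just a hypothetical layer with 1000 weights:&lt;/p&gt;

```python
import numpy as np

# Hypothetical dense layer: 25 x 40 = 1000 weights.
rng = np.random.default_rng(1)
W = rng.normal(size=(25, 40))

# Magnitude pruning: zero out the 70% of weights with the smallest |w|.
threshold = np.quantile(np.abs(W), 0.7)
W_pruned = np.where(np.abs(W) > threshold, W, 0.0)

print(W_pruned.shape)              # the layer's shape is unchanged
print(np.count_nonzero(W_pruned))  # 300 active weights remain
```

&lt;p&gt;Note that the matrix is the same size before and after; only the zero pattern changed, which is why unstructured pruning needs sparse-aware kernels to actually run faster.&lt;/p&gt;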

&lt;p&gt;&lt;strong&gt;🔹 2. Structured Pruning&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;What is Structured Pruning?&lt;/p&gt;

&lt;p&gt;Structured pruning removes entire neurons, channels, filters, or layers instead of individual weights.&lt;/p&gt;

&lt;p&gt;Instead of making matrices sparse, it changes the architecture itself.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl1eyez6a9cx2vqhl12sn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl1eyez6a9cx2vqhl12sn.png" alt=" " width="800" height="221"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy9yokg8zcn1cg2i5rnki.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy9yokg8zcn1cg2i5rnki.png" alt=" " width="800" height="501"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How It Works&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Evaluate importance of filters or neurons&lt;br&gt;
Remove the least important ones&lt;br&gt;
Rebuild the network&lt;br&gt;
Fine-tune&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Advantages&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Hardware-friendly&lt;br&gt;
Faster inference&lt;br&gt;
Easy deployment&lt;br&gt;
No need for sparse computation libraries&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Disadvantages&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Slightly higher accuracy drop (if aggressive)&lt;br&gt;
Less granular control compared to unstructured pruning&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If a CNN layer has 64 filters and 20 are removed:&lt;br&gt;
The new layer has 44 filters&lt;br&gt;
The model becomes physically smaller&lt;/p&gt;
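&lt;p&gt;Filter-level pruning of that hypothetical 64-filter layer might look like this in NumPy. L1-norm ranking is one common importance score; the shapes here are illustrative assumptions:&lt;/p&gt;

```python
import numpy as np

# Hypothetical conv weights: 64 filters, each 3x3 over 16 input channels.
rng = np.random.default_rng(2)
W = rng.normal(size=(64, 16, 3, 3))

# Score each filter by the L1 norm of its weights.
importance = np.abs(W).reshape(64, -1).sum(axis=1)

# Drop the 20 least important filters; 44 survive.
keep = np.sort(np.argsort(importance)[20:])
W_small = W[keep]

print(W_small.shape)  # (44, 16, 3, 3): the layer is physically smaller
```

&lt;p&gt;In a real network the next layer's input channels must shrink to match the removed filters, which is the "rebuild the network" step above.&lt;/p&gt;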

&lt;p&gt;&lt;strong&gt;When to Use Which?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Use unstructured pruning when:&lt;br&gt;
Maximum compression is needed&lt;br&gt;
You have sparse acceleration support&lt;br&gt;
You are running research experiments&lt;/p&gt;

&lt;p&gt;Use structured pruning when:&lt;br&gt;
Deploying to real devices&lt;br&gt;
Targeting mobile / edge AI&lt;br&gt;
You need a real inference speed-up&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Real-World Applications&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;MobileNet optimization&lt;br&gt;
Edge AI devices&lt;br&gt;
Autonomous vehicles&lt;br&gt;
NLP model compression&lt;br&gt;
LLM efficiency improvements&lt;/p&gt;

&lt;p&gt;Large-scale models often combine:&lt;/p&gt;

&lt;p&gt;Pruning&lt;br&gt;
Quantization&lt;br&gt;
Knowledge distillation&lt;/p&gt;

&lt;p&gt;Together, they create efficient AI systems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Final Thoughts&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Pruning is not just about reducing size — it's about making AI practical.&lt;/p&gt;

&lt;p&gt;As models grow larger, efficiency techniques like pruning become essential. Structured pruning is practical for deployment, while unstructured pruning offers maximum compression.&lt;/p&gt;

&lt;p&gt;The future of AI is not just bigger models — but smarter, leaner models.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>deeplearning</category>
      <category>machinelearning</category>
      <category>performance</category>
    </item>
  </channel>
</rss>
