<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: NARESH-CN2</title>
    <description>The latest articles on Forem by NARESH-CN2 (@nareshcn2).</description>
    <link>https://forem.com/nareshcn2</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3865284%2F71db6cd5-1013-429a-ab2d-3304391bd4f1.jpg</url>
      <title>Forem: NARESH-CN2</title>
      <link>https://forem.com/nareshcn2</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/nareshcn2"/>
    <language>en</language>
    <item>
      <title>How to Bypass the Pandas "Object Tax": Building an 8x Faster CSV Engine in C</title>
      <dc:creator>NARESH-CN2</dc:creator>
      <pubDate>Thu, 09 Apr 2026 07:16:35 +0000</pubDate>
      <link>https://forem.com/nareshcn2/how-to-bypass-the-pandas-object-tax-building-an-8x-faster-csv-engine-in-c-1k15</link>
      <guid>https://forem.com/nareshcn2/how-to-bypass-the-pandas-object-tax-building-an-8x-faster-csv-engine-in-c-1k15</guid>
      <description>&lt;p&gt;The Problem: The "Object Tax"If you’ve ever tried to load a 1GB CSV into a Pandas DataFrame, you’ve seen your RAM usage spike to 3GB or 4GB before the process inevitably crashes with an OutOfMemoryError.This isn't just a "Python is slow" problem. It's an Object Tax problem. Every single value in that CSV is being wrapped in a heavy Python object. When you have 10 million rows, those objects become a massive weight that sinks your performance.The Experiment: Dropping to the MetalI wanted to see exactly how much performance we are leaving on the table. I built a custom C-extension for Python called Axiom-CSV.The ArchitectureTo kill the latency, I used three specific systems-level techniques:Memory Mapping (mmap): Instead of reading the file into RAM, I map the file directly to the process's virtual memory address space.Pointer Arithmetic: I used C pointers to scan the raw bytes for delimiters (, and \n) rather than creating intermediate strings.Zero-Copy Aggregations: Calculations happen on the fly as the pointer moves. No DataFrames, no objects, no bloat.The Benchmarks (10 Million Rows / ~400MB CSV)I ran a simple aggregation (summing a column based on a status filter) against standard Pandas.MetricStandard PandasAxiom-CSV (C-Engine)ImprovementExecution Time10.61 seconds1.33 seconds~8x FasterPeak RAM Usage1,738 MB375 MB78% ReductionNote: The 375MB RAM usage for the C-engine is almost identical to the raw file size on disk. 
This is "Zero-Bloat" engineering.Why This Matters for Cloud BudgetsBy reducing the memory footprint by 78%, you can move data pipelines from expensive, high-memory AWS instances (like an r5.xlarge) to the cheapest possible instances (like a t3.micro).The result: You save thousands in infrastructure costs while your users get results 8x faster.Check the CodeI've open-sourced the C-bridge and the Python implementation here:👉 &lt;a href="https://github.com/naresh-cn2/Axiom-CSVI'm" rel="noopener noreferrer"&gt;https://github.com/naresh-cn2/Axiom-CSVI'm&lt;/a&gt; curious—for those of you handling high-throughput data, where are you seeing your biggest bottlenecks? Is it I/O, or is it the Python heap?&lt;/p&gt;
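The C source itself isn't shown in the post, but the mmap-plus-scan idea can be sketched in pure Python with the standard mmap module. The file layout, column positions, and function name below are hypothetical, and the per-line split() still creates small bytes objects that the real C engine would avoid.

```python
import mmap

def sum_column(path, value_col, status_col, status=b"500"):
    """Sum a numeric CSV column for rows matching a status filter,
    scanning a memory-mapped file instead of building a DataFrame."""
    total = 0.0
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        mm.readline()  # skip the header row
        for line in iter(mm.readline, b""):
            fields = line.rstrip(b"\r\n").split(b",")
            if fields[status_col] == status:
                total += float(fields[value_col])
    return total
```

The OS pages the file in on demand, so peak resident memory stays near the working set rather than several multiples of the file size.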

</description>
      <category>python</category>
      <category>performance</category>
      <category>dataengineering</category>
      <category>datascience</category>
    </item>
    <item>
      <title>How I cut Python JSON memory overhead from 1.9GB to ~0MB (11x Speedup)</title>
      <dc:creator>NARESH-CN2</dc:creator>
      <pubDate>Wed, 08 Apr 2026 09:10:57 +0000</pubDate>
      <link>https://forem.com/nareshcn2/how-i-cut-python-json-memory-overhead-from-19gb-to-0mb-11x-speedup-3o8c</link>
      <guid>https://forem.com/nareshcn2/how-i-cut-python-json-memory-overhead-from-19gb-to-0mb-11x-speedup-3o8c</guid>
      <description>&lt;p&gt;The Problem: The "PyObject" TaxWe all love Python for its developer velocity, but for high-scale data engineering, the interpreter's overhead is a silent killer.I was recently benchmarking standard json.loads() on a 500MB JSON log file.The Result:⏱️ 3.20 seconds of execution time.📈 1,904 MB RAM spike.Why?Python's standard library creates a full-blown PyObject for every single key and value. When you are dealing with millions of log entries, your RAM becomes a graveyard of overhead. For a 500MB file, Python is essentially managing nearly 2GB in memory just to represent the data structures. For cloud infrastructure, this isn't just "slow"—it's an expensive AWS bill and a system crash waiting to happen.The Solution: Axiom-JSON (The C-Bridge)I decided to bypass the Python memory manager entirely for the heavy lifting. I built a bridge using:Memory Mapping ($mmap$): Instead of "loading" the file into a RAM buffer, I mapped the file's address space. The OS handles the paging, keeping the RAM footprint effectively flat regardless of file size.C Pointer Arithmetic: I used memmem to scan raw bytes directly on the disk cache. No dictionaries, no lists, no objects—until the specific data is actually needed by the Python layer.The Benchmarks (500MB JSON)MetricStandard Python (json.loads)Axiom-JSON (C-Bridge)ImprovementExecution Time3.20s0.28s$11.43\times$ FasterRAM Consumption1,904 MB$\approx 0$ MBInfinite ScalabilityThe ROI ArgumentIf you are running data pipelines on AWS or GCP, memory is usually your most expensive constraint. 
Moving from a 2GB RAM requirement to a few megabytes allows you to:Downgrade instance types (e.g., from memory-optimized r5.large to general-purpose t3.micro).Parallelize workers 10x more efficiently on the same hardware.$$\text{Efficiency Gain} = \frac{\text{Baseline Time}}{\text{Optimized Time}} \approx 11.4\times$$Get the CodeI have open-sourced the C engine and the Python bridge logic for anyone dealing with "Log-Bombing" issues:👉 GitHub: &lt;a href="https://github.com/naresh-cn2/Axiom-JSONNeed" rel="noopener noreferrer"&gt;https://github.com/naresh-cn2/Axiom-JSONNeed&lt;/a&gt; a Performance Audit?If your Python backend is hitting a RAM wall or your cloud compute bills are ballooning, I’m currently helping teams optimize their data architecture and build custom C-bridges.&lt;/p&gt;
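The memmem-style scan can be approximated in Python with the standard mmap module: mmap.find plays the role of memmem, walking the page cache without deserializing anything. The function name and the byte pattern are hypothetical, and the real C engine is naturally much faster than this sketch.

```python
import mmap

def count_pattern(path, pattern):
    """Count occurrences of a raw byte pattern in a memory-mapped file,
    without parsing the JSON into Python objects."""
    count = 0
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        pos = mm.find(pattern)
        while pos != -1:
            count += 1
            pos = mm.find(pattern, pos + len(pattern))
    return count
```

Only the matched offsets ever cross into Python-land; the log bytes themselves stay in the kernel's page cache.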

</description>
      <category>python</category>
      <category>c</category>
      <category>performance</category>
      <category>dataengineering</category>
    </item>
    <item>
      <title>Python was too slow for 10M rows—So I built a C-Bridge (and found the hidden data loss)</title>
      <dc:creator>NARESH-CN2</dc:creator>
      <pubDate>Tue, 07 Apr 2026 08:17:40 +0000</pubDate>
      <link>https://forem.com/nareshcn2/python-was-too-slow-for-10m-rows-so-i-built-a-c-bridge-and-found-the-hidden-data-loss-5b86</link>
      <guid>https://forem.com/nareshcn2/python-was-too-slow-for-10m-rows-so-i-built-a-c-bridge-and-found-the-hidden-data-loss-5b86</guid>
      <description>&lt;h1&gt;
  
  
  The Challenge: The 1-Second Wall
&lt;/h1&gt;

&lt;p&gt;In high-volume data engineering, "fast enough" is a moving target. I was working on a log ingestion problem: 700MB of server logs, roughly 10 million rows. &lt;/p&gt;

&lt;p&gt;Standard Python line-by-line iteration (&lt;code&gt;for line in f:&lt;/code&gt;) was hitting a consistent wall of &lt;strong&gt;1.01 seconds&lt;/strong&gt;. For a real-time security auditing pipeline, this latency was unacceptable. &lt;/p&gt;

&lt;p&gt;But speed wasn't the only problem. I discovered something worse: &lt;strong&gt;Data Loss.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Silent Killer: Boundary Splits
&lt;/h2&gt;

&lt;p&gt;Most standard parsers read files in chunks (like 8KB). If your target status code (e.g., &lt;code&gt;" 500 "&lt;/code&gt;) is physically split between two chunks in memory—say, &lt;code&gt;" 5"&lt;/code&gt; at the end of Chunk A and &lt;code&gt;"00 "&lt;/code&gt; at the start of Chunk B—the parser misses it entirely. &lt;/p&gt;

&lt;p&gt;In my dataset, standard parsing missed &lt;strong&gt;180 critical errors.&lt;/strong&gt;&lt;/p&gt;
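The miss is easy to reproduce in pure Python: a chunked scanner that only counts matches inside each chunk never sees a pattern straddling a boundary. This is a deliberately naive sketch to illustrate the failure mode, not the article's parser.

```python
def count_naive(path, pattern, chunk_size=8192):
    """Naive chunked scan: counts matches within each chunk only,
    so a match split across a chunk boundary is silently lost."""
    count = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            count += chunk.count(pattern)
    return count
```

With a pattern like b" 500 " and a boundary falling mid-pattern, this scanner undercounts exactly as described above.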

&lt;h2&gt;
  
  
  The Solution: Axiom-IO (The C-Python Hybrid)
&lt;/h2&gt;

&lt;p&gt;I decided to bypass the Python interpreter's I/O overhead by building a hybrid engine.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. The Raw C Core
&lt;/h3&gt;

&lt;p&gt;Using C's &lt;code&gt;fread&lt;/code&gt;, I pull raw bytes directly into an 8,192-byte buffer. This matches a typical filesystem block size and minimizes system calls.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Boundary Overlap Logic
&lt;/h3&gt;

&lt;p&gt;To solve the data loss issue, I implemented a "Slide-and-Prepend" logic. The last few bytes of every buffer read are saved and prepended to the &lt;em&gt;next&lt;/em&gt; read. This ensures that no status code is ever sliced in half.&lt;/p&gt;
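A minimal Python sketch of the slide-and-prepend idea (the production version lives in C): carry the last len(pattern) - 1 bytes of each buffer into the next read, so a match can never be sliced in half. The function name is hypothetical.

```python
def count_overlap(path, pattern, chunk_size=8192):
    """Chunked scan with slide-and-prepend: the tail of each buffer is
    prepended to the next chunk so boundary-straddling matches survive."""
    count = 0
    carry = b""
    overlap = len(pattern) - 1  # longest prefix a straddling match can leave behind
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            buf = carry + chunk
            count += buf.count(pattern)
            # Keep only the tail; a match counted here always starts
            # before the tail, so nothing is counted twice.
            carry = buf[-overlap:] if overlap > 0 else b""
    return count
```

Because any full match in the current buffer begins before the carried tail, the overlap never double-counts; it only rescues matches the naive scanner drops.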

&lt;h3&gt;
  
  
  3. The Python Bridge
&lt;/h3&gt;

&lt;p&gt;I used &lt;code&gt;ctypes&lt;/code&gt; to load a shared library (&lt;code&gt;.so&lt;/code&gt;). This allows Python to handle the high-level orchestration while the heavy lifting happens in a small, tightly scoped C core.&lt;/p&gt;
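The Axiom-IO .so itself isn't shown in the post, so this sketch demonstrates the same ctypes loading pattern against the C standard library instead: declare the argument and return types, then call the C function as if it were Python. The wrapper name is hypothetical.

```python
import ctypes
import ctypes.util

# Load libc; an engine-specific .so would be loaded the same way,
# e.g. ctypes.CDLL("./axiom_io.so") with its own exported functions.
libc = ctypes.CDLL(ctypes.util.find_library("c") or None)
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

def c_strlen(data):
    """Call C's strlen through the ctypes bridge."""
    return libc.strlen(data)
```

Declaring argtypes and restype up front is what keeps the boundary safe: ctypes then converts Python bytes to a C pointer and the C size_t back to a Python int automatically.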

&lt;h2&gt;
  
  
  The Benchmarks (700MB / 10M Rows)
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Engine&lt;/th&gt;
&lt;th&gt;Execution Time&lt;/th&gt;
&lt;th&gt;Data Integrity (Errors Found)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Standard Python&lt;/td&gt;
&lt;td&gt;1.01s&lt;/td&gt;
&lt;td&gt;1,425,016&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Axiom-IO (Hybrid)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.20s&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1,425,196&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;The result? A 5x speedup and 180 "Ghost" errors caught.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Sometimes, the best way to use Python is to know when to step outside of it. By aligning our software with how hardware actually reads memory, we didn't just gain speed—we gained truth.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Source Code &amp;amp; Benchmarks:&lt;/strong&gt; &lt;a href="https://github.com/naresh-cn2/Axiom-IO-Engine" rel="noopener noreferrer"&gt;https://github.com/naresh-cn2/Axiom-IO-Engine&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhw3h1speuyg8idec2i2s.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhw3h1speuyg8idec2i2s.jpeg" alt=" " width="800" height="515"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>cpp</category>
      <category>performance</category>
      <category>dataengineering</category>
    </item>
  </channel>
</rss>
