Abrar ahmed


Why I Stopped Writing “Just Another CSV Script” for Every Project

Every project starts the same way:

  1. Client sends a messy CSV file
  2. I write a quick script to clean it
  3. A week later… they send another file, slightly different
  4. I tweak the script again
  5. Repeat until I'm buried in tiny, fragile one-off scripts

Sound familiar?

In the past, I treated CSV cleaning like it was a minor task—just whip up some Node.js, make the necessary fixes, and then get on with my day.

The Problem With One-Off Scripts

One-time scripts are fast to write and easy to forget. But they come back to haunt you when:

  • A client changes the column order or headers
  • You forget which script handles which format
  • Someone else needs to run it—and it only works on your machine
  • You end up repeating the same logic across 10 files

I was solving the same problems repeatedly:

  • Normalize inconsistent column names
  • Convert date formats
  • Drop blank or duplicate rows
  • Handle different encodings (UTF-8 with BOMs… hello darkness)
  • Export the cleaned result

I didn’t need more scripts. I needed structure.
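None of these fixes is hard on its own. Stripping a UTF-8 BOM, for example, is a couple of lines once it lives in one reusable place. A quick sketch of what I mean (stripBom is just an illustrative helper name, not one of the modules below):

function stripBom(text) {
  // A UTF-8 BOM decodes to a leading \uFEFF character in JavaScript,
  // which silently corrupts the first column header if left in place
  return text.charCodeAt(0) === 0xfeff ? text.slice(1) : text;
}

const fs = require("fs");
const raw = stripBom(fs.readFileSync("dirty.csv", "utf8"));

The point isn't this exact helper. It's that the fix now exists exactly once.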

What I Do Now Instead

These days, when a messy new file lands, I don't start writing from scratch.

Instead, I use an approach that breaks the work into small, testable parts:

  • input parsers (CSV, Excel, JSON)
  • a normalization layer (headers, encodings)
  • a transformation layer (date formatting, filters, maps)
  • an output formatter (CSV, JSON, preview)

This isn’t a framework. It’s just a mindset: write it once → reuse it forever.

Example: Simple Modular Cleanup in Node.js

Instead of one giant script, I use small utilities like these:
parser.js

const fs = require("fs");
const csv = require("csv-parser");

// Parse a CSV file into an array of row objects
// (buffers every row in memory; simple, and fine for small files)
function parseCSV(filePath) {
  return new Promise((resolve, reject) => {
    const results = [];
    fs.createReadStream(filePath)
      .pipe(csv())
      .on("data", (row) => results.push(row))
      .on("end", () => resolve(results))
      .on("error", reject);
  });
}

module.exports = { parseCSV };

cleaner.js

function cleanRows(data) {
  return data
    // Drop rows where every field is blank
    .filter(row => Object.values(row).some(val => val !== ""))
    .map(row => {
      const parsed = new Date(row.date);
      return {
        ...row,
        // Normalize to YYYY-MM-DD; leave unparseable dates as-is,
        // since toISOString() throws on an invalid Date
        date: isNaN(parsed) ? row.date : parsed.toISOString().split("T")[0],
        name: row.name?.trim(), // Strip stray whitespace
      };
    });
}

module.exports = { cleanRows };

exporter.js

const { writeFileSync } = require("fs");

// Quote any field containing commas, quotes, or newlines,
// so values like "Doe, Jane" don't break the output
function escapeField(val) {
  const s = String(val ?? "");
  return /[",\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

function exportCSV(data, path) {
  if (data.length === 0) throw new Error("No rows to export");
  const header = Object.keys(data[0]).map(escapeField).join(",");
  const rows = data
    .map(obj => Object.values(obj).map(escapeField).join(","))
    .join("\n");
  writeFileSync(path, `${header}\n${rows}`, "utf8");
}

module.exports = { exportCSV };

main.js

const { parseCSV } = require("./parser");
const { cleanRows } = require("./cleaner");
const { exportCSV } = require("./exporter");

async function runCleanup() {
  const raw = await parseCSV("dirty.csv");
  const cleaned = cleanRows(raw);
  exportCSV(cleaned, "cleaned.csv");
}

runCleanup().catch(console.error); // Surface parse/write failures instead of swallowing them

Now, whenever I receive a new file, I simply adjust my cleaner.js logic—no need to start from square one anymore.
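Often "adjusting the logic" just means swapping a small mapping rather than rewriting code. A minimal sketch of what that can look like (the headerMap shape and normalizeHeaders helper are my own illustration, not modules from the example above):

// Per-client header mapping: raw CSV header -> canonical column name
const headerMap = {
  "Full Name": "name",
  "Signup Date": "date",
};

// Rename known headers; pass unknown ones through unchanged
function normalizeHeaders(rows, map) {
  return rows.map(row =>
    Object.fromEntries(
      Object.entries(row).map(([key, val]) => [map[key] ?? key, val])
    )
  );
}

module.exports = { normalizeHeaders };

When the next client renames a column, the fix is one line in the map instead of a new script.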

Benefits of Moving Away From “Just Scripts”

  • Less copy-paste, more confidence
  • Easier to onboard clients or teammates
  • Faster debugging (you know where the logic lives)
  • Fewer edge-case surprises
  • Scales from a 100-row file to 1 million+ rows

Now when I get a weird file with 12 columns, 3 date formats, and 2 “LOL” rows… I know my workflow can handle it.
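One honest caveat on that last point: parseCSV above buffers every row in memory. For truly huge files, I stream rows through the same per-row cleaning logic instead. A rough sketch reusing csv-parser (streamCleanup is my own name for it, and the naive comma-joining here has the same escaping caveats the exporter handles):

const fs = require("fs");
const csv = require("csv-parser");
const { cleanRows } = require("./cleaner");

// Stream rows through the cleaner one at a time instead of buffering
// the whole file, so memory use stays flat regardless of file size
function streamCleanup(inPath, outPath) {
  const out = fs.createWriteStream(outPath, "utf8");
  let wroteHeader = false;
  fs.createReadStream(inPath)
    .pipe(csv())
    .on("data", (row) => {
      const [cleaned] = cleanRows([row]); // cleanRows may drop a blank row
      if (!cleaned) return;
      if (!wroteHeader) {
        out.write(Object.keys(cleaned).join(",") + "\n");
        wroteHeader = true;
      }
      out.write(Object.values(cleaned).join(",") + "\n");
    })
    .on("end", () => out.end())
    .on("error", (err) => console.error(err));
}

module.exports = { streamCleanup };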

Takeaways for Devs Handling Messy Data

  • Your first script should solve the problem
  • Your second should solve the pattern
  • Your third should become a system

If you're still writing one-off scripts for every client file: no shame, we've all done it. But long term, it's pain on repeat.

If you’ve already moved to a modular, testable data-cleaning setup, I’d love to hear how you approached it.
