How to Merge Multiple Excel Files into One with Python

Victor Feitosa — Thu, 21 May 2026 01:43:56 +0000

Every month, the same thing happens in offices everywhere:
someone has to open 12 monthly reports and manually copy everything into a master file.

I built a Python tool that eliminates this entirely.

## What it does

python merge_excel.py ./reports

It finds every .xlsx file in the folder, reads them all, stacks the data,
and exports one clean formatted spreadsheet — with a "Source File" column
showing exactly where each row came from.

## The core — pd.concat

The key function is pandas' concat. It stacks DataFrames from different files
even when their columns don't match exactly: the result carries the union of all
columns, and any cell a file has no value for is filled with NaN.

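import pandas as pd
from pathlib import Path

def merge_files(folder):
    # Skip Excel's "~$" lock files, which exist while a workbook is open
    files = sorted(f for f in Path(folder).glob("*.xlsx") if not f.name.startswith("~$"))
    if not files:
        # pd.concat raises on an empty list, so fail early with a clear message
        raise FileNotFoundError(f"No .xlsx files found in {folder}")

    frames = []
    for f in files:
        df = pd.read_excel(f, engine="openpyxl")  # reads the first sheet only
        df.insert(0, "Source File", f.name)  # record where each row came from
        frames.append(df)

    # Stack all frames; columns missing from a file become NaN
    merged = pd.concat(frames, ignore_index=True, sort=False)
    # Replace NaN with empty strings
    return merged.fillna("")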

That's the core. The rest is formatting and CLI handling.
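
To see what the column handling looks like in practice, here is a minimal sketch with two made-up DataFrames standing in for two report files. The data and column names are invented for illustration.

```python
import pandas as pd

# Two hypothetical monthly reports whose columns only partly overlap.
january = pd.DataFrame({"Region": ["North", "South"], "Sales": [100, 150]})
february = pd.DataFrame({"Region": ["North"], "Sales": [120], "Returns": [3]})

merged = pd.concat([january, february], ignore_index=True, sort=False)

# Columns are the union of both inputs, in order of first appearance.
print(list(merged.columns))
# ['Region', 'Sales', 'Returns']

# January has no "Returns" column, so its rows get NaN there.
print(merged["Returns"].isna().tolist())
# [True, True, False]
```

One side effect worth knowing: because NaN is a float, the "Returns" column becomes a float column, so the 3 from February is stored as 3.0.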

## The full tool

The complete script adds:
- Auto-detection of files (folder or explicit list)
- Professional Excel formatting (headers, alternating rows, frozen pane)
- Clean terminal output showing progress
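
The formatting step isn't shown in this post, so the following is only a sketch of how header styling, alternating row fills, and a frozen header row could be applied with openpyxl. The function name `format_workbook` and the colors are assumptions, not taken from the actual script.

```python
from openpyxl import load_workbook
from openpyxl.styles import Font, PatternFill

def format_workbook(path):
    """Style the active sheet of an existing .xlsx file in place (hypothetical helper)."""
    wb = load_workbook(path)
    ws = wb.active

    # Illustrative colors: dark blue header, light gray stripes.
    header_fill = PatternFill(fill_type="solid", start_color="1F4E78", end_color="1F4E78")
    stripe_fill = PatternFill(fill_type="solid", start_color="F2F2F2", end_color="F2F2F2")

    # Row 1 holds the column headers: bold white text on the header fill.
    for cell in ws[1]:
        cell.font = Font(bold=True, color="FFFFFF")
        cell.fill = header_fill

    # Shade every even-numbered data row to get alternating stripes.
    for row in ws.iter_rows(min_row=2):
        if row[0].row % 2 == 0:
            for cell in row:
                cell.fill = stripe_fill

    # Freezing at A2 keeps row 1 visible while scrolling.
    ws.freeze_panes = "A2"
    wb.save(path)
```

Calling `format_workbook("merged.xlsx")` after `merged.to_excel("merged.xlsx", index=False)` would style the exported file in place.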

GitHub: github.com/grey-pv/excel-merger

## What's next

This is project #2 of a Python automation toolkit I'm building in public.
Each project solves one specific data problem. Next up: CSV Cleaner.

I built a Python tool that extracts tables from PDFs and exports to Excel automatically

Victor Feitosa — Sat, 16 May 2026 23:51:29 +0000

Every week I saw the same thing at work: someone opening a PDF and manually copying a table into Excel, cell by cell. That's 30 minutes of work that should take 2 seconds.

So I built a tool to fix it.

What it does

python extract_tables.py financial_report.pdf

That's it. You get a formatted .xlsx file with every table from the PDF, organized into separate sheets with styled headers, alternating rows, and auto-fitted columns.
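
The sheet-per-table layout and column fitting could look something like the sketch below. It assumes the tables arrive as (sheet name, DataFrame) pairs. The `write_tables` name, the sheet names, and the width heuristic (longest cell text plus a little padding) are illustrative assumptions, not the tool's actual code.

```python
import pandas as pd
from openpyxl.utils import get_column_letter

def write_tables(tables, out_path):
    """Write each DataFrame to its own sheet and widen columns to fit.

    `tables` is a hypothetical list of (sheet_name, DataFrame) pairs.
    """
    with pd.ExcelWriter(out_path, engine="openpyxl") as writer:
        for sheet_name, df in tables:
            # Excel limits sheet names to 31 characters.
            name = sheet_name[:31]
            df.to_excel(writer, sheet_name=name, index=False)
            ws = writer.sheets[name]

            for col_idx, column in enumerate(df.columns, start=1):
                # Longest text in this column, header included.
                values = df.iloc[:, col_idx - 1]
                longest = max([len(str(column))] + [len(str(v)) for v in values])
                # Rough fit: one width unit per character plus padding.
                ws.column_dimensions[get_column_letter(col_idx)].width = longest + 2
```

A call such as `write_tables([("Page1_Table1", df)], "report.xlsx")` would produce one sheet per pair. Character count is only an approximation of rendered width, so proportional fonts may still need manual adjustment.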

The tech stack

Three libraries. That's all:

pdfplumber — opens the PDF and extracts table data page by page
pandas — structures the raw data into DataFrames
openpyxl — writes to Excel with formatting

No OCR. No Java. No external APIs. Pure Python.

The core extraction — 15 lines

import pdfplumber
import pandas as pd

def extract_tables_from_pdf(pdf_path):
    results = []
    with pdfplumber.open(pdf_path) as pdf:
        for page_num, page in enumerate(pdf.pages, start=1):
            # Each table is a list of rows; each row is a list of cell values
            tables = page.extract_tables()
            for t_idx, raw_table in enumerate(tables, start=1):
                if not raw_table:
                    continue  # nothing to build a DataFrame from
                # Treat the first row as the header, the rest as data
                header = raw_table[0]
                rows = raw_table[1:]
                df = pd.DataFrame(rows, columns=header)
                results.append({"page": page_num, "table_index": t_idx, "dataframe": df})
    return results

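pdfplumber does the heavy lifting. By default it locates tables from the ruling lines and rectangles drawn in the PDF, so clearly ruled tables usually extract without any tuning. Borderless tables are harder and may need a text-based strategy instead.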
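
Raw tables from real PDFs often need a little cleanup before they make good DataFrames: pdfplumber returns empty cells as None, and header rows can contain blanks or repeated names. The sketch below shows one way to handle that with plain Python and pandas. The `table_to_dataframe` helper and its naming scheme (`column_N`, `_2` suffixes) are assumptions for illustration, not part of the tool.

```python
import pandas as pd

def table_to_dataframe(raw_table):
    """Turn a list of rows into a DataFrame, using the first row as the header.

    Returns None when there is no header or no data rows.
    """
    if not raw_table or len(raw_table) < 2:
        return None

    header = []
    seen = {}
    for idx, cell in enumerate(raw_table[0], start=1):
        # Collapse whitespace and line breaks; fall back to a positional name.
        name = " ".join((cell or "").split()) or f"column_{idx}"
        # Suffix repeated names so every column label is unique.
        seen[name] = seen.get(name, 0) + 1
        header.append(name if seen[name] == 1 else f"{name}_{seen[name]}")

    # Replace None (empty cells) with empty strings.
    rows = [["" if cell is None else cell for cell in row] for row in raw_table[1:]]
    return pd.DataFrame(rows, columns=header)

# Example with a blank header cell and a duplicated header name.
df = table_to_dataframe([["Item", None, "Item"], ["Widget", None, "3"]])
print(list(df.columns))
# ['Item', 'column_2', 'Item_2']
```

All values stay strings at this point; converting numeric columns would be a separate step.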

Why it's useful for real work

PDF reports are everywhere: government data, supplier invoices, financial statements, logistics documents. Most of them have tables. None of them are easy to work with.

This tool is for the analyst who gets a 20-page PDF every Monday morning and needs the data in Excel by 9am.

Get the full code

GitHub: github.com/grey-pv/pdf-to-excel

Includes the full CLI script, a sample PDF generator for testing, and instructions to run it in under 2 minutes.

What's coming next

Batch mode for processing multiple PDFs at once
Streamlit UI for non-technical users
OCR support for scanned PDFs

This is part of a Python automation toolkit I'm building in public. If you work with PDFs regularly, try it and let me know what breaks — real-world PDFs are always messier than test cases.

Forem: Victor Feitosa