DEV Community

Cover image for Batch Geocode Addresses in Python with Geoapify API
Geoapify
Geoapify

Posted on

1

Batch Geocode Addresses in Python with Geoapify API

If you’re working with location data, geocoding — the process of converting addresses into geographic coordinates — is often a key step. While many APIs offer geocoding, handling large lists of addresses while respecting rate limits can be a challenge.

In this article, we’ll walk through a Python script that solves exactly this problem using Geoapify’s Geocoding API. We’ll read addresses from a file, process them in rate-limited batches, and write the results to a newline-delimited JSON (NDJSON) file.

🔗 GitHub Repository: geoapify/maps-api-code-samples


🧩 What This Script Does

This script:

  • Reads a list of addresses from a file.
  • Sends asynchronous requests to the Geoapify Geocoding API.
  • Respects API rate limits (5 requests/second).
  • Optionally filters geocoding by country.
  • Saves results to a file in NDJSON format.

Perfect for developers processing large CSV/Excel exports or building internal tools with address lookups.


📥 1. Reading Addresses from File

with open(input_file, 'r') as f:
    addresses = f.read().strip().splitlines()
Enter fullscreen mode Exit fullscreen mode

What it does:
Reads the input file line by line, strips extra whitespace, and stores the addresses in a list.

Why it’s needed:
Prepares a clean list of addresses for batch processing. Each line in the input file represents a separate address.

📚 Docs:

🧮 2. Batching Requests According to Rate Limit

addresses = list(it.batched(addresses, REQUESTS_PER_SECOND))
Enter fullscreen mode Exit fullscreen mode

What it does:
Splits the address list into smaller batches, each containing REQUESTS_PER_SECOND number of addresses (e.g. 5 per batch).

Why it’s needed:
Geoapify enforces a maximum number of API requests per second. Batching ensures we never send more than the allowed number of requests per second.

📚 Docs:

📝 If you're using Python < 3.12, see how to implement your own batching function here.

🚀 3. Asynchronous Execution of Requests

tasks = []
with ThreadPoolExecutor(max_workers=10) as executor:
   for batch in addresses:
       logger.info(batch)
       tasks.extend([executor.submit(geocode_address, address, api_key, country_code) for address in batch])
       sleep(1)
Enter fullscreen mode Exit fullscreen mode

What it does:

  • Uses a thread pool to send multiple requests in parallel.
  • Submits one thread per address.
  • Waits 1 second between batches to comply with Geoapify's rate limit.

Why it’s needed:
Parallelism accelerates processing by making multiple requests simultaneously. sleep(1) ensures the API's request-per-second quota isn’t exceeded.

📚 Docs:

🌍 4. Geocoding Function

def geocode_address(address, api_key, country_code):
    params = {
        'format': 'json',
        'text': address,
        'limit': 1,
        'apiKey': api_key
    }
    if country_code:
        params['filter'] = 'countrycode:' + country_code

    try:
        response = requests.get(GEOAPIFY_API_URL, params=params)
        if response.status_code == 200:
            data = response.json()
            if len(data['results']) > 0:
                return data['results'][0]
            else:
                return { "error":  "Not found" }
        else:
            logger.warning(f"Failed to geocode address '{address}': {response_data}")
            return {}
    except Exception as e:
        logger.error(f"Error while geocoding address '{address}': {e}")
        return {}
Enter fullscreen mode Exit fullscreen mode

What it does:
Sends a request to the Geoapify Geocoding API with the given address and optional country code.
Parses the response and returns the top geocoding result as a dictionary. If no result is found, or an error occurs, it returns a fallback dictionary with an error message.

Why it’s needed:
Encapsulates the geocoding logic in a reusable function. Handles:

  • URL building and query parameters,
  • Optional filtering by country for accuracy,
  • Error handling and logging,
  • Response validation.

📚 Docs:

⏳ 5. Waiting for All Requests to Complete

wait(tasks, return_when=ALL_COMPLETED)
results = [task.result() for task in tasks]
Enter fullscreen mode Exit fullscreen mode

What it does:
Blocks until all geocoding requests have completed, then collects results into a list.

Why it’s needed:
Ensures that all asynchronous jobs finish before the output is saved. Prevents writing incomplete or partial results.

📚 Docs:

📝 6. Writing Results to NDJSON File

with open(output_file, 'w') as f:
    for result in results:
        f.write(json.dumps(result) + '\n')
Enter fullscreen mode Exit fullscreen mode

What it does:
Writes results as newline-delimited JSON objects to a file — a format known as NDJSON.

Why it’s needed:
NDJSON is ideal for large-scale processing. It’s readable line-by-line, can be streamed, and integrates well with tools like jq, Elasticsearch, and data pipelines.

📚 Docs:


▶️ How to Use It

1. Save your addresses to a .txt file (one per line):

1600 Amphitheatre Parkway, Mountain View, CA  
Eiffel Tower, Paris  
Brandenburger Tor, Berlin
Enter fullscreen mode Exit fullscreen mode

2. Run the script:

python geocode_addresses.py \
  --api_key=YOUR_GEOAPIFY_API_KEY \
  --input=addresses.txt \
  --output=results.ndjson \
  --country_code=us
Enter fullscreen mode Exit fullscreen mode
  • --api_key: Your Geoapify API key.
  • --input: Input file containing addresses.
  • --output: Output file in NDJSON format.
  • --country_code: Optional ISO country code (e.g., us, fr, de) to increase accuracy.

🛠️ Requirements

Only Python standard library is used — no extra installs needed.

Python 3.12+ is required for itertools.batched.
If you’re on an older version, you can define your own batching function:

def batched(iterable, n):
    it = iter(iterable)
    while True:
        batch = list(itertools.islice(it, n))
        if not batch:
            break
        yield batch
Enter fullscreen mode Exit fullscreen mode

📦 Use Case Scenarios

  • Clean and validate customer address lists.
  • Pre-process logistics/delivery points.
  • Enrich event registration data with geo-coordinates.

🔍 Conclusion

With just a few lines of Python, you can build a robust geocoding pipeline that respects API rate limits and scales to thousands of addresses. This script is a great foundation you can extend with features like:

  • Retry logic
  • Address deduplication
  • Integration with Pandas or Google Sheets

Try it yourself — and happy geocoding!

$150K MiniMax AI Agent Challenge — Build Smarter, Remix Bolder, Win Bigger!

Join the $150k MiniMax AI Agent Challenge — Build your first AI Agent 🤖

Developers, innovators, and AI tinkerers, build your AI Agent and win $150,000 in cash. 💰

Read more →

Top comments (0)

Your Python stack deserves better infra

Your Python stack deserves better infra

Stop duct-taping user flows together. Manage auth, access, and billing in one simple SDK with Kinde.

Get a free account

👋 Kindness is contagious

Delve into a trove of insights in this thoughtful post, celebrated by the welcoming DEV Community. Programmers of every stripe are encouraged to share their viewpoints and expand our collective expertise.

A simple “thank you” can brighten someone’s day—drop yours in the comments below!

On DEV, exchanging knowledge lightens our path and forges deeper connections. If you found this valuable, a quick note of gratitude to the author goes a long way.

Get Started