Sophia Parafina

Speeding up geodata processing with feather

Previously, on speeding up geodata processing...

In this post, I compare the read and write performance of the feather file format against the pickle file format.

From Hadley Wickham's blog:

What is Feather?

Feather is a fast, lightweight, and easy-to-use binary file format for storing data frames. It has a few specific design goals:

  • Lightweight, minimal API: make pushing data frames in and out of memory as simple as possible
  • Language agnostic: Feather files are the same whether written by Python or R code. Other languages can read and write Feather files, too.
  • High read and write performance. When possible, Feather operations should be bound by local disk performance.

GeoPandas has supported the feather format since version 0.8; this test used version 0.10.

import geopandas as gpd
import time
import pickle
import warnings
from pyogrio import read_dataframe

# silence the "initial implementation" warning GeoPandas emits for Parquet/Feather
warnings.filterwarnings('ignore', message='.*initial implementation of Parquet.*')

# note: time.process_time() reports CPU time used by this process, not wall-clock time

# read shapefile
read_start = time.process_time()
data = read_dataframe("Streets.shp")
read_end = time.process_time()

# write feather test
write_start = time.process_time()
data.to_feather('test_feather.feather')
write_end = time.process_time()

write_time = write_end - write_start
print(str(write_time/60)+" minutes to write feather file")

# read feather test (read back the file written above)
read_start = time.process_time()
feather_df = gpd.read_feather('test_feather.feather')
read_end = time.process_time()

read_time = read_end - read_start
print(str(read_time/60)+" minutes to read feather file")
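The pickle side of the comparison is not shown in this post. The following is only a minimal sketch of how such a baseline could be timed, not the code that produced the results. It assumes a stand-in list of records in place of the GeoDataFrame, and the names timed, pickle_round_trip, and records are hypothetical. It records both CPU time (time.process_time, as in the script above) and wall-clock time (time.perf_counter), since the two can differ for I/O-heavy work.

```python
import os
import pickle
import tempfile
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Record CPU and wall-clock seconds for the enclosed block under `label`."""
    cpu_start = time.process_time()
    wall_start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = {
            "cpu_s": time.process_time() - cpu_start,
            "wall_s": time.perf_counter() - wall_start,
        }


def pickle_round_trip(obj, path):
    """Write obj to path with pickle, read it back, return (copy, timings)."""
    timings = {}
    with timed("write", timings):
        with open(path, "wb") as f:
            pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)
    with timed("read", timings):
        with open(path, "rb") as f:
            restored = pickle.load(f)
    return restored, timings


# Stand-in data (hypothetical); in the real test this would be the GeoDataFrame.
records = [{"id": i, "name": f"street {i}"} for i in range(1000)]

with tempfile.TemporaryDirectory() as tmp:
    restored, timings = pickle_round_trip(records, os.path.join(tmp, "test.pickle"))

for op, t in timings.items():
    print(f"{op}: {t['cpu_s']:.4f} s CPU, {t['wall_s']:.4f} s wall")
```

Swapping records for the GeoDataFrame loaded from the shapefile should give a pickle baseline measured the same way as the feather test.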

Results

Time (minutes)   pickle   feather
read             0.92     1.07
write            4.36     1.69
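
As a quick check on the table, the pickle-to-feather ratios can be computed directly from the reported times. This is plain arithmetic on the numbers above; the variable names are illustrative.

```python
# Times in minutes, copied from the results table.
timings_min = {
    "read": {"pickle": 0.92, "feather": 1.07},
    "write": {"pickle": 4.36, "feather": 1.69},
}

ratios = {}
for op, t in timings_min.items():
    # A ratio above 1 means feather was faster than pickle for that operation.
    ratios[op] = t["pickle"] / t["feather"]
    print(f"{op}: pickle/feather = {ratios[op]:.2f}")
# read: pickle/feather = 0.86
# write: pickle/feather = 2.58
```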

Read times are comparable, but feather writes are roughly 2.6x faster than pickle writes (1.69 vs. 4.36 minutes). Feather still takes longer to write than to read, probably because the geometry has to be converted to Well-Known Binary (WKB), which the feather format can store. The caveat is that the feather format is subject to change, as evidenced by the warning that the import block suppresses.
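
To make the WKB conversion mentioned above concrete, here is a small sketch that encodes a single 2D point as Well-Known Binary using only the standard library. It is not what GeoPandas does internally (it relies on its geometry library for this); point_to_wkb and wkb_to_point are hypothetical helpers that handle only little-endian 2D points.

```python
import struct


def point_to_wkb(x, y):
    """Encode a 2D point as little-endian Well-Known Binary (21 bytes)."""
    byte_order = 1  # 1 = little-endian (NDR)
    geom_type = 1   # 1 = Point in the WKB geometry type codes
    return struct.pack("<BIdd", byte_order, geom_type, x, y)


def wkb_to_point(data):
    """Decode bytes produced by point_to_wkb back into an (x, y) tuple."""
    byte_order, geom_type, x, y = struct.unpack("<BIdd", data)
    if byte_order != 1 or geom_type != 1:
        raise ValueError("this sketch only handles little-endian 2D points")
    return x, y


wkb = point_to_wkb(1.0, 2.0)
print(wkb.hex())          # 0101000000000000000000f03f0000000000000040
print(wkb_to_point(wkb))  # (1.0, 2.0)
```

Every geometry in the data frame needs an encoding step like this on write, which is one plausible source of the extra write time.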

Thoughts

If your data is static or distributed, the pickle format may be better. Feather may be the right choice if you need to transfer geodata within a processing workflow with a file, e.g., from Python to R.
