DEV Community

Judy
Judy

Posted on

2 1 1 1 1

How to operate large CSV files with esProc SPL

esProc SPL provides cursor operations, which can operate large CSV files with very simple code. With slight modifications, it can be converted into parallel computing. It also has a graphical interface, which is much more convenient than Python.

First, go here to download esProc SPL: It's free~

If you don’t want to bother with the source code, the standard edition can be used., download and install it.
Prepare a large CSV file:

Image description

Open the esProc IDE, create a new script, write SPL code in cell A1, and read the first 100 entries:

=file("d:/OrdersBig.csv").cursor@tc().fetch(100)
Enter fullscreen mode Exit fullscreen mode

The function cursor represents opening the text file with a cursor, and @ represents the function’s extension options, @t represents the first row being column names, @c represents commas as the separator.

Because it is a large file, loading it all into memory may overflow, so fetch only 100 entries and take a look.

Press ctrl-F9 to execute, and you can see the calculation result data table on the right.

Image description
SPL code is written in cells, and after each cell is executed, there will be a value, which can be seen on the right side, which brings great convenience to debugging.

Let's try the calculation. First, count the rows:

Image description
The skip function is used to skip N records and return the number of skipped records. When the parameter is empty, all records are skipped.

Image description
There are a total of 101730411 records.

Then take a look at the filtering, select records with Amount between 3000 and 4000 and Client containing s:

Image description
The select function is used for conditional filtering, the like function is used for string matching, and * is a wildcard, @c means not case sensitive.

Because there may still be many results, we only take the first 100 entries to see, and the execution result is in A3:

Image description
Sorting can also be done, such as sorting in the order of OrderDate and in the reverse order of Amount:

Image description
After sorting, write the results into a new file, then open the new file and retrieve the first 100 entries. The function sortx is used for sorting large files, where - represents reverse order.

After execution, look at the result on the right, it has been sorted.

Image description
Then do some complex calculations, group and aggregation:

Image description
SQL programmers must be familiar with the groups function in A4, so we won't go into detail here.

A2 can read only the columns to be used when opening the cursor, which can improve speed. A1 and A5 have also been added here to track the running time and print it on the console:

Image description
You can see that the running time is 145 seconds.

Parallel computing can fully utilize current multi-core CPUs and must be tried. Change the above code to parallel computing, simply add an option @m after the cursor function, and leave the rest unchanged:

Image description
@m represents performing multi-threaded computation according to the parallel options configured in the option.

At the same time turn on this parallel option.

Image description
Now execute it again:

Image description
Improved to 92 seconds, possibly due to the concurrency limitation of the hard disk, it is not possible to achieve a multiple performance improvement, and the results of running with 2 threads are also similar.
The above are a few basic operations. The official website provides more extensive and in-depth examples of large CSV calculations, and the code is also very simple. It is worth taking a look.

ACI image

ACI.dev: The Only MCP Server Your AI Agents Need

ACI.dev’s open-source tool-use platform and Unified MCP Server turns 600+ functions into two simple MCP tools on one server—search and execute. Comes with multi-tenant auth and natural-language permission scopes. 100% open-source under Apache 2.0.

Star our GitHub!

Top comments (4)

Collapse
 
tullis12 profile image
Tullis

Thanks for the detailed walkthrough using esProc SPL with large CSV files—it’s always refreshing to see a tool that doesn’t require writing a novel just to process some data! The cell-based approach and the graphical interface look quite user-friendly, especially for quick debugging (my eyesight thanks you). I do appreciate the transparency about parallel processing not quite working magic because of disk limitations—somehow the hardware never wants to cooperate as much as we’d like! My only minor nitpick is that some code explanations felt a bit brisk, and a slightly deeper dive into the syntax or alternatives for newbies would help. Still, if I have to wrangle a massive CSV again, this might save me some headaches (and a few CPU fans). Thanks for sharing!

Collapse
 
esproc_spl profile image
Judy

Hello Tullis! Great! Welcome to try it out. You can contact me at any time if you have any questions.

Collapse
 
nevodavid profile image
Nevo David

feels like this would've saved me so much headache in the past tbh - you think this kind of tool really changes your workflow over time or do old habits win out?

Collapse
 
esproc_spl profile image
Judy

The advancement of technology will gradually change people's habits, won't it,David?

Some comments may only be visible to logged-in visitors. Sign in to view all comments.

Tiger Data image

🐯 🚀 Timescale is now TigerData: Building the Modern PostgreSQL for the Analytical and Agentic Era

We’ve quietly evolved from a time-series database into the modern PostgreSQL for today’s and tomorrow’s computing, built for performance, scale, and the agentic future.

So we’re changing our name: from Timescale to TigerData. Not to change who we are, but to reflect who we’ve become. TigerData is bold, fast, and built to power the next era of software.

Read more

👋 Kindness is contagious

Dive into this engaging article on the vibrant DEV Community—all devs are welcome to share insights and boost our collective expertise.

A quick “thank you” can brighten someone’s day—drop your kudos in the comments!

In DEV, knowledge-sharing forges stronger bonds. Enjoyed this read? A small nod of appreciation for the author speaks volumes.

Join the conversation