esProc SPL provides cursor operations, which can operate large CSV files with very simple code. With slight modifications, it can be converted into parallel computing. It also has a graphical interface, which is much more convenient than Python.
First, go here to download esProc SPL: It's free~
If you don’t want to bother with the source code, the standard edition can be used., download and install it.
Prepare a large CSV file:
Open the esProc IDE, create a new script, write SPL code in cell A1, and read the first 100 entries:
=file("d:/OrdersBig.csv").cursor@tc().fetch(100)
The function cursor represents opening the text file with a cursor, and @ represents the function’s extension options, @t represents the first row being column names, @c represents commas as the separator.
Because it is a large file, loading it all into memory may overflow, so fetch only 100 entries and take a look.
Press ctrl-F9 to execute, and you can see the calculation result data table on the right.
SPL code is written in cells, and after each cell is executed, there will be a value, which can be seen on the right side, which brings great convenience to debugging.
Let's try the calculation. First, count the rows:
The skip function is used to skip N records and return the number of skipped records. When the parameter is empty, all records are skipped.
There are a total of 101730411 records.
Then take a look at the filtering, select records with Amount between 3000 and 4000 and Client containing s:
The select function is used for conditional filtering, the like function is used for string matching, and * is a wildcard, @c means not case sensitive.
Because there may still be many results, we only take the first 100 entries to see, and the execution result is in A3:
Sorting can also be done, such as sorting in the order of OrderDate and in the reverse order of Amount:
After sorting, write the results into a new file, then open the new file and retrieve the first 100 entries. The function sortx is used for sorting large files, where - represents reverse order.
After execution, look at the result on the right, it has been sorted.
Then do some complex calculations, group and aggregation:
SQL programmers must be familiar with the groups function in A4, so we won't go into detail here.
A2 can read only the columns to be used when opening the cursor, which can improve speed. A1 and A5 have also been added here to track the running time and print it on the console:
You can see that the running time is 145 seconds.
Parallel computing can fully utilize current multi-core CPUs and must be tried. Change the above code to parallel computing, simply add an option @m after the cursor function, and leave the rest unchanged:
@m represents performing multi-threaded computation according to the parallel options configured in the option.
At the same time turn on this parallel option.
Improved to 92 seconds, possibly due to the concurrency limitation of the hard disk, it is not possible to achieve a multiple performance improvement, and the results of running with 2 threads are also similar.
The above are a few basic operations. The official website provides more extensive and in-depth examples of large CSV calculations, and the code is also very simple. It is worth taking a look.
Top comments (4)
Thanks for the detailed walkthrough using esProc SPL with large CSV files—it’s always refreshing to see a tool that doesn’t require writing a novel just to process some data! The cell-based approach and the graphical interface look quite user-friendly, especially for quick debugging (my eyesight thanks you). I do appreciate the transparency about parallel processing not quite working magic because of disk limitations—somehow the hardware never wants to cooperate as much as we’d like! My only minor nitpick is that some code explanations felt a bit brisk, and a slightly deeper dive into the syntax or alternatives for newbies would help. Still, if I have to wrangle a massive CSV again, this might save me some headaches (and a few CPU fans). Thanks for sharing!
Hello Tullis! Great! Welcome to try it out. You can contact me at any time if you have any questions.
feels like this would've saved me so much headache in the past tbh - you think this kind of tool really changes your workflow over time or do old habits win out?
The advancement of technology will gradually change people's habits, won't it,David?
Some comments may only be visible to logged-in visitors. Sign in to view all comments.