DEV Community

Daniel Mutu

What is faster? Read file CSV or Oracle table?

Hi guys,

I need some advice. I need to load a huge amount of data (50 M) from Oracle into HBase.
At the moment, there is a job written in Talend (an ETL system) that reads the data from a CSV file and loads it into HBase.

Oracle -> CSV File -> Talend Job -> HBase Database

Can I get better upload performance if I connect to the Oracle database?

Is reading from an Oracle table faster than reading from a CSV file?

Thanks,
Daniel

Top comments (2)

Frank Rosner

Can I get better upload performance if I connect to the Oracle database?

Most likely, since you are already "connecting to the Oracle DB" anyway to generate the CSV file. By reading directly from Oracle, you skip the CSV generation and parsing steps. These steps not only take time but are also error-prone, because all schema information is lost and every value is converted to a string.
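To illustrate the point about lost schema information, here is a minimal Python sketch. It uses the standard-library `sqlite3` module as a stand-in for Oracle (an assumption; a real job would go through an Oracle driver or Talend's Oracle input component), and the `orders` table and its columns are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical table; an in-memory sqlite3 database stands in for Oracle.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL, note TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 19.99, "first"), (2, 5.0, None)],
)

# Reading straight from the database keeps the column types (int, float, NULL).
db_rows = conn.execute("SELECT id, amount, note FROM orders ORDER BY id").fetchall()

# Dumping to CSV and parsing it back turns every value into a string,
# and NULL becomes indistinguishable from an empty string.
buf = io.StringIO()
csv.writer(buf).writerows(db_rows)
buf.seek(0)
csv_rows = [tuple(row) for row in csv.reader(buf)]

print(db_rows)   # [(1, 19.99, 'first'), (2, 5.0, None)]
print(csv_rows)  # [('1', '19.99', 'first'), ('2', '5.0', '')]
```

After the CSV round trip, the loader has to re-parse every column (and guess which empty strings were NULLs), which is exactly the extra, error-prone step that reading from the table avoids.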

rhymes

It might depend on the load and frequency of these ETL jobs, and on the data format.

Dumping a table allows you to decouple the extraction and insertion steps: extraction can be done by a serial job, while insertion into the destination DB can be done in parallel. Granted, this can also be accomplished with an intermediate program written in a general-purpose language, but ETL tools are normally well equipped to handle massive CSV files.
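A minimal Python sketch of that decoupling, assuming a serial extractor and a destination that accepts independent batch writes. `extract`, `insert_batch`, the batch size, and the worker count are hypothetical placeholders, not Talend or HBase APIs.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def extract(n_rows):
    """Serial extraction step: yields rows one at a time (hypothetical source)."""
    for i in range(n_rows):
        yield (f"row-{i}", i)


def batches(rows, size):
    """Groups an iterable of rows into lists of at most `size` rows."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def insert_batch(batch):
    """Hypothetical stand-in for a bulk write to the destination DB.

    A real job would send the batch to the target store here; this
    placeholder only reports how many rows it would have written.
    """
    return len(batch)


def load(n_rows, batch_size=1000, workers=4):
    """Extracts serially, then inserts the batches in parallel."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(insert_batch, batches(extract(n_rows), batch_size)))


print(load(10_500))  # 10500
```

Note that `Executor.map` collects its input iterable up front, so every batch is materialized before the writes finish; for very large dumps, a bounded queue between the extractor and the writers would keep memory in check.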

Depending on what the data looks like, the data type conversion might not matter.
