Forem

Streaming Audio: A Confluent podcast about Apache Kafka®

Real-Time Change Data Capture and Data Integration with Apache Kafka and Qlik

Getting data from a database management system (DBMS) into Apache Kafka® in real time is a subject of ongoing innovation. John Neal (Principal Solution Architect, Qlik) and Adam Mayer (Senior Technical Producer Marketing Manager, Qlik) explain how leveraging change data capture (CDC) for data ingestion into Kafka enables real-time data-driven insights. 

It can be challenging to ingest data in real time. It is even more challenging when you have multiple data sources, including both traditional databases and mainframes, such as SAP and Oracle. Extracting data in batch for transfer and replication purposes is slow, and often incurs significant performance penalties. However, analytical queries are often even more resource intensive and are prohibitively expensive to run on production transactional databases. CDC enables the capture of source operations as a sequence of incrementing events, converting the data into events to be written to Kafka. 

Once this data is available in the Kafka topics, it can be used for both analytical and operational use cases. Data can be consumed and modeled for analytics by individual groups across your organization. Meanwhile, the same Kafka topics can be used to help power microservice applications and help ensure data governance without impacting your production data source. Kafka makes it easy to integrate your CDC data into your data warehouses, data lake, NoSQL database, microservices, and any other system. 

Adam and John highlight a few use cases where they see real-time Kafka data ingestion, processing, and analytics moving the needle—including real-time customer predictions, supply chain optimizations, and operational reporting. Finally, Adam and John cap it off with a discussion on how capturing and tracking data changes are critical for your machine learning model to enrich data quality. 

EPISODE LINKS

Episode source