<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Aimé Bangirahe</title>
    <description>The latest articles on Forem by Aimé Bangirahe (@aime-bangirahe).</description>
    <link>https://forem.com/aime-bangirahe</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3409179%2F7e93036c-6ef0-4599-9384-032e8a832dd3.jpg</url>
      <title>Forem: Aimé Bangirahe</title>
      <link>https://forem.com/aime-bangirahe</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/aime-bangirahe"/>
    <language>en</language>
    <item>
      <title>Kafka consumer lag—Measure and reduce</title>
      <dc:creator>Aimé Bangirahe</dc:creator>
      <pubDate>Mon, 10 Nov 2025 19:39:45 +0000</pubDate>
      <link>https://forem.com/aime-bangirahe/kafka-consumer-lag-measure-and-reduce-18df</link>
      <guid>https://forem.com/aime-bangirahe/kafka-consumer-lag-measure-and-reduce-18df</guid>
      <description>&lt;h2&gt;
  
  
  &lt;strong&gt;Kafka consumer lag&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Apache Kafka® is an open-source distributed streaming platform that supports applications relying on real-time event processing and durable event storage. An event-driven architecture helps architects decouple application components and scale them independently. While decoupling improves scalability and resilience, it also makes the architecture harder to debug.&lt;/p&gt;

&lt;p&gt;Optimizing performance for distributed applications requires considerable engineering effort and Kafka is no different. Kafka consumer lag — which measures the delay between a Kafka producer and consumer — is a key Kafka performance indicator. This article explores Kafka consumer lag in detail, including causes, monitoring, and strategies to address it.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Summary of key Kafka consumer lag concepts&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Kafka consumer lag is a key performance indicator for the popular Kafka streaming platform. All else equal, lower consumer lag means better Kafka performance. The table below summarizes common causes of Kafka consumer lag. We will explore these causes in more detail later in this article.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyo4r2ypbxvxzh4f7zdw6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyo4r2ypbxvxzh4f7zdw6.png" alt=" " width="800" height="238"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;What is Kafka consumer lag?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Kafka consumer lag is the difference between the offset of the last message produced to a partition and the last offset committed by the consumer group. It represents the consumer's processing delay.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Understanding the Kafka working model&lt;/strong&gt;&lt;br&gt;
Kafka excels as a foundation for building decoupled applications that rely on event processing. It does this through the concept of producers and consumers. Producers are applications that send events to the Kafka broker. The broker stores the messages durably and makes them available for client applications to process.&lt;/p&gt;

&lt;p&gt;Data is logically separated inside brokers using Kafka topics. Topics are categories that logically separate data so it can be uniquely addressed. Topic names are unique across a Kafka cluster. Topics are further divided into partitions to facilitate scaling. Partitions keep a subset of data belonging to a topic.&lt;/p&gt;

&lt;p&gt;When a producer writes a message to a topic, the Kafka broker writes it into one of the topic's partitions. Kafka tracks the progress of writes to each partition by recording the position of the last data write. This position is called the log-end offset, and it is partition-specific.&lt;/p&gt;

&lt;p&gt;Consumers contain application logic about how to process the data written to partitions. To facilitate scaling within consumers, Kafka uses the concept of consumer groups. A consumer group is a set of consumers collaborating to consume messages from the same topic. Kafka ensures consumers belonging to the same consumer group receive messages from different partitions.&lt;/p&gt;

&lt;p&gt;When a new consumer joins the group, Kafka rebalances the members in that consumer group to ensure that the new consumer gets a fair share of assigned partitions. Every rebalance operation results in new group configurations. Group configuration here means the assignment of consumers to various partitions.&lt;/p&gt;

&lt;p&gt;Kafka message processing can be scaled by adding more consumers to a consumer group. To enable resilience, each Kafka consumer keeps track of the last position in a partition from which it has read. This lets a consumer resume from where it left off after unfortunate situations like crashes. This position is called the consumer offset, and it is stored in a separate Kafka topic.&lt;/p&gt;
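
&lt;p&gt;The commit-and-resume behavior can be sketched in a few lines of plain Python (a toy model, not the Kafka client API): because the committed offset lives outside the consumer, a replacement consumer picks up exactly where the crashed one stopped.&lt;/p&gt;

```python
class ToyConsumer:
    """Toy model of offset commits (not the real Kafka client API)."""
    def __init__(self, committed_store):
        # Stands in for the __consumer_offsets topic on the broker.
        self.committed = committed_store

    def poll_and_commit(self, log, batch_size):
        """Read the next batch from the partition log, then commit."""
        start = self.committed.get("p0", 0)
        batch = log[start:start + batch_size]
        self.committed["p0"] = start + len(batch)
        return batch

store = {}                            # survives consumer "crashes"
log = ["m%d" % i for i in range(10)]  # the partition's message log

c1 = ToyConsumer(store)
c1.poll_and_commit(log, 4)            # processes m0..m3, commits offset 4
# c1 "crashes"; a replacement consumer resumes from the committed offset:
c2 = ToyConsumer(store)
batch = c2.poll_and_commit(log, 4)
print(batch)  # ['m4', 'm5', 'm6', 'm7']
```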

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdbvgvpf9m87bpxt8wbrp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdbvgvpf9m87bpxt8wbrp.png" alt=" " width="764" height="423"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The difference between the last offset stored by the broker and the last committed offset for that partition is called consumer lag. It defines the gap between consumers and producers and provides information about the real-time performance of the processing system. A growing consumer lag often signals a sudden spike in traffic, skewed data patterns, a scaling problem, or even a code-level issue.&lt;/p&gt;
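
&lt;p&gt;As a minimal sketch (plain Python, with made-up offset numbers), lag is simply the per-partition difference between the broker's log-end offset and the group's committed offset:&lt;/p&gt;

```python
# Hypothetical snapshot of a topic's partitions: log-end offsets as
# reported by the broker, and the consumer group's committed offsets.
log_end_offsets = {0: 1500, 1: 980, 2: 2100}
committed_offsets = {0: 1450, 1: 980, 2: 1700}

def consumer_lag(log_end, committed):
    """Per-partition lag: messages written but not yet committed."""
    return {p: log_end[p] - committed.get(p, 0) for p in log_end}

lag = consumer_lag(log_end_offsets, committed_offsets)
print(lag)                 # per-partition lag
print(sum(lag.values()))   # total lag across the topic
```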
&lt;h2&gt;
  
  
  &lt;strong&gt;Reasons for Kafka consumer lag&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Consumer lag can occur because of several internal and external factors. Even a healthy Kafka cluster will have some consumer lag at times. As long as the lag goes down in a reasonable time, there is nothing to worry about. The lag becomes alarming when it does not decrease or keeps gradually increasing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Incoming traffic surge&lt;/strong&gt;&lt;br&gt;
Traffic patterns often vary over a wide range based on external factors. For example, imagine an IoT sensor system that sends alerts based on specific external environment variables. A change in the external environment for a set of customers can flood the topic with sudden spikes. Consumers will struggle to deal with the sudden spike, and the lag can become alarmingly high. Manual scaling helps address Kafka consumer lag in these cases.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data skew in partitions&lt;/strong&gt;&lt;br&gt;
Partitions bring parallelism to Kafka. Consumers within a consumer group are mapped to specific partitions, the idea being that each consumer has enough resources to handle the messages arriving at its partition. But data is often not uniformly distributed across partitions. Kafka provides multiple strategies for selecting a partition when writing data. The simplest is round-robin assignment, where data is distributed uniformly. But round robin is unsuitable for applications that maintain state or order. In such cases, an application-specific partition key is used.&lt;/p&gt;

&lt;p&gt;If the partition key does not distribute data uniformly, some partitions can receive more data than others. Imagine a unique customer identity mapped to a partition key. If a specific customer sends more data than others, that partition will experience skew, leading to consumer lag.&lt;/p&gt;
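
&lt;p&gt;The effect of a hot partition key can be simulated in a few lines of Python (a toy hash-mod partitioner, not Kafka's actual murmur2-based one): a single over-active key sends almost all traffic to one partition.&lt;/p&gt;

```python
from collections import Counter

NUM_PARTITIONS = 6

def partition_for(key):
    # Toy stand-in for Kafka's default partitioner, which hashes the
    # record key; any deterministic hash-mod scheme shows the same effect.
    return sum(key.encode()) % NUM_PARTITIONS

# One "hot" customer produces far more events than the others:
events = ["customer-42"] * 1000 + ["customer-7"] * 10 + ["customer-9"] * 10

load = Counter(partition_for(k) for k in events)
print(load.most_common())  # the hot key's partition gets ~98% of the traffic
```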

&lt;p&gt;&lt;strong&gt;Slow processing jobs&lt;/strong&gt;&lt;br&gt;
Consumers process the messages pulled from the partitions according to application logic. Application logic can contain tasks like complex data transformations, external microservice access, database writes, etc. Such processing mechanisms are time-consuming and can get stuck due to external factors. Imagine a consumer that accesses an external microservice to complete its task. If the response time of the external service increases because of other factors, Kafka will experience consumer lag.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Errors in code and pipeline components&lt;/strong&gt;&lt;br&gt;
Kafka consumers often contain complex application logic. Like any code, that logic can have bugs. For example, a processing module can go into an infinite loop or use inefficient algorithms. Similarly, improper handling of an erroneous or unexpected input message can slow a particular consumer. Such instances result in consumer lag.&lt;/p&gt;
&lt;h2&gt;
  
  
  &lt;strong&gt;Monitoring Kafka consumer lag&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Monitoring Kafka consumer lag helps developers take corrective actions to stabilize the cluster and optimize performance. In practice, there is always some lag because of batching, and lag values vary from partition to partition. A slight lag is not a significant problem if it is stable, but a lag with a tendency to increase points to a problem. This section details how teams can monitor consumer lag to identify potential issues.&lt;/p&gt;
&lt;h2&gt;
  
  
  &lt;strong&gt;Monitoring Kafka consumer lag with the consumer group script&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The Kafka consumer group script exposes key details about consumer group performance. It reports each partition's current offset, log-end offset, and lag. The current offset of a partition is the offset of the last message committed by the consumer handling that partition. The log-end offset is the highest offset in that partition; in other words, it is the offset of the last message written to it. The difference between the two is the consumer lag.&lt;/p&gt;

&lt;p&gt;You can use the command below to get consumer lag details with the consumer group script.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$KAFKA_HOME/bin/kafka-consumer-groups.sh  --bootstrap-server &amp;lt;&amp;gt; --describe --group &amp;lt;group_name&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Executing this against a live cluster produces output like the following.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;GROUP          TOPIC           PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG    OWNER
ub-kf          test-topic      0          15              17              2      ub-kf-1/127.0.0.1
ub-kf          test-topic      1          14              15              1      ub-kf-2/127.0.0.1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
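
&lt;p&gt;When scripting alerts around this output, the LAG column can be summed per group. A short Python sketch (assuming the tabular format shown above):&lt;/p&gt;

```python
SAMPLE = """\
GROUP          TOPIC           PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG    OWNER
ub-kf          test-topic      0          15              17              2      ub-kf-1/127.0.0.1
ub-kf          test-topic      1          14              15              1      ub-kf-2/127.0.0.1
"""

def total_lag(output):
    """Sum the LAG column (6th whitespace-separated field) of the script's output."""
    rows = output.strip().splitlines()[1:]  # skip the header row
    return sum(int(row.split()[5]) for row in rows)

print(total_lag(SAMPLE))  # 3
```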



&lt;h2&gt;
  
  
  &lt;strong&gt;Conclusion&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Consumer lag is a key metric that indicates how much catching up consumers must do to achieve near real-time operation. While a little consumer lag is inevitable, an increasing consumer lag points to a problem in data distribution, code, or traffic patterns.&lt;/p&gt;

</description>
      <category>devops</category>
      <category>performance</category>
      <category>dataengineering</category>
      <category>monitoring</category>
    </item>
    <item>
      <title>A Beginner’s Guide to Big Data Analytics with Apache Spark and PySpark</title>
      <dc:creator>Aimé Bangirahe</dc:creator>
      <pubDate>Mon, 29 Sep 2025 20:52:24 +0000</pubDate>
      <link>https://forem.com/aime-bangirahe/a-beginners-guide-to-big-data-analytics-with-apache-spark-and-pyspark-5320</link>
      <guid>https://forem.com/aime-bangirahe/a-beginners-guide-to-big-data-analytics-with-apache-spark-and-pyspark-5320</guid>
      <description>&lt;h2&gt;
  
  
  &lt;u&gt;What is Apache Spark?&lt;/u&gt;
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;Apache Spark is a distributed processing system used to perform big data and machine learning tasks on large datasets. With Apache Spark, users can run queries and machine learning workflows on petabytes of data, which would be impossible on a single local device.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This framework is even faster than previous data processing engines like Hadoop, and has increased in popularity in the past eight years. Companies like IBM, Amazon, and Yahoo are using Apache Spark as their computational framework.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;u&gt;What is PySpark?&lt;/u&gt;&lt;/strong&gt;&lt;br&gt;
PySpark is an interface for Apache Spark in Python. With PySpark, you can write Python and SQL-like commands to manipulate and analyze data in a distributed processing environment. Using PySpark, data scientists manipulate data, build machine learning pipelines, and tune models.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;u&gt;Why Use PySpark?&lt;/u&gt;&lt;/strong&gt;&lt;br&gt;
Companies choose a framework like PySpark because of how quickly it can process big data. It is faster than libraries like Pandas and Dask and can handle larger amounts of data. If you had petabytes of data to process, for instance, Pandas and Dask would fail, but PySpark would handle it easily.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;&lt;em&gt;Spark&lt;/em&gt;&lt;/strong&gt;, a unified &lt;strong&gt;&lt;em&gt;big data analytics tool&lt;/em&gt;&lt;/strong&gt; with over 32k stars and 1,800 contributors on GitHub, was created specifically for handling big data using clustered computing. It has built-in machine learning algorithms, SQL, and data streaming modules. It provides high-level APIs for R, Python, Java, and Scala, along with a wide range of high-level tools such as &lt;strong&gt;&lt;em&gt;Spark Streaming&lt;/em&gt;&lt;/strong&gt;, MLlib for machine learning, GraphX for processing graph datasets, and &lt;strong&gt;&lt;em&gt;Spark SQL&lt;/em&gt;&lt;/strong&gt; for real-time processing of structured and unstructured data. Both stream and batch processing are supported. Since Spark is an open-source platform with many built-in features, it can be applied to any industry that uses big data and data science.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  PySpark Applications: How Are Businesses Leveraging PySpark?
&lt;/h2&gt;

&lt;p&gt;Industry giants like Yahoo and Netflix are leveraging various functionalities of &lt;strong&gt;&lt;em&gt;PySpark&lt;/em&gt;&lt;/strong&gt;. Yahoo utilizes Apache Spark's machine learning capabilities to customize its news, web pages, and advertising. It uses PySpark to determine what kind of news readers are interested in reading and to categorize news stories so it can predict who would be interested in each news category.&lt;/p&gt;

&lt;p&gt;Netflix amazes its users with fantastic recommendations every time they use the platform. But how does this happen? Netflix uses the collaborative filtering capability offered by PySpark. Apart from that, Runtastic also uses &lt;strong&gt;&lt;em&gt;PySpark&lt;/em&gt;&lt;/strong&gt; for big data sanity checks. Their team uses Python's unittest package to keep things simple and manageable and creates a task for each entity type (e.g., sports activities).&lt;/p&gt;

&lt;h2&gt;
  
  
  The PySpark Architecture
&lt;/h2&gt;

&lt;p&gt;The &lt;strong&gt;&lt;em&gt;PySpark architecture&lt;/em&gt;&lt;/strong&gt; consists of various parts such as &lt;strong&gt;&lt;em&gt;Spark Conf&lt;/em&gt;&lt;/strong&gt;, &lt;strong&gt;&lt;em&gt;RDDs&lt;/em&gt;&lt;/strong&gt;, &lt;strong&gt;&lt;em&gt;Spark Context&lt;/em&gt;&lt;/strong&gt;, &lt;strong&gt;&lt;em&gt;Dataframes&lt;/em&gt;&lt;/strong&gt;, etc.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Spark Conf&lt;/strong&gt;&lt;br&gt;
SparkConf holds the setup and parameters needed to execute a Spark application locally or on a cluster. It contains the settings needed to launch a Spark application.&lt;/p&gt;

&lt;p&gt;There are plenty of SparkConf features you can use, such as:&lt;/p&gt;

&lt;p&gt;To set a configuration property, use set(key, value).&lt;br&gt;
To set the master URL, use setMaster(value).&lt;br&gt;
To name an application, use setAppName(value).&lt;br&gt;
To access the configuration value, use get(key, defaultValue=None).&lt;br&gt;
To set the Spark installation path on worker nodes, use setSparkHome(value).&lt;/p&gt;

&lt;p&gt;Any Spark program begins by generating a SparkContext object, which instructs the application how to connect to a cluster. A SparkConf object is used to supply the application's configuration data to the SparkContext instance. Let's have a look at what SparkContext can do.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Spark Context&lt;/strong&gt;&lt;br&gt;
SparkContext is the entry point to Apache Spark functionality, and creating it is a crucial phase of any Spark driver program. The resource manager (YARN/Mesos) allows the Spark application to interact with the Spark cluster. SparkContext can only be generated once a SparkConf is formed, and the Spark driver program transmits configuration parameters to the SparkContext via SparkConf.&lt;/p&gt;

&lt;p&gt;When you execute a Spark application, a driver program with the primary function runs, and your SparkContext is created here. The driver program then performs the operations inside the executors on worker nodes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Resilient Distributed Datasets - RDDs&lt;/strong&gt;&lt;br&gt;
RDDs (Resilient Distributed Datasets) are the components that run and operate on numerous nodes to perform parallel processing on a cluster. RDDs are immutable, which means you can't change them after creation. They are also fault-tolerant and will automatically recover in the event of a failure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;RDD&lt;/em&gt;&lt;/strong&gt; is an acronym for:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Resilient&lt;/em&gt;&lt;/strong&gt;- It is fault-tolerant and capable of regenerating data in the event of a failure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Distributed&lt;/em&gt;&lt;/strong&gt;- The data in a cluster is distributed among the various nodes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Dataset&lt;/em&gt;&lt;/strong&gt;- It refers to a collection of segregated data that contains values.&lt;/p&gt;

&lt;p&gt;RDD uses a key to partition data into smaller chunks. The advantage of breaking data into manageable blocks is that if one executor node crashes, another node can still process the data. Because the same data blocks are duplicated over numerous executor nodes, these can quickly recover from any failures. RDD allows you to efficiently conduct functional calculations against a dataset by linking together multiple nodes.&lt;/p&gt;

&lt;p&gt;To generate RDDs, PySpark offers two choices: loading an external dataset or distributing a set of objects. The most straightforward approach to building RDDs is the parallelize() function, which takes an existing program collection and passes it to the Spark Context.&lt;/p&gt;

&lt;p&gt;There are mainly two types of operations you can perform with RDDs-&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Transformations&lt;/em&gt;&lt;/strong&gt;: These operations are used to generate a new RDD. Map, flatMap, filter, distinct, reduceByKey, mapPartitions, and sortByKey are some of the transformation operations used on RDDs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Actions&lt;/em&gt;&lt;/strong&gt;: These operations are used on an RDD to enable Apache Spark to perform computations and then provide the result to the driver. Collect, collectAsMap, reduce, countByKey/countByValue are some of the action operations used on RDDs.&lt;/p&gt;
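
&lt;p&gt;PySpark is not needed to see the distinction: transformations build a lazy recipe, while an action forces the whole pipeline to run. A plain-Python analogy using generators (not the actual PySpark API):&lt;/p&gt;

```python
data = [1, 2, 3, 4, 5, 6]

# "Transformations": lazy, nothing is computed yet (like map/filter on an RDD)
doubled = (x * 2 for x in data)
evens_of_four = (x for x in doubled if x % 4 == 0)

# "Action": forces the whole pipeline to run (like collect() on an RDD)
result = list(evens_of_four)
print(result)  # [4, 8, 12]
```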

&lt;p&gt;&lt;strong&gt;&lt;em&gt;PySpark SQL and Dataframes&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
A dataframe is a distributed collection of structured or semi-structured data in PySpark. The data is kept in rows with named columns, similar to relational database tables. Dataframes share several characteristics with RDDs, such as being immutable, distributed, and lazily evaluated. They accept various file formats, including JSON, CSV, and TXT, and can be built from existing RDDs or by specifying the schema dynamically.&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>python</category>
      <category>machinelearning</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Apache Kafka Deep Dive: Core Concepts, Data Engineering Applications, and Real-World Production Practices</title>
      <dc:creator>Aimé Bangirahe</dc:creator>
      <pubDate>Tue, 23 Sep 2025 16:14:03 +0000</pubDate>
      <link>https://forem.com/aime-bangirahe/apache-kafka-deep-dive-concepts-fondamentaux-applications-dingenierie-des-donnees-et-pratiques-58b2</link>
      <guid>https://forem.com/aime-bangirahe/apache-kafka-deep-dive-concepts-fondamentaux-applications-dingenierie-des-donnees-et-pratiques-58b2</guid>
      <description>&lt;p&gt;Apache Kafka is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.&lt;/p&gt;

&lt;h2&gt;
  
  
  Use cases
&lt;/h2&gt;

&lt;p&gt;Kafka is helpful in various real-life operational and data analytics use cases.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;•Messaging:&lt;/em&gt;&lt;/strong&gt; This domain has its own specialized software, like RabbitMQ and ActiveMQ, but Kafka is often sufficient to handle it while providing great performance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;•Website activity tracking:&lt;/em&gt;&lt;/strong&gt; Kafka can handle small, frequently generated data records such as page views, user actions, and other web browsing activity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;•Metrics:&lt;/em&gt;&lt;/strong&gt; You can easily consolidate and aggregate data that can be sorted using topics.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;•Log aggregation:&lt;/em&gt;&lt;/strong&gt; Kafka makes it possible to gather logs from different sources and aggregate them in one place in a single format.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;•Stream processing:&lt;/em&gt;&lt;/strong&gt; Streaming pipelines are one of Kafka's most important features, making it possible to process and transform data in transit.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;•Event-driven architecture:&lt;/em&gt;&lt;/strong&gt; Applications can publish and react to events asynchronously, allowing events in one part of your system to easily trigger behavior elsewhere. For example, a customer purchasing an item in your store can trigger inventory updates, shipping notices, etc.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architectural components
&lt;/h2&gt;

&lt;p&gt;These are the most essential high-level components of Kafka:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Record&lt;br&gt;
Producer&lt;br&gt;
Consumer&lt;br&gt;
Broker&lt;br&gt;
Topic&lt;br&gt;
Partitioning&lt;br&gt;
Replication&lt;br&gt;
ZooKeeper or Controller Quorum&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;1.Record:&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
Also called an event or message, a record is a byte array that can store any object of any format, for example a JSON record describing which link a user clicked on your website.&lt;br&gt;
Sometimes certain kinds of events need to be distributed among a group of consumers, so that each event is delivered to just one consumer in that group. Kafka lets you define consumer groups for this purpose.&lt;br&gt;
A critical design choice is that, apart from consumer groups, no other interconnection happens among clients. Producers and consumers are fully decoupled and independent of each other.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;2.Producer:&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
A producer is a client application that publishes records (writes) to Kafka. For example, a JavaScript snippet on a website tracks browsing behavior on the site and sends it to the Kafka cluster.&lt;br&gt;
&lt;strong&gt;&lt;em&gt;3.Consumer:&lt;/em&gt;&lt;/strong&gt; A consumer is a client application that subscribes to (i.e., reads) records from Kafka, for example an application that receives browsing data and loads it into a data platform for analysis.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;4.Broker:&lt;/em&gt;&lt;/strong&gt; A broker is a server that handles producer and consumer requests and keeps the data replicated within the cluster. In other words, a broker is one of the physical machines Kafka runs on.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;5.Topic:&lt;/em&gt;&lt;/strong&gt; A topic is a category for organizing messages. Producers send messages to a topic, while consumers subscribe to the topics of interest, so they only see the records they actually care about.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;6.Partitioning:&lt;/em&gt;&lt;/strong&gt; Partitioning means splitting a topic log into multiple logs that can live on separate nodes of the Kafka cluster. This prevents a topic log from becoming too large to be hosted on a single node.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;7.Replication:&lt;/em&gt;&lt;/strong&gt; Partitions can be copied across several brokers so the data stays safe if one broker fails. These copies are called replicas.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;8.Ensemble service:&lt;/em&gt;&lt;/strong&gt; An ensemble is a centralized service for maintaining configuration information, discovery, and distributed synchronization and coordination. Kafka used to rely on Apache ZooKeeper for this, but recent versions have moved to a different consensus service called KRaft.&lt;/p&gt;

&lt;p&gt;Not all event streaming software requires installing a separate ensemble service. Redpanda, which offers 100% Kafka-compatible data streaming, works out of the box because it already has this functionality built in.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Apache Kafka Deep Dive: Core Concepts, Data Engineering Applications, and Real-World Production Practices</title>
      <dc:creator>Aimé Bangirahe</dc:creator>
      <pubDate>Tue, 23 Sep 2025 15:59:01 +0000</pubDate>
      <link>https://forem.com/aime-bangirahe/apache-kafka-deep-dive-core-concepts-data-engineering-applications-and-real-world-production-44d7</link>
      <guid>https://forem.com/aime-bangirahe/apache-kafka-deep-dive-core-concepts-data-engineering-applications-and-real-world-production-44d7</guid>
      <description>&lt;p&gt;Apache Kafka is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.&lt;/p&gt;

&lt;h2&gt;
  
  
  Use cases
&lt;/h2&gt;

&lt;p&gt;Kafka is helpful in various real-life operational and data analytics use cases.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;•Messaging:&lt;/strong&gt;&lt;/em&gt; This domain has its own specialized software, like RabbitMQ and ActiveMQ, but Kafka is often sufficient to handle it while providing great performance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;•Website activity tracking:&lt;/em&gt;&lt;/strong&gt; Kafka can handle small, frequently generated data records like page views, user actions, and other web-based browsing activity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;•Metrics:&lt;/em&gt;&lt;/strong&gt; You can easily consolidate and aggregate data that can be sorted using topics.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;•Log aggregation:&lt;/em&gt;&lt;/strong&gt; Kafka makes it possible to gather logs from different sources and aggregate them in one place in a single format.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;•Stream processing:&lt;/em&gt;&lt;/strong&gt; Streaming pipelines are one of the most important Kafka features, making it possible to process and transform data in transit.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;•Event-driven architecture:&lt;/em&gt;&lt;/strong&gt; Applications can publish and react to events asynchronously, allowing events in one part of your system to easily trigger behavior somewhere else. For example, a customer purchasing an item in your store can trigger inventory updates, shipping notices, etc.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architectural components
&lt;/h2&gt;

&lt;p&gt;These are the most essential high-level components of Kafka:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Record&lt;br&gt;
Producer&lt;br&gt;
Consumer&lt;br&gt;
Broker&lt;br&gt;
Topic&lt;br&gt;
Partitioning&lt;br&gt;
Replication&lt;br&gt;
ZooKeeper or Controller Quorum&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;1.Record&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
Also called an event or message, a record is a byte array that can store any object of any format. An example would be a JSON record describing what link a user clicked while they were on your website.&lt;br&gt;
Sometimes you want to distribute certain kinds of events among a group of consumers, so each event will be distributed to just one of the consumers in that group. Kafka allows you to define consumer groups this way.&lt;br&gt;
A critical design approach is that, besides consumer groups, no other interconnection happens among clients. Producers and consumers are fully decoupled and agnostic of each other.&lt;/p&gt;
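
&lt;p&gt;The group-distribution guarantee can be sketched in plain Python (a toy model, not the Kafka protocol): each partition is owned by exactly one consumer in the group, so every record is processed by exactly one group member.&lt;/p&gt;

```python
consumers = ["consumer-a", "consumer-b", "consumer-c"]
NUM_PARTITIONS = 6

# Toy round-robin assignment: each partition is owned by exactly one
# consumer, so a record lands with exactly one member of the group.
assignment = {p: consumers[p % len(consumers)] for p in range(NUM_PARTITIONS)}

def consumer_for(partition):
    return assignment[partition]

print(consumer_for(4))  # consumer-b
```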

&lt;p&gt;&lt;strong&gt;&lt;em&gt;2.Producer&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
A producer is a client application that publishes records (writes) to Kafka. An example here is a JavaScript snippet on a website that tracks browsing behavior on the site and sends it to the Kafka cluster.&lt;br&gt;
Consumer&lt;br&gt;
A consumer is a client application that subscribes to records from Kafka (i.e. reads them), such as an application that receives browsing data and loads it into a data platform for analysis.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;3.Broker&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
A broker is a server that handles producer and consumer requests from clients and keeps the data replicated within the cluster. In other words, a broker is one of the physical machines Kafka runs on.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;4.Topic&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
A topic is a category that allows you to organize messages. Producers send to a topic, while consumers subscribe to topics of relevance, so they only see the records they actually care about.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Partitioning&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
Partitioning means breaking a topic log into multiple logs that can live on separate nodes on the Kafka cluster. This allows you to have topic logs that are too big to live on one single node.&lt;/p&gt;
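&lt;p&gt;Which partition a record lands in is typically derived from its key, so all records with the same key stay in order in the same log. A sketch of the idea (Kafka's default partitioner uses a murmur2 hash; plain &lt;code&gt;zlib.crc32&lt;/code&gt; is used here only as a stand-in):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import zlib

def partition_for(key, num_partitions):
    """Deterministically map a record key to one of the topic's partitions."""
    return zlib.crc32(key) % num_partitions

# The same key always lands in the same partition, preserving per-key order.
print(partition_for(b"user-42", 6) == partition_for(b"user-42", 6))  # True
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;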

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Replication&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
Partitions can be copied across several brokers so that data survives the failure of any single broker. These copies are called replicas.&lt;/p&gt;
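&lt;p&gt;Replica placement can be pictured as spreading each partition's copies over distinct brokers, so no single broker holds every copy. A simplified round-robin sketch (Kafka's actual placement also accounts for racks and leader balance):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def place_replicas(num_partitions, num_brokers, replication_factor):
    """Round-robin layout: each partition's replicas land on distinct brokers."""
    assert replication_factor &lt;= num_brokers
    return {
        p: [(p + r) % num_brokers for r in range(replication_factor)]
        for p in range(num_partitions)
    }

layout = place_replicas(num_partitions=3, num_brokers=3, replication_factor=2)
print(layout)  # {0: [0, 1], 1: [1, 2], 2: [2, 0]}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With this layout, losing broker 0 still leaves a live copy of partitions 0 and 2 on other brokers.&lt;/p&gt;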

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Ensemble service&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
An ensemble is a centralized service for maintaining configuration information, handling discovery, and providing distributed synchronization and coordination. Kafka historically relied on Apache ZooKeeper for this; newer versions replace it with KRaft, a Raft-based controller quorum built into Kafka itself, which removes the external dependency.&lt;/p&gt;

&lt;p&gt;Not all event streaming software requires installing a separate ensemble service. Redpanda, which offers 100% Kafka-compatible data streaming, works out of the box because it already has this functionality built-in.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Installing PostgreSQL for Linux</title>
      <dc:creator>Aimé Bangirahe</dc:creator>
      <pubDate>Sun, 03 Aug 2025 12:02:42 +0000</pubDate>
      <link>https://forem.com/aime-bangirahe/installing-postgresql-for-linux-47oa</link>
      <guid>https://forem.com/aime-bangirahe/installing-postgresql-for-linux-47oa</guid>
      <description>&lt;p&gt;&lt;strong&gt;This task installs PostgreSQL for Linux Servers&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;u&gt;Procedure:&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Log in as root user:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo su
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Download the PostgreSQL source:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;wget https://ftp.postgresql.org/pub/source/v9.5.13/postgresql-9.5.13.tar.gz
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Install PostgreSQL using the following commands:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;tar -zxvf postgresql-9.5.13.tar.gz
cd postgresql-9.5.13/
yum -y install readline-devel
./configure --prefix=/usr/local/postgresql
make
make install
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Create the postgres user and change the owner of the postgres directory:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;useradd postgres
chown -R postgres:postgres /usr/local/postgresql/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Log in as the postgres user:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;su postgres
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Configure the system path for postgres:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;vi ~/.bashrc
PGHOME=/usr/local/postgresql
export PGHOME
PGDATA=/usr/local/postgresql/data
export PGDATA
PATH=$PATH:$HOME/.local/bin:$HOME/bin:$PGHOME/bin
export PATH
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Reload the configuration using the source command:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;source ~/.bashrc
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Initialize the PostgreSQL database:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;initdb
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Configure the database. Open postgresql.conf in vi:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;vi /usr/local/postgresql/data/postgresql.conf
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Replace:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#listen_address='localhost' 
#port = 5432 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;listen_address='*' 
port = 5432
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Open the pg_hba.conf file in vi:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;vi /usr/local/postgresql/data/pg_hba.conf
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Add the following line to the file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;host all all 0.0.0.0/0 trust
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
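&lt;p&gt;Note that the &lt;code&gt;trust&lt;/code&gt; method accepts any client without a password, which is convenient for a first test but unsafe on any reachable network. An illustrative stricter rule (the subnet is a placeholder; &lt;code&gt;md5&lt;/code&gt; is the strongest password method available in the 9.5 series installed here):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;host all all 10.0.0.0/24 md5
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;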



&lt;ul&gt;
&lt;li&gt;Restart postgresql:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pg_ctl -D /usr/local/postgresql/data -l logfile restart
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Change the password for the postgres user in the PostgreSQL database:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;psql
ALTER USER postgres WITH PASSWORD 'mot_de_passe';
\q
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If the postgresql service is not started, log in as postgres, add the PostgreSQL binary directory to the PATH in ~/.bashrc, then reload the configuration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;su postgres
vi ~/.bashrc

# append the install's bin directory to the PATH
export PATH=$PATH:/usr/local/postgresql/bin/

source ~/.bashrc
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Create the database schema in PostgreSQL. 
Run the following command on the psql console:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;create database edge with owner postgres encoding='UTF-8' lc_collate='en_US.utf8' lc_ctype='en_US.utf8' template template0;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the database, create the following tables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE TABLE vi_titulaire_inspectionresult(id text, info jsonb); 
CREATE TABLE vi_titulaire_notification(id text, info jsonb); 
CREATE TABLE vi_titulaire_defectsummary(id text, info jsonb); 
CREATE TABLE vi_titulaire_uploaddataset(id text, info jsonb); 
CREATE TABLE vi_titulaire_syncprocess(id text, info jsonb); 
CREATE TABLE vi_titulaire_model(id text, info jsonb); 
CREATE TABLE vi_titulaire_datagroup(id text, info jsonb);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>linux</category>
    </item>
    <item>
      <title>Installation de PostgreSQL pour Linux</title>
      <dc:creator>Aimé Bangirahe</dc:creator>
      <pubDate>Sun, 03 Aug 2025 11:50:48 +0000</pubDate>
      <link>https://forem.com/aime-bangirahe/installation-de-postgresql-pour-linux-1jp6</link>
      <guid>https://forem.com/aime-bangirahe/installation-de-postgresql-pour-linux-1jp6</guid>
      <description>&lt;h5&gt;
  
  
This task installs PostgreSQL on Linux servers
&lt;/h5&gt;

&lt;h5&gt;
  
  
&lt;strong&gt;&lt;u&gt;Procedure:&lt;/u&gt;&lt;/strong&gt;
&lt;/h5&gt;

&lt;ul&gt;
&lt;li&gt;Log in as the root user:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo su
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Download the PostgreSQL source:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;wget https://ftp.postgresql.org/pub/source/v9.5.13/postgresql-9.5.13.tar.gz
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Install PostgreSQL using the following commands:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;tar -zxvf postgresql-9.5.13.tar.gz
cd postgresql-9.5.13/
yum -y install readline-devel
./configure --prefix=/usr/local/postgresql
make
make install
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Create the postgres user and change the owner of the postgres directory:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;useradd postgres
chown -R postgres:postgres /usr/local/postgresql/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Log in as the postgres user:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;su postgres
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Configure the system path for postgres:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;vi ~/.bashrc
PGHOME=/usr/local/postgresql
export PGHOME
PGDATA=/usr/local/postgresql/data
export PGDATA
PATH=$PATH:$HOME/.local/bin:$HOME/bin:$PGHOME/bin
export PATH
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Reload the configuration using the source command:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;source ~/.bashrc
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Initialize the PostgreSQL database:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;initdb
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Configure the database. Open postgresql.conf in vi:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;vi /usr/local/postgresql/data/postgresql.conf
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Replace:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#listen_address='localhost' 
#port = 5432
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;listen_address='*' 
port = 5432
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Open the pg_hba.conf file in vi:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;vi /usr/local/postgresql/data/pg_hba.conf
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Add the following line to the file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;host all all 0.0.0.0/0 trust
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;ul&gt;
&lt;li&gt;Restart postgresql:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pg_ctl -D /usr/local/postgresql/data -l logfile restart
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;ul&gt;
&lt;li&gt;Change the password for the postgres user in the PostgreSQL database:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;psql
ALTER USER postgres WITH PASSWORD 'mot_de_passe';
\q
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the postgresql service is not started, run the following commands:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;su postgres
vi ~/.bashrc
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Add the PostgreSQL binary directory to the PATH in the file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;export PATH=$PATH:/usr/local/postgresql/bin/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;source ~/.bashrc
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Create the database schema in PostgreSQL. Run the following command on the psql console:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;create database edge with owner postgres encoding='UTF-8' lc_collate='en_US.utf8' lc_ctype='en_US.utf8' template template0;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the database, create the following tables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE TABLE vi_titulaire_inspectionresult(id text, info jsonb); 
CREATE TABLE vi_titulaire_notification(id text, info jsonb); 
CREATE TABLE vi_titulaire_defectsummary(id text, info jsonb); 
CREATE TABLE vi_titulaire_uploaddataset(id text, info jsonb); 
CREATE TABLE vi_titulaire_syncprocess(id text, info jsonb); 
CREATE TABLE vi_titulaire_model(id text, info jsonb); 
CREATE TABLE vi_titulaire_datagroup(id text, info jsonb);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>linux</category>
    </item>
  </channel>
</rss>
