<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Djouldé BARRY</title>
    <description>The latest articles on Forem by Djouldé BARRY (@djouldbarry_ee2).</description>
    <link>https://forem.com/djouldbarry_ee2</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2947305%2Fa414a93e-21d1-4aba-b8b2-69938e3addab.png</url>
      <title>Forem: Djouldé BARRY</title>
      <link>https://forem.com/djouldbarry_ee2</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/djouldbarry_ee2"/>
    <language>en</language>
    <item>
      <title>🍏 Eat 5 Fruits and Vegetables a Day… and What About Our Data? 🤔</title>
      <dc:creator>Djouldé BARRY</dc:creator>
      <pubDate>Tue, 18 Mar 2025 19:14:27 +0000</pubDate>
      <link>https://forem.com/djouldbarry_ee2/eat-5-fruits-and-vegetables-a-day-and-what-about-our-data-53k9</link>
      <guid>https://forem.com/djouldbarry_ee2/eat-5-fruits-and-vegetables-a-day-and-what-about-our-data-53k9</guid>
      <description>&lt;p&gt;We always hear the health advice: &lt;strong&gt;"Eat 5 fruits and vegetables a day!"&lt;/strong&gt; 🍎🍌🥦&lt;br&gt;&lt;br&gt;
It’s good for our health, keeps us fit, and gives us energy.&lt;br&gt;&lt;br&gt;
But what if we applied this logic to &lt;strong&gt;data management&lt;/strong&gt;?  &lt;/p&gt;

&lt;p&gt;👉 &lt;strong&gt;Welcome to the world of Data Lakes, Data Warehouses, and Data Lakehouses!&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Because, just like with food, making the right choices in data is key.  &lt;/p&gt;




&lt;h2&gt;
  
  
  Data Lake: A Raw Fruit Market
&lt;/h2&gt;

&lt;p&gt;Imagine a market full of &lt;strong&gt;fresh fruits&lt;/strong&gt;: oranges, apples, grapes, lemons…&lt;br&gt;&lt;br&gt;
That’s exactly what a &lt;strong&gt;Data Lake&lt;/strong&gt; is: a place where &lt;strong&gt;all raw data is stored&lt;/strong&gt; without processing.  &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxjy8662wqq2i5r15qs0j.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxjy8662wqq2i5r15qs0j.jpeg" alt="Comparison of Data Lake, Data Warehouse, and Data Lakehouse with fruit and juice metaphor" width="800" height="787"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
✅ You store everything! (just like when you bring home tons of fruit from the market).&lt;br&gt;&lt;br&gt;
✅ Flexible: you can process data later however you want.&lt;br&gt;&lt;br&gt;
✅ Ideal for Big Data and advanced analytics.  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
❌ Too much unorganized data can become messy (like a fridge full of food, but "nothing to eat" 😅).&lt;br&gt;&lt;br&gt;
❌ Requires experts to extract real value.  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt; Amazon S3 is a popular storage solution for Data Lakes.  &lt;/p&gt;




&lt;h2&gt;
  
  
  🏬 Data Warehouse: Ready-to-Drink Juice
&lt;/h2&gt;

&lt;p&gt;Once you’ve picked the fruits, what do you do? You &lt;strong&gt;process&lt;/strong&gt; them into &lt;strong&gt;organized juice bottles&lt;/strong&gt;.&lt;br&gt;&lt;br&gt;
That’s exactly what a &lt;strong&gt;Data Warehouse&lt;/strong&gt; does: it stores data in a &lt;strong&gt;structured, optimized way for analysis&lt;/strong&gt;.  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
✅ Data is clean and ready to use (like a fresh bottle of juice).&lt;br&gt;&lt;br&gt;
✅ High-performance and optimized for analytics.&lt;br&gt;&lt;br&gt;
✅ Clearly structured and efficient.  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
❌ Less flexibility (you can’t turn juice back into fruit 🧃➡️🍊).&lt;br&gt;&lt;br&gt;
❌ Can be expensive and rigid.  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt; Snowflake and Google BigQuery are popular Data Warehouses.  &lt;/p&gt;




&lt;h2&gt;
  
  
  🏡 Data Lakehouse: The Best of Both Worlds
&lt;/h2&gt;

&lt;p&gt;What if you had &lt;strong&gt;both fresh fruits&lt;/strong&gt; AND &lt;strong&gt;ready-made juice&lt;/strong&gt;?&lt;br&gt;&lt;br&gt;
That’s what a &lt;strong&gt;Data Lakehouse&lt;/strong&gt; offers: a combination of a Data Lake’s flexibility and a Data Warehouse’s structured efficiency.  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
✅ Flexibility and performance in one place.&lt;br&gt;&lt;br&gt;
✅ Can be more cost-effective and scalable than maintaining a separate lake and warehouse.&lt;br&gt;&lt;br&gt;
✅ A single environment for both raw and processed data.  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
❌ Can be more complex to implement.  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt; Databricks provides a powerful Lakehouse architecture.  &lt;/p&gt;




&lt;h2&gt;
  
  
  🎯 &lt;strong&gt;Moral of the Story: Which "Juice" Should You Choose?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Data management is like a healthy diet: &lt;strong&gt;balance is key&lt;/strong&gt;.&lt;br&gt;&lt;br&gt;
👉 Need flexibility? &lt;strong&gt;Go for a Data Lake&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
👉 Need speed and structured analysis? &lt;strong&gt;Choose a Data Warehouse&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
👉 Want both? &lt;strong&gt;A Data Lakehouse is the answer&lt;/strong&gt;  &lt;/p&gt;

&lt;p&gt;So, what’s your &lt;strong&gt;data strategy&lt;/strong&gt;? Are you more of a &lt;strong&gt;"fresh juice"&lt;/strong&gt; or &lt;strong&gt;"fruit market"&lt;/strong&gt; type? 🚀  &lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Now It’s Your Turn!&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;💬 &lt;strong&gt;Share your experience with Data Lakes, Warehouses, and Lakehouses in the comments!&lt;/strong&gt;  &lt;/p&gt;

</description>
      <category>bigdata</category>
      <category>datalake</category>
      <category>datascience</category>
      <category>databricks</category>
    </item>
    <item>
      <title>Pandas Mindmap: A Visual Guide to DataFrame Manipulation</title>
      <dc:creator>Djouldé BARRY</dc:creator>
      <pubDate>Mon, 17 Mar 2025 15:19:00 +0000</pubDate>
      <link>https://forem.com/djouldbarry_ee2/pandas-mindmap-a-visual-guide-to-dataframe-manipulation-la8</link>
      <guid>https://forem.com/djouldbarry_ee2/pandas-mindmap-a-visual-guide-to-dataframe-manipulation-la8</guid>
      <description>&lt;p&gt;Pandas is an essential library for data manipulation and analysis in Python.&lt;br&gt;&lt;br&gt;
This &lt;em&gt;mindmap&lt;/em&gt; provides a structured visual overview for quickly grasping Pandas' core functionality.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why a Pandas Mindmap?
&lt;/h2&gt;

&lt;p&gt;With so many available methods and transformations, it can be overwhelming to memorize everything.&lt;br&gt;&lt;br&gt;
A visual mindmap helps to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Easily recall essential commands.
&lt;/li&gt;
&lt;li&gt;Navigate the API without repeatedly checking documentation.
&lt;/li&gt;
&lt;li&gt;Improve efficiency when working with DataFrames.
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Pandas Mindmap: The Ultimate Cheat Sheet!&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy8bjdjkb0kc45sg3at6i.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy8bjdjkb0kc45sg3at6i.jpg" alt="Pandas Mindmap" width="800" height="1130"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The mindmap covers the fundamental operations on &lt;code&gt;DataFrame&lt;/code&gt; objects:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Creating &amp;amp; Importing Data&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Exploring &amp;amp; Manipulating&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cleaning &amp;amp; Transforming&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Merging &amp;amp; Aggregating&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Exporting Results&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;
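
&lt;p&gt;As a rough sketch of how those five stages might look in practice, the snippet below walks a tiny in-memory table through each one. The column names (&lt;code&gt;city&lt;/code&gt;, &lt;code&gt;sales&lt;/code&gt;) and the values are made up for illustration and are not taken from the mindmap.&lt;/p&gt;

```python
import pandas as pd

# 1. Creating: build a small DataFrame from a dict (hypothetical data)
df = pd.DataFrame({
    "city": ["Paris", "Lyon", "Paris", "Lyon"],
    "sales": [100.0, None, 50.0, 30.0],
})

# 2. Exploring: check the shape and preview the first rows
print(df.shape)
print(df.head())

# 3. Cleaning: replace the missing sales value with 0
clean = df.fillna({"sales": 0.0})

# 4. Aggregating: total sales per city
totals = clean.groupby("city")["sales"].sum()
print(totals)

# 5. Exporting: to_csv returns a string when no path is given
csv_text = totals.to_csv()
print(csv_text)
```

&lt;p&gt;Passing a file path to &lt;code&gt;to_csv&lt;/code&gt; would write the result to disk instead of returning it as text.&lt;/p&gt;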




&lt;h2&gt;
  
  
  &lt;strong&gt;Essential Pandas Commands&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Here are some key operations for handling data efficiently:&lt;/p&gt;

&lt;h3&gt;
  
  
  Load a CSV file, then filter, summarize, and merge
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;

&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data.csv&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Preview first rows
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;head&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

&lt;span class="c1"&gt;# Keep only the rows where "column" is greater than 50
&lt;/span&gt;&lt;span class="n"&gt;df_filtered&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;column&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# Quick statistics (printed so they also show up outside a notebook)
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;describe&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

&lt;span class="c1"&gt;# Merge two DataFrames (df1 and df2 are placeholders for DataFrames that share an "id" column)
&lt;/span&gt;&lt;span class="n"&gt;df_merge&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;merge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;on&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;how&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;left&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
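
&lt;p&gt;The merge line above relies on two DataFrames, &lt;code&gt;df1&lt;/code&gt; and &lt;code&gt;df2&lt;/code&gt;, that are not defined in the snippet. A minimal, self-contained sketch of the same left join follows. The tables &lt;code&gt;customers&lt;/code&gt; and &lt;code&gt;orders&lt;/code&gt; and their values are illustrative assumptions, chosen to show what &lt;code&gt;how="left"&lt;/code&gt; does with rows that have no match.&lt;/p&gt;

```python
import pandas as pd

# Hypothetical tables that share an "id" column
customers = pd.DataFrame({"id": [1, 2, 3], "name": ["Ana", "Ben", "Chloe"]})
orders = pd.DataFrame({"id": [1, 3], "amount": [20, 75]})

# Left join: keep every customer, even those without an order
merged = customers.merge(orders, on="id", how="left")
print(merged)

# Customers with no matching order get NaN in "amount"
no_order = merged[merged["amount"].isna()]
print(no_order["name"].tolist())

# Rows whose amount is greater than 50
# (.gt(50) is the method form of the greater-than comparison)
big = merged[merged["amount"].gt(50)]
print(big["name"].tolist())
```

&lt;p&gt;In this made-up data, Ben has no order, so his &lt;code&gt;amount&lt;/code&gt; is missing after the join, while Ana and Chloe keep their values.&lt;/p&gt;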



&lt;h2&gt;
  
  
  Source
&lt;/h2&gt;

&lt;p&gt;This article is inspired by the original &lt;strong&gt;Pandas Mindmap&lt;/strong&gt; created by &lt;a href="https://yaoyaoustc.github.io/2019/11/20/pandas-mindmap/" rel="noopener noreferrer"&gt;Yao Yao&lt;/a&gt;.&lt;br&gt;&lt;br&gt;
All credits for the original visualization go to the author.&lt;/p&gt;

</description>
      <category>python</category>
      <category>pandas</category>
      <category>dataengineering</category>
      <category>datascience</category>
    </item>
  </channel>
</rss>
