<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Rishabh Chandel</title>
    <description>The latest articles on Forem by Rishabh Chandel (@chandel).</description>
    <link>https://forem.com/chandel</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1197522%2F30981772-a436-4618-8e21-a7ecb3c60c54.png</url>
      <title>Forem: Rishabh Chandel</title>
      <link>https://forem.com/chandel</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/chandel"/>
    <language>en</language>
    <item>
      <title>How Git Works Internally?</title>
      <dc:creator>Rishabh Chandel</dc:creator>
      <pubDate>Mon, 30 Oct 2023 15:36:45 +0000</pubDate>
      <link>https://forem.com/chandel/how-git-works-internally-27mo</link>
      <guid>https://forem.com/chandel/how-git-works-internally-27mo</guid>
      <description>&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--gLlfEU6i--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/k3xuaziku6w33bx2xumq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--gLlfEU6i--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/k3xuaziku6w33bx2xumq.png" alt="Image description" width="800" height="449"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;All of us use Git on a daily basis, but how many of us know what goes on under the hood? In this blog post, we will take a deep dive into the inner workings of Git.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;How does Git store data?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Git is a content-addressable file system: at its core it is a key-value store where you can insert any type of content and get back a unique key that you can later use to retrieve that content.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Key&lt;/th&gt;
&lt;th&gt;Object&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;accefceba62b4874a613a2336de33ee716e99931&lt;/td&gt;
&lt;td&gt;&lt;code&gt;console.log("Hello World!");&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;deb909034d4053be6bdc041e8280e6d0f6eecfe&lt;/td&gt;
&lt;td&gt;&lt;code&gt;console.log("Foo Bar");&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The key is generated from the content of the object using the SHA-1 hash function, so two files with the same content will have the same key 🧐 (more on this later). Git also compresses the content of each object with the zlib library to save disk space.&lt;/p&gt;

&lt;p&gt;If you have used Git before, you have probably seen the directory called &lt;code&gt;.git&lt;/code&gt;. This is where Git stores almost everything. The object database lives in the &lt;code&gt;.git/objects&lt;/code&gt; directory.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--NIMm7zEP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn.hashnode.com/res/hashnode/image/upload/v1698675640645/33a894ec-8a6b-4992-b1ce-0561aeb104f3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--NIMm7zEP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn.hashnode.com/res/hashnode/image/upload/v1698675640645/33a894ec-8a6b-4992-b1ce-0561aeb104f3.png" alt="" width="800" height="488"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Git Objects: blob, tree, and commit&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;In Git, the contents of files are stored in objects called &lt;strong&gt;blobs&lt;/strong&gt; (binary large objects). Blobs differ from regular files in that they carry no extra information such as a creation time or a file name. Each blob is identified by its SHA-1 hash. A SHA-1 hash is 20 bytes long and is usually written as 40 hexadecimal characters (in this post, we occasionally show only the first few characters of a hash).&lt;/p&gt;

&lt;p&gt;Let's add a file to Git and see what a blob object and its hash look like.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--SPCmN-cs--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn.hashnode.com/res/hashnode/image/upload/v1698675702769/0558e5e8-2583-4595-bbe5-4d37882d3e29.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--SPCmN-cs--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn.hashnode.com/res/hashnode/image/upload/v1698675702769/0558e5e8-2583-4595-bbe5-4d37882d3e29.png" alt="" width="800" height="201"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here a new object is created with the key &lt;code&gt;accefceba62b4874a613a2336de33ee716e99931&lt;/code&gt;. Note that Git uses the first two characters of the SHA-1 hash as a directory name and the remaining 38 characters as the name of the file that actually contains the blob. Git does this to reduce the number of files per directory.&lt;/p&gt;
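
&lt;p&gt;The mapping from key to file path can be sketched in a few lines of Python. The helper name &lt;code&gt;object_path&lt;/code&gt; is hypothetical, and the sketch only covers loose objects stored directly under &lt;code&gt;.git/objects&lt;/code&gt;.&lt;/p&gt;

```python
def object_path(key, git_dir=".git"):
    """Map a 40-character object key to its loose-object file path."""
    # The first two hex characters name the directory,
    # the remaining 38 name the file inside it.
    return "/".join([git_dir, "objects", key[:2], key[2:]])


print(object_path("accefceba62b4874a613a2336de33ee716e99931"))
# .git/objects/ac/cefceba62b4874a613a2336de33ee716e99931
```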

&lt;p&gt;We can check the type and content of the object with the following commands:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--SWMzCrpo--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn.hashnode.com/res/hashnode/image/upload/v1698675744080/08e6e178-8bfd-4f33-b287-b24bde878c41.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--SWMzCrpo--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn.hashnode.com/res/hashnode/image/upload/v1698675744080/08e6e178-8bfd-4f33-b287-b24bde878c41.png" alt="" width="800" height="96"&gt;&lt;/a&gt;&lt;/p&gt;
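
&lt;p&gt;Commands such as &lt;code&gt;git cat-file -t&lt;/code&gt; and &lt;code&gt;git cat-file -p&lt;/code&gt; report an object's type and content. What they do for a loose object can be approximated in Python, as in the sketch below. It assumes the object is a zlib-compressed header plus content, and it builds a sample object in memory rather than reading a real repository. The function name &lt;code&gt;parse_loose_object&lt;/code&gt; is hypothetical.&lt;/p&gt;

```python
import zlib


def parse_loose_object(raw):
    """Split a compressed loose object into (type, content)."""
    data = zlib.decompress(raw)
    # The header ends at the first null byte, for example b"blob 12".
    header, _, content = data.partition(b"\0")
    obj_type, _, size = header.partition(b" ")
    if int(size) != len(content):
        raise ValueError("size in header does not match content length")
    return obj_type.decode("ascii"), content


# Build an in-memory object the way Git would store it, then read it back.
raw = zlib.compress(b"blob 12\0hello world\n")
print(parse_loose_object(raw))  # ('blob', b'hello world\n')
```

&lt;p&gt;To try it on a real repository, pass in the bytes of a file under &lt;code&gt;.git/objects&lt;/code&gt; opened in binary mode.&lt;/p&gt;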

&lt;p&gt;Now let's create another file and add it to git.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--5dB2qyQ8--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn.hashnode.com/res/hashnode/image/upload/v1698675858336/7813ff53-b116-48b1-a387-af57e336c090.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--5dB2qyQ8--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn.hashnode.com/res/hashnode/image/upload/v1698675858336/7813ff53-b116-48b1-a387-af57e336c090.png" alt="" width="800" height="287"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As you can see, adding &lt;code&gt;main2.js&lt;/code&gt; creates a new blob object with a different key, since the content of the file is different. But what if two or more files have the same content?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--N-RuVj2h--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn.hashnode.com/res/hashnode/image/upload/v1698676318618/da8eab5d-01f9-466a-8605-287bf24035b9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--N-RuVj2h--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn.hashnode.com/res/hashnode/image/upload/v1698676318618/da8eab5d-01f9-466a-8605-287bf24035b9.png" alt="" width="800" height="246"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Git does not create duplicate objects 😉.&lt;/p&gt;
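
&lt;p&gt;This deduplication falls out of content addressing: identical content always produces the same key. The toy in-memory store below is a sketch of that idea, not Git's actual implementation. The class &lt;code&gt;ObjectStore&lt;/code&gt; is hypothetical, and the file contents are assumed for the sake of the example.&lt;/p&gt;

```python
import hashlib


class ObjectStore:
    """Toy in-memory stand-in for .git/objects (hypothetical class)."""

    def __init__(self):
        self.objects = {}

    def add_blob(self, content):
        header = b"blob " + str(len(content)).encode("ascii") + b"\0"
        key = hashlib.sha1(header + content).hexdigest()
        # Adding the same content again lands on the same key,
        # so nothing new is stored.
        self.objects.setdefault(key, content)
        return key


store = ObjectStore()
key_a = store.add_blob(b'console.log("Foo Bar");\n')  # one file
key_b = store.add_blob(b'console.log("Foo Bar");\n')  # a second, identical file
print(key_a == key_b, len(store.objects))  # True 1
```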

&lt;p&gt;In &lt;code&gt;git&lt;/code&gt;, the equivalent of a directory is a &lt;strong&gt;tree&lt;/strong&gt;. A tree is basically a directory listing, referring to blobs as well as other trees. Trees are identified by their SHA-1 hashes as well.&lt;/p&gt;
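
&lt;p&gt;To make the idea of a tree as a directory listing more concrete, the Python sketch below computes tree hashes by hand. It assumes the standard tree format: one entry per item, made of a mode, the name, a null byte, and the raw 20-byte key, with entries sorted by name. The helper names, file names, and file contents are illustrative only.&lt;/p&gt;

```python
import hashlib


def git_hash(obj_type, body):
    """Hash an object body together with its type and length header."""
    header = obj_type + b" " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def tree_hash(entries):
    """entries: list of (mode, name, hex_key) tuples for one directory."""
    def sort_key(entry):
        mode, name, _ = entry
        # Git sorts by name, comparing directories as if "/" were appended.
        return name.encode() + (b"/" if mode == "40000" else b"")

    body = b""
    for mode, name, hex_key in sorted(entries, key=sort_key):
        # Each entry: mode, space, name, null byte, raw 20-byte key.
        body += mode.encode() + b" " + name.encode() + b"\0" + bytes.fromhex(hex_key)
    return git_hash(b"tree", body)


blob = git_hash(b"blob", b"hello world\n")
# Mode 100644 marks a regular file, mode 40000 marks a subdirectory (tree).
src = tree_hash([("100644", "main2.js", blob), ("100644", "main3.js", blob)])
root = tree_hash([("100644", "main.js", blob), ("40000", "src", src)])
print("src tree: ", src)
print("root tree:", root)
```

&lt;p&gt;Note that file names live in the tree, not in the blob, which is why two files with different names can share a single blob.&lt;/p&gt;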

&lt;p&gt;Let's take a snapshot of that file system and store all the files that existed at that time, along with their contents.&lt;/p&gt;

&lt;p&gt;In Git, a snapshot is represented as a &lt;strong&gt;commit&lt;/strong&gt;, which contains a reference to the main tree (the root directory) and additional metadata, including the author, the committer, the commit message, and the commit timestamp. Typically, commits also have one or more parent commits, representing previous snapshots. Commit objects are also identified by SHA-1 hashes, which are the hashes you see in the output of &lt;code&gt;git log&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Let's restructure our code a bit and make the first commit. Move the files &lt;code&gt;main2.js&lt;/code&gt; and &lt;code&gt;main3.js&lt;/code&gt; into a directory named &lt;code&gt;src&lt;/code&gt; and commit. Our directory and Git objects will look something like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--qEf9Sxr_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn.hashnode.com/res/hashnode/image/upload/v1698676512074/c93257d0-4579-4d7d-abf3-b9535b837764.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--qEf9Sxr_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn.hashnode.com/res/hashnode/image/upload/v1698676512074/c93257d0-4579-4d7d-abf3-b9535b837764.png" alt="" width="800" height="441"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Time to traverse this graph of Git objects. We will start from the commit. Get the commit hash using the command &lt;code&gt;git log&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ktqDvQdm--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn.hashnode.com/res/hashnode/image/upload/v1698676809015/93180f2f-57a1-41ef-af71-a7a44acfc3b5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ktqDvQdm--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn.hashnode.com/res/hashnode/image/upload/v1698676809015/93180f2f-57a1-41ef-af71-a7a44acfc3b5.png" alt="" width="800" height="131"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--HdA-snke--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn.hashnode.com/res/hashnode/image/upload/v1698676839229/4f4cde26-d508-4721-801e-61120ebe440b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--HdA-snke--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn.hashnode.com/res/hashnode/image/upload/v1698676839229/4f4cde26-d508-4721-801e-61120ebe440b.png" alt="" width="800" height="302"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When we print a commit object in Git, it reveals essential commit details, including the author's information, the commit message, and the commit timestamp. Additionally, it provides a hash pointing to a tree object. If we delve into the tree object by printing it, we find the hash for the &lt;code&gt;main.js&lt;/code&gt; blob and another tree object representing the &lt;code&gt;src&lt;/code&gt; directory. Continuing this process, printing the src tree object uncovers the hashes for the remaining two blobs: &lt;code&gt;main2.js&lt;/code&gt; and &lt;code&gt;main3.js&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;This is the fundamental way Git organizes and stores its data in objects. To simplify the concept, refer to the image below for a visual representation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--tOPoqvPt--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn.hashnode.com/res/hashnode/image/upload/v1698676954580/0fdca759-8228-4098-bab5-b6cb913dc230.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--tOPoqvPt--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn.hashnode.com/res/hashnode/image/upload/v1698676954580/0fdca759-8228-4098-bab5-b6cb913dc230.png" alt="" width="381" height="441"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Time for a second commit?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;What happens when we change the content of a file, and make a new commit? Before reading on further, pause for a while and think how git would store it.&lt;/p&gt;

&lt;p&gt;Let's say we add an &lt;code&gt;!&lt;/code&gt; to the message in &lt;code&gt;main.js&lt;/code&gt;, so that &lt;code&gt;console.log("Hello World");&lt;/code&gt; becomes &lt;code&gt;console.log("Hello World!");&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Well, this change would mean a new blob will be created with a new SHA-1 hash.&lt;/p&gt;

&lt;p&gt;Since we have a new hash, the tree no longer points to &lt;code&gt;accef&lt;/code&gt;. This means the tree's content has changed, and so its hash changes too.&lt;/p&gt;

&lt;p&gt;We are almost ready to create a new commit object, and it seems like we are going to store a lot of data: the entire file system, once more! But that is not the case. Most objects, specifically the blob objects, haven't changed since the previous commit.&lt;/p&gt;

&lt;p&gt;So this is the trick: as long as an object doesn't change, we don't store it again. We only refer to it by its hash value. We can then create our commit object.&lt;/p&gt;
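
&lt;p&gt;The effect of this reuse can be simulated with a small Python sketch. It writes two snapshots of the same layout (&lt;code&gt;main.js&lt;/code&gt; plus &lt;code&gt;src/main2.js&lt;/code&gt; and &lt;code&gt;src/main3.js&lt;/code&gt;) into an in-memory dictionary and counts how many objects the second snapshot adds. The helper names and the content of the files under &lt;code&gt;src&lt;/code&gt; are assumptions, and commit objects are left out to keep the example short.&lt;/p&gt;

```python
import hashlib


def put(store, obj_type, body):
    """Store an object under its SHA-1 key and return the key."""
    header = obj_type + b" " + str(len(body)).encode("ascii") + b"\0"
    key = hashlib.sha1(header + body).hexdigest()
    store[key] = body
    return key


def put_tree(store, entries):
    """Store a tree; entries must already be sorted by name."""
    body = b""
    for mode, name, key in entries:
        body += mode + b" " + name + b"\0" + bytes.fromhex(key)
    return put(store, b"tree", body)


def snapshot(store, main_js):
    """Write main.js, src/main2.js and src/main3.js; return the root tree key."""
    shared = put(store, b"blob", b'console.log("Foo Bar");\n')
    src = put_tree(store, [(b"100644", b"main2.js", shared),
                           (b"100644", b"main3.js", shared)])
    main = put(store, b"blob", main_js)
    return put_tree(store, [(b"100644", b"main.js", main),
                            (b"40000", b"src", src)])


store = {}
first = snapshot(store, b'console.log("Hello World");\n')
before = set(store)
second = snapshot(store, b'console.log("Hello World!");\n')
print(len(before), "objects after the first snapshot")         # 4
print(len(set(store) - before), "new objects for the second")  # 2
```

&lt;p&gt;Only the changed blob and the root tree that points to it are new. The &lt;code&gt;src&lt;/code&gt; tree and the blob it references are reused as they are.&lt;/p&gt;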

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--9_ko2dbZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn.hashnode.com/res/hashnode/image/upload/v1698677083327/908a1011-a17b-42b9-9bda-90afafb8c366.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--9_ko2dbZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn.hashnode.com/res/hashnode/image/upload/v1698677083327/908a1011-a17b-42b9-9bda-90afafb8c366.png" alt="" width="800" height="157"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Since this commit is not the first commit, it has a parent commit &lt;code&gt;bcc69&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--frjiBUeY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn.hashnode.com/res/hashnode/image/upload/v1698677108473/f9e226c8-537e-4f9c-a0f7-1ffe0e568cd3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--frjiBUeY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn.hashnode.com/res/hashnode/image/upload/v1698677108473/f9e226c8-537e-4f9c-a0f7-1ffe0e568cd3.png" alt="" width="701" height="441"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Summary&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;In this blog post, we dived into the inner workings of Git, exploring its fundamental components: &lt;strong&gt;blobs&lt;/strong&gt;, &lt;strong&gt;trees&lt;/strong&gt;, and &lt;strong&gt;commits&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;We learned that a &lt;strong&gt;blob&lt;/strong&gt; holds the contents of a file. A &lt;strong&gt;tree&lt;/strong&gt; is a directory listing, containing &lt;strong&gt;blobs&lt;/strong&gt; and/or sub-&lt;strong&gt;trees&lt;/strong&gt;. A &lt;strong&gt;commit&lt;/strong&gt; is a snapshot of our working directory, with some metadata such as the time or the commit message. We left out further concepts, such as branches and tags, to keep the post concise.&lt;/p&gt;

&lt;p&gt;Thanks for reading!&lt;/p&gt;

</description>
      <category>git</category>
      <category>github</category>
    </item>
  </channel>
</rss>
