<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Tung Thanh</title>
    <description>The latest articles on Forem by Tung Thanh (@cstungthanh).</description>
    <link>https://forem.com/cstungthanh</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1108037%2Fcb77f00d-d4cf-4579-a53f-1674c42c8503.png</url>
      <title>Forem: Tung Thanh</title>
      <link>https://forem.com/cstungthanh</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/cstungthanh"/>
    <language>en</language>
    <item>
      <title>Basic System design overview</title>
      <dc:creator>Tung Thanh</dc:creator>
      <pubDate>Sat, 08 Jul 2023 07:39:24 +0000</pubDate>
      <link>https://forem.com/cstungthanh/basic-system-design-overview-4id</link>
      <guid>https://forem.com/cstungthanh/basic-system-design-overview-4id</guid>
      <description>&lt;p&gt;Everyone already knows the basic system below, so what happens if we want to adapt it to serve millions of requests?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--fYhT8-r5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/kztzmly5vjclchi2sszh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--fYhT8-r5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/kztzmly5vjclchi2sszh.png" alt="Basic system" width="800" height="274"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Keywords: rate limiting, DNS, proxy, reverse proxy, load balancing, CDN, blob storage, vertical scaling, horizontal scaling.&lt;/p&gt;

&lt;h2&gt;
  
  
  What should we do when a user spams requests?
&lt;/h2&gt;

&lt;p&gt;In that case, we need rate limiting: we restrict the number of requests that a client can make to a server within a given period of time.&lt;br&gt;
AWS offers several services that can help with rate limiting, depending on the specific needs of your application: API Gateway (request throttling), AWS WAF (rate-based rules, which can also be attached to CloudFront), or custom logic in AWS Lambda.&lt;/p&gt;
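
&lt;p&gt;To make the idea concrete, here is a minimal in-process sketch of a sliding-window rate limiter. The class name, the parameters, and the injectable clock are assumptions made for illustration; this is not how any AWS service implements throttling.&lt;/p&gt;

```python
import time
from collections import defaultdict, deque


class SlidingWindowRateLimiter:
    """Allow at most max_requests per client within window_seconds."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.history = defaultdict(deque)  # client id -> timestamps of accepted requests

    def allow(self, client_id):
        now = self.clock()
        timestamps = self.history[client_id]
        # Drop timestamps that have fallen out of the current window.
        while timestamps and now - timestamps[0] >= self.window_seconds:
            timestamps.popleft()
        if len(timestamps) >= self.max_requests:
            return False  # over the limit: reject, for example with HTTP 429
        timestamps.append(now)
        return True
```

&lt;p&gt;A server would call allow() with a client identifier (an IP address or an API key, for example) before handling each request. With several servers, the counters would have to live in a shared store rather than in process memory.&lt;/p&gt;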

&lt;h2&gt;
  
  
  Where are static files (HTML, CSS) stored?
&lt;/h2&gt;

&lt;p&gt;If we stored them in the database, every client request would have to hit the database, and the server would soon be unable to keep up.&lt;br&gt;
That's why we put them in a separate blob storage service (S3, Azure Blob Storage, ...) and then use a CDN service (Cloudflare, AWS CloudFront, Google Cloud CDN) to deliver the static files, which reduces load on the main server.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--D_UFZnNQ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/tvm5ua5janl53eui9k5i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--D_UFZnNQ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/tvm5ua5janl53eui9k5i.png" alt="What is CDN" width="590" height="523"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How do we scale the main server?
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Vertical scaling - scale up
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Increase the power of a single server: add more RAM, upgrade the CPU, and so on.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Horizontal scaling - scale out
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Increase the capacity of a system by adding more machines (nodes)&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;When scaling out, how does the client know which servers they should communicate with?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The &lt;strong&gt;load balancer&lt;/strong&gt; (LB) is responsible for distributing incoming client requests across the available servers in a way that ensures good resource utilization and performance.&lt;br&gt;
For a simpler setup, &lt;strong&gt;a reverse proxy&lt;/strong&gt; can handle this&lt;br&gt;
(a reverse proxy can also act as an LB).&lt;br&gt;
An LB and a reverse proxy are 2 distinct technologies that can be used together to improve the performance and scalability of a web app. They serve different purposes and have different capabilities.&lt;/p&gt;
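
&lt;p&gt;As a rough illustration of how requests can be spread across servers, here is a sketch of round-robin selection, one of the simplest balancing strategies. The class and the server names are hypothetical; real load balancers also track server health and current load.&lt;/p&gt;

```python
import itertools


class RoundRobinBalancer:
    """Hand out servers in a fixed rotation, one per incoming request."""

    def __init__(self, servers):
        servers = list(servers)
        if not servers:
            raise ValueError("at least one server is required")
        self._rotation = itertools.cycle(servers)

    def next_server(self):
        # Each call returns the next server in the rotation and wraps around.
        return next(self._rotation)
```

&lt;p&gt;With three servers, the fourth request goes back to the first server.&lt;/p&gt;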

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--n60vUaAX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/l3xtx1ek6xpcrfn3o0ru.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--n60vUaAX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/l3xtx1ek6xpcrfn3o0ru.png" alt="LB" width="800" height="564"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After these changes, the system looks like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--quUc2C3K--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/jcwvqgize55ja3gsdsdy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--quUc2C3K--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/jcwvqgize55ja3gsdsdy.png" alt="System" width="800" height="406"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Bottlenecks at the database
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;When the number of requests from the servers is high, the system may still hit a bottleneck at the database level.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  1. Data that does not change frequently
&lt;/h3&gt;

&lt;p&gt;--&amp;gt; We need a cache.&lt;br&gt;
Instead of always reading from the database, the server reads from the cache first&lt;br&gt;
-&amp;gt; this improves read speed because cached data is stored in memory.&lt;br&gt;
There are 2 types of caching:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;In-Memory Cache:&lt;/strong&gt; 

&lt;ul&gt;
&lt;li&gt;Data is stored in the RAM of a single server or node in the network.&lt;/li&gt;
&lt;li&gt;The problem is that RAM is limited, so once its capacity is exceeded we need a cache eviction algorithm: LRU, FIFO, or LFU.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Distributed Cache:&lt;/strong&gt; 

&lt;ul&gt;
&lt;li&gt;The cache is shared across multiple servers or nodes in the network.&lt;/li&gt;
&lt;li&gt;But if the cache server shuts down, we lose all cached data, so to give the system &lt;strong&gt;high availability&lt;/strong&gt; (HA), the cache should be replicated to multiple nodes (for example with a primary-replica, or master-slave, strategy).&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
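
&lt;p&gt;Of the eviction algorithms mentioned above, LRU (least recently used) is probably the most common. Here is a minimal sketch built on Python's OrderedDict; the class and method names are chosen for illustration only.&lt;/p&gt;

```python
from collections import OrderedDict


class LRUCache:
    """Keep at most capacity items and evict the least recently used one."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()  # oldest entry first, most recently used last

    def get(self, key, default=None):
        if key not in self.items:
            return default
        self.items.move_to_end(key)  # reading a key marks it as recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # drop the least recently used entry
```

&lt;p&gt;FIFO would differ only in that reads do not refresh an entry's position, and LFU would track access counts instead of recency.&lt;/p&gt;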

&lt;p&gt;We can even combine the 2 kinds of caching in multiple levels:&lt;br&gt;
Flow: read from the in-memory cache -&amp;gt; cache miss -&amp;gt; read from the distributed cache -&amp;gt; cache miss -&amp;gt; read from the DB.&lt;/p&gt;
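
&lt;p&gt;The flow above could be sketched as follows. Plain dictionaries stand in for both cache levels, and load_from_db is a hypothetical callback, so this only illustrates the lookup order rather than a real cache client.&lt;/p&gt;

```python
def read_through(key, local_cache, distributed_cache, load_from_db):
    """Look a key up level by level, filling the faster levels on the way back."""
    if key in local_cache:
        return local_cache[key]  # level 1 hit: in-memory cache of this server
    value = distributed_cache.get(key)
    if value is None:  # level 2 miss (None is treated as "not cached" here)
        value = load_from_db(key)
        distributed_cache[key] = value
    local_cache[key] = value
    return value
```

&lt;p&gt;A real setup would also need expiry or invalidation so that cached values do not stay stale after the database changes.&lt;/p&gt;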

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--btFyXTv3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/j7xcfcriq3m4ydsw8pnd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--btFyXTv3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/j7xcfcriq3m4ydsw8pnd.png" alt="System with cache" width="800" height="401"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>beginners</category>
      <category>designsystem</category>
    </item>
    <item>
      <title>Indexing and Key in Database</title>
      <dc:creator>Tung Thanh</dc:creator>
      <pubDate>Sun, 25 Jun 2023 15:26:53 +0000</pubDate>
      <link>https://forem.com/cstungthanh/clustered-vs-non-clustered-index-3247</link>
      <guid>https://forem.com/cstungthanh/clustered-vs-non-clustered-index-3247</guid>
      <description>&lt;h2&gt;
  
  
  Note
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Indexing&lt;/strong&gt; → pay attention to the number of distinct values in the column

&lt;ul&gt;
&lt;li&gt;more duplicated values → lower index performance&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;Indexes affect the &lt;strong&gt;IS NULL&lt;/strong&gt; operator

&lt;ul&gt;
&lt;li&gt;when the column is not indexed → a full table scan is needed to find NULL values.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why we need indexes for database tables
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Benefits
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Speed up searching.&lt;/li&gt;
&lt;li&gt;Indexing helps in faster sorting and grouping of records.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Drawbacks
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Additional disk space&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;A clustered index&lt;/strong&gt; needs little extra space because the table rows themselves are stored in index order; the row data is not copied into a separate structure.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A non-clustered index&lt;/strong&gt; needs extra disk space because it is stored separately from the table rows.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Slower data modification&lt;/strong&gt; 

&lt;ul&gt;
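&lt;li&gt;every insert, update, or delete must also update the indexes; changing a key of the &lt;strong&gt;clustered index&lt;/strong&gt; is especially costly because the row itself has to move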
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Overview
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;An index is a data structure that &lt;strong&gt;stores the values of a specific column&lt;/strong&gt; in a table (an index is created on one or more columns of a table).&lt;/li&gt;
&lt;li&gt;It improves the speed of data retrieval operations.&lt;/li&gt;
&lt;li&gt;DML operations (INSERT, UPDATE, DELETE) must also update the indexes, so write operations become more costly. 

&lt;ul&gt;
&lt;li&gt;The more indices you have, the greater the cost. &lt;/li&gt;
&lt;li&gt;Indexes are used to make READ operations faster.&lt;/li&gt;
&lt;li&gt;So if you have a system that is &lt;strong&gt;write-heavy&lt;/strong&gt; but not &lt;strong&gt;read-heavy&lt;/strong&gt;, think hard about whether you need an index or not.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cardinality&lt;/strong&gt; is IMPORTANT

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cardinality&lt;/strong&gt; means the &lt;strong&gt;number of distinct values in a column&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;If you create an index on a column that has low cardinality, it is not going to help much: an index is supposed to reduce the search space, and with few distinct values each lookup still matches a large share of the rows.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;/ul&gt;
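
&lt;p&gt;One rough way to reason about cardinality is to look at the ratio of distinct values to total rows in a column. The helper below is a hypothetical sketch of that ratio, not something a database exposes under this name.&lt;/p&gt;

```python
def selectivity(values):
    """Return distinct values divided by total values, between 0.0 and 1.0."""
    values = list(values)
    if not values:
        return 0.0
    return len(set(values)) / len(values)
```

&lt;p&gt;A column of unique ids gives 1.0, so an index lookup narrows the search to a single row. A column with only a handful of distinct values across millions of rows gives a ratio close to 0, so an index on it narrows the search very little.&lt;/p&gt;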

&lt;h2&gt;
  
  
  Clustered and Non-Clustered index
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;A clustered index is the structure in which the table's row data is stored.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Each table has only one clustered index&lt;/strong&gt;, which stores the row data.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When we define a PK → InnoDB uses it as the clustered index.&lt;/li&gt;
&lt;li&gt;If we don’t define a PK → InnoDB uses the first &lt;strong&gt;UNIQUE&lt;/strong&gt; index whose key columns are all NOT NULL.&lt;/li&gt;
&lt;li&gt;If a table has no PK or suitable UNIQUE index → InnoDB generates a hidden clustered index named GEN_CLUST_INDEX on a synthetic row ID column.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each record in a secondary index contains the PK columns for the row as well as the columns specified for the secondary index.&lt;br&gt;
With the exception of spatial indexes, InnoDB indexes are B-trees in which the index records are stored in the leaf pages of the tree.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The default size of an index page is 16KB (&lt;a href="https://dev.mysql.com/doc/refman/8.0/en/innodb-parameters.html#sysvar_innodb_page_size"&gt;MySQL::InnoDB Page Size&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;The MEMORY storage engine (formerly known as HEAP) creates special-purpose tables whose contents are stored in memory. It supports both HASH and BTREE indexes:&lt;/li&gt;
&lt;li&gt;HASH for equality comparisons (of these two engines, only available in MEMORY)&lt;/li&gt;
&lt;li&gt;BTREE for range comparisons (available in both MEMORY and InnoDB)&lt;/li&gt;
&lt;/ul&gt;
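
&lt;p&gt;To see why a hash index suits equality lookups while an ordered index suits ranges, here is a toy sketch. A dictionary plays the role of the hash index and a sorted list stands in for the ordered keys of a B-tree; the names are illustrative and this is not how MySQL stores indexes internally.&lt;/p&gt;

```python
import bisect
from collections import defaultdict


def build_hash_index(pairs):
    """Map each key to the row ids that hold it: good for equality lookups."""
    index = defaultdict(list)
    for key, row_id in pairs:
        index[key].append(row_id)
    return index


class SortedIndex:
    """Keys kept in sorted order, standing in for the ordered keys of a B-tree."""

    def __init__(self, pairs):
        self.entries = sorted(pairs)
        self.keys = [key for key, _ in self.entries]

    def between(self, low, high):
        # Binary search finds both ends of the range without scanning every key.
        start = bisect.bisect_left(self.keys, low)
        stop = bisect.bisect_right(self.keys, high)
        return [row_id for _, row_id in self.entries[start:stop]]
```

&lt;p&gt;The hash index answers "key equals 20" in one step but has no notion of order, so a range query would have to check every key. The sorted structure answers "key between 15 and 30" by locating the two ends and reading everything in between.&lt;/p&gt;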

</description>
    </item>
  </channel>
</rss>
