<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: SajidSamsad</title>
    <description>The latest articles on Forem by SajidSamsad (@samsadsajid).</description>
    <link>https://forem.com/samsadsajid</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F307111%2Feb3b5e42-ce62-4fbf-8c05-fedf79e289a6.jpg</url>
      <title>Forem: SajidSamsad</title>
      <link>https://forem.com/samsadsajid</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/samsadsajid"/>
    <language>en</language>
    <item>
      <title>Designing a Reverse Proxy: Why Golang performs better than Java(Spring Boot), An In-Depth Analysis</title>
      <dc:creator>SajidSamsad</dc:creator>
      <pubDate>Tue, 09 Feb 2021 12:18:09 +0000</pubDate>
      <link>https://forem.com/samsadsajid/designing-a-reverse-proxy-why-golang-performs-better-than-java-spring-boot-an-in-depth-analysis-18oe</link>
      <guid>https://forem.com/samsadsajid/designing-a-reverse-proxy-why-golang-performs-better-than-java-spring-boot-an-in-depth-analysis-18oe</guid>
      <description>&lt;h1&gt;
  
  
  Reverse Proxy
&lt;/h1&gt;

&lt;p&gt;Reverse Proxy is a popular term in the micro-service world. Let's describe a Reverse Proxy with the following picture:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F19304394%2F105195556-f3cba980-5b64-11eb-90d8-d0fe040cd68e.png" class="article-body-image-wrapper"&gt;&lt;img alt="Screenshot 2021-01-20 at 9 17 32 PM" src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F19304394%2F105195556-f3cba980-5b64-11eb-90d8-d0fe040cd68e.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Say you have multiple services, each offering a different kind of functionality. To consume that functionality, your website or mobile application needs to know the domain paths (the API endpoints) of those services. But rather than giving the client this responsibility, you can abstract the logic away.&lt;/p&gt;

&lt;p&gt;So, you can stand up a separate service that your clients connect to. Based on each request, this service redirects the request to the corresponding backend service. This separate service is called the Reverse Proxy.&lt;/p&gt;

&lt;h1&gt;
  
  
  Example
&lt;/h1&gt;

&lt;p&gt;Say you have three separate services and they are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;mycart.mycoolapp.com&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;mypayment.mycoolapp.com&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;mycoolproducts.mycoolapp.com&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And your client consumes these three features. So, you can create a Reverse Proxy, and your client can connect to it at, say, &lt;code&gt;mycool-reverse-proxy.com&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Now, say your client wants the list of all the cool products. So it requests &lt;code&gt;mycool-reverse-proxy.com/products&lt;/code&gt;. The proxy sees from the API path that you are requesting the product service, so it forwards the request to the Product service.&lt;/p&gt;
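&lt;p&gt;Here is a minimal sketch of that routing logic in Go, using the standard library's &lt;code&gt;net/http/httputil&lt;/code&gt; package. The path prefixes and hostnames are just the illustrative ones from this article, not real endpoints:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"net/http/httputil"
	"net/url"
	"strings"
)

// pickBackend maps a request path prefix to a downstream service.
// The hostnames are the made-up examples from the text above.
func pickBackend(path string) (string, bool) {
	switch {
	case strings.HasPrefix(path, "/products"):
		return "http://mycoolproducts.mycoolapp.com", true
	case strings.HasPrefix(path, "/cart"):
		return "http://mycart.mycoolapp.com", true
	case strings.HasPrefix(path, "/payment"):
		return "http://mypayment.mycoolapp.com", true
	}
	return "", false
}

func main() {
	backend, ok := pickBackend("/products/42")
	if !ok {
		panic("no route")
	}
	target, err := url.Parse(backend)
	if err != nil {
		panic(err)
	}
	// httputil gives us a ready-made reverse proxy handler for the target.
	proxy := httputil.NewSingleHostReverseProxy(target)
	_ = proxy
	fmt.Println("would forward /products/42 to", target.Host)
}
```

&lt;p&gt;In a real proxy you would register the handler and call &lt;code&gt;http.ListenAndServe&lt;/code&gt;; here &lt;code&gt;main&lt;/code&gt; only demonstrates the routing decision itself.&lt;/p&gt;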

&lt;p&gt;The whole point of a Reverse Proxy is to abstract your backend topology: the client does not need to know how many services you have, their addresses, or where they run.&lt;/p&gt;

&lt;p&gt;Cool, isn't it? Yep!&lt;/p&gt;

&lt;p&gt;Now, look at the above image again. What is Reverse Proxy doing here? It is sitting between your clients and your services and routing the requests. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;You may think that a reverse proxy is a load balancer. But it is not: they have completely different responsibilities and mean different things. &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Now, let's do quick math!&lt;/p&gt;

&lt;p&gt;Say the clients call Service A 10 times, Service B 20 times, and Service C 5 times. As the Reverse Proxy sits between the clients and the services, all the requests go through it, don't they? &lt;/p&gt;

&lt;p&gt;Yes, of course: all 35 requests go through it. Now, as your business grows, you get more load, so you scale your services. But as we saw, the reverse proxy experiences the total load of all the services combined. So you scale it too, horizontally or vertically: you add more machines. &lt;/p&gt;

&lt;p&gt;So, while developing the reverse proxy, you'll choose a language, say Java, Node, or Golang. Does this choice have any effect on the reverse proxy, or is there any performance variation among these languages for this workload? Let's find out.&lt;/p&gt;




&lt;blockquote&gt;
&lt;p&gt;In this article I will only focus on analyzing the performance of a reverse proxy in Spring Boot (which is a Java-based framework) vs Golang.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h1&gt;
  
  
  What is a thread
&lt;/h1&gt;

&lt;p&gt;A thread is a unit of execution: its instructions run one after another, in serial order, on a processor core. If you have one core and many threads, those threads take turns on that core. Say you have 3 threads A, B, and C. On a single core they will not run in parallel, but they also will not run strictly sequentially: it will &lt;em&gt;not&lt;/em&gt; happen that A executes fully, then B executes fully, then C executes fully.&lt;/p&gt;

&lt;p&gt;What happens is that some instructions of A execute, then the OS pauses thread A and starts executing thread B. After executing some instructions of B, the OS may resume thread A or start executing thread C. Which thread runs next is decided by a scheduler managed by the OS.&lt;/p&gt;

&lt;p&gt;This is called context switching. But when a thread is paused and another starts executing, how can we come back to the first thread and resume where we left off?&lt;/p&gt;

&lt;p&gt;Well, two things make this possible:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An instruction pointer that records which instruction was last executed&lt;/li&gt;
&lt;li&gt;A stack that contains local variables as well as pointers to heap-allocated data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With this information, the scheduler can easily pause a thread, run other threads, and later resume the paused thread from exactly where it left off. &lt;/p&gt;

&lt;h1&gt;
  
  
  Java thread
&lt;/h1&gt;

&lt;p&gt;Java is a JVM-based language, and when you create a thread in Java, the JVM maps it to an OS-level thread: your Java thread and the OS thread are basically a 1:1 mapping. An OS-level thread has a &lt;em&gt;fixed-size&lt;/em&gt; stack; it cannot grow. On a typical &lt;em&gt;64-bit&lt;/em&gt; JVM the default is &lt;em&gt;1 MB&lt;/em&gt;. So with 1 GB of memory available for stacks you can have approximately &lt;em&gt;1000&lt;/em&gt; threads. And as the JVM maps threads 1:1 to OS-level threads, you can create approximately &lt;em&gt;1000&lt;/em&gt; threads in a JVM-based language like Java. So a Java thread is &lt;em&gt;heavy&lt;/em&gt;, has a fixed stack size, and you can have around &lt;em&gt;1K&lt;/em&gt; threads in &lt;em&gt;1 GB&lt;/em&gt; of memory.&lt;/p&gt;

&lt;h1&gt;
  
  
  What is a Goroutine
&lt;/h1&gt;

&lt;p&gt;A goroutine is a lightweight thread managed by the Go runtime, and it is quite different from the Java thread we saw earlier. The Go scheduler schedules goroutines for execution, but the fun fact is that goroutines are &lt;em&gt;not&lt;/em&gt; 1:1 mapped to OS-level threads: multiple goroutines are multiplexed onto a single OS thread. Another interesting fact is that a goroutine's stack is not fixed-size; it grows and shrinks based on the data. A new goroutine starts with a stack of around &lt;em&gt;2 KB&lt;/em&gt;. So in &lt;em&gt;1 GB&lt;/em&gt; of memory you can have roughly &lt;code&gt;1024 * 1024 / 2 = 524,288&lt;/code&gt; goroutines.&lt;/p&gt;

&lt;h1&gt;
  
  
  Context Switching comparison for Java and Golang
&lt;/h1&gt;

&lt;p&gt;As described above, the JVM uses OS-level threads, so the OS-level scheduler decides which thread runs, and context switching happens at the OS level. When the kernel performs a context switch it has to do a good amount of work, so each switch takes some time (on the order of microseconds).&lt;/p&gt;

&lt;p&gt;On the other hand, Golang has its own goroutine scheduler, built and optimized specifically for this task. It maps goroutines onto OS-level threads in user space, so a goroutine context switch does less work and takes less time.&lt;/p&gt;

&lt;h1&gt;
  
  
  Performance comparison of Golang and Java(Spring Boot) in a Reverse Proxy
&lt;/h1&gt;

&lt;p&gt;So, after all of that theory, it comes to this point. Spring Boot runs on a Tomcat server, where each request is handled by a separate thread: when a request comes in, a thread is allocated to process it. And since Spring Boot runs on the JVM, we saw that these threads are OS-level threads.&lt;/p&gt;

&lt;p&gt;On the other hand, if you use a goroutine to serve each request, i.e. when a request comes in you allocate a goroutine to handle it, it will perform better than Spring Boot, because as we saw in the previous sections, goroutines have certain advantages over threads.&lt;/p&gt;

&lt;p&gt;And we also saw that in 1 GB of RAM you can have far more goroutines than threads (524,288 vs ~1000). So you can handle more concurrent requests with a Golang service.&lt;/p&gt;

&lt;p&gt;As a reverse proxy usually experiences the combined load of your whole system, there will always be a high volume of requests there. And if you handle them with lightweight goroutines, you can use all the advantages of goroutines to gain higher throughput and serve more requests concurrently.&lt;/p&gt;

&lt;h1&gt;
  
  
  Cons of Goroutines
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;p&gt;With great power comes great responsibility.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Though goroutines have plenty of upsides, there are some cons. In Spring Boot there is a fixed-size thread pool: when a request comes in, a thread is taken from the pool, and when the work is finished the thread is returned to the pool. The Tomcat server handles this for you.&lt;/p&gt;

&lt;p&gt;In the Golang world, this is not handled automatically. Whether you build the transport layer with the famous &lt;code&gt;net/http&lt;/code&gt; package or use a framework like &lt;em&gt;Gin-gonic&lt;/em&gt;, there is no goroutine pool by default. You'll have to manage it manually.&lt;/p&gt;

&lt;p&gt;But you may be wondering: why do I need a pool? Here's why.&lt;/p&gt;

&lt;p&gt;Wherever your code is deployed, be it on a bare server or in Kubernetes Pods, there is an OS underneath. And an OS has a setting called &lt;em&gt;ulimit&lt;/em&gt;: a limit on the number of open file descriptors, the handles the OS uses to access I/O resources. When our code makes a network request to the outside world, a TCP connection is opened and, after the handshake, the request is made; each open TCP connection consumes a file descriptor. So &lt;em&gt;ulimit&lt;/em&gt; effectively caps how many TCP connections the OS can have open at once: the higher your &lt;em&gt;ulimit&lt;/em&gt;, the more TCP connections you can create.&lt;/p&gt;

&lt;p&gt;A typical Linux system has a &lt;em&gt;ulimit&lt;/em&gt; of around &lt;code&gt;2^16 = 65,536&lt;/code&gt;; on macOS the default is much lower, 256. But you can always increase it with &lt;code&gt;ulimit -n number_of_ulimit_you_want&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;And this is one of the failure points for goroutines.&lt;/p&gt;

&lt;p&gt;What happens in a reverse proxy? We saw that a request comes in and, based on it, the proxy forwards the request to some downstream service. To do that, the proxy makes an outbound request of its own, which requires a TCP connection: network I/O, handled through a file descriptor. And &lt;em&gt;ulimit&lt;/em&gt; caps how many file descriptors the OS will give you.&lt;/p&gt;

&lt;p&gt;You can spin up roughly 524,288 goroutines in 1 GB of RAM. So if you implement the reverse proxy in Golang without any goroutine pool, it can accept a huge number of requests without bogging down. But the proxy forwards every request to the different downstream services, so it makes just as many outbound requests, each opening a TCP connection, each consuming a file descriptor. So if the number of concurrent outbound requests to your other services exceeds your &lt;em&gt;ulimit&lt;/em&gt;, you'll get a &lt;em&gt;too many open files&lt;/em&gt; error.&lt;/p&gt;

&lt;p&gt;So that's why you should have a goroutine pool sized according to your &lt;em&gt;ulimit&lt;/em&gt;, so that you don't run into the above error.&lt;/p&gt;

&lt;h1&gt;
  
  
  Practical comparison
&lt;/h1&gt;

&lt;p&gt;So... I ran an experiment with a product service written in Spring Boot. Then I developed two reverse proxies: one written in Spring Boot and the other in Golang (using Gin-gonic for the router). Then I used &lt;a href="https://jmeter.apache.org/" rel="noopener noreferrer"&gt;JMeter&lt;/a&gt; to load test the whole system. Here are the results.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Number of Requests&lt;/th&gt;
&lt;th&gt;Standard Deviation (Spring Boot - Golang)&lt;/th&gt;
&lt;th&gt;Throughput req/sec (Spring Boot - Golang)&lt;/th&gt;
&lt;th&gt;Avg (Spring Boot - Golang)&lt;/th&gt;
&lt;th&gt;Min (Spring Boot - Golang)&lt;/th&gt;
&lt;th&gt;Max (Spring Boot - Golang)&lt;/th&gt;
&lt;th&gt;Received KB/sec (Spring Boot - Golang)&lt;/th&gt;
&lt;th&gt;Avg Bytes (Spring Boot - Golang)&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;10000&lt;/td&gt;
&lt;td&gt;4.14 --- 2.70&lt;/td&gt;
&lt;td&gt;100.77801 --- 101.02337&lt;/td&gt;
&lt;td&gt;2 --- 0&lt;/td&gt;
&lt;td&gt;1 --- 0&lt;/td&gt;
&lt;td&gt;208 --- 150&lt;/td&gt;
&lt;td&gt;95.14 --- 88.99&lt;/td&gt;
&lt;td&gt;966.7 --- 902.0&lt;/td&gt;
&lt;td&gt;Ramp up period 100s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;20000&lt;/td&gt;
&lt;td&gt;7.82 --- 2.79&lt;/td&gt;
&lt;td&gt;100.36483 --- 100.47373&lt;/td&gt;
&lt;td&gt;3 --- 0&lt;/td&gt;
&lt;td&gt;1 --- 0&lt;/td&gt;
&lt;td&gt;342 --- 125&lt;/td&gt;
&lt;td&gt;94.75 --- 88.50&lt;/td&gt;
&lt;td&gt;966.7 --- 902.0&lt;/td&gt;
&lt;td&gt;Ramp up period 100s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;30000&lt;/td&gt;
&lt;td&gt;5.85 --- 2.34&lt;/td&gt;
&lt;td&gt;299.98200 --- 301.09599&lt;/td&gt;
&lt;td&gt;3 --- 0&lt;/td&gt;
&lt;td&gt;1 --- 0&lt;/td&gt;
&lt;td&gt;275 --- 119&lt;/td&gt;
&lt;td&gt;283.20 --- 265.22&lt;/td&gt;
&lt;td&gt;966.7 --- 902.0&lt;/td&gt;
&lt;td&gt;Ramp up period 100s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;30000&lt;/td&gt;
&lt;td&gt;14.20 --- 3.19&lt;/td&gt;
&lt;td&gt;150.07654 --- 150.38348&lt;/td&gt;
&lt;td&gt;5 --- 0&lt;/td&gt;
&lt;td&gt;1 --- 0&lt;/td&gt;
&lt;td&gt;495 --- 125&lt;/td&gt;
&lt;td&gt;141.68 --- 88.50&lt;/td&gt;
&lt;td&gt;966.7 --- 902.0&lt;/td&gt;
&lt;td&gt;Ramp up period 200s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;30000&lt;/td&gt;
&lt;td&gt;5.00 --- 2.89&lt;/td&gt;
&lt;td&gt;100.26671 --- 100.31834&lt;/td&gt;
&lt;td&gt;2 --- 0&lt;/td&gt;
&lt;td&gt;0 --- 0&lt;/td&gt;
&lt;td&gt;245 --- 121&lt;/td&gt;
&lt;td&gt;94.66 --- 88.37&lt;/td&gt;
&lt;td&gt;966.7 --- 902.0&lt;/td&gt;
&lt;td&gt;Ramp up period 300s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;80000&lt;/td&gt;
&lt;td&gt;11.63 --- 3.49&lt;/td&gt;
&lt;td&gt;199.98850 --- 200.20772&lt;/td&gt;
&lt;td&gt;4 --- 0&lt;/td&gt;
&lt;td&gt;1 --- 0&lt;/td&gt;
&lt;td&gt;497 --- 168&lt;/td&gt;
&lt;td&gt;188.8 --- 176.35&lt;/td&gt;
&lt;td&gt;966.7 --- 902.0&lt;/td&gt;
&lt;td&gt;Ramp up period 100s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The definitions of these metrics are as follows:&lt;br&gt;
&lt;strong&gt;Number of Requests&lt;/strong&gt; is the number of samples with the same label.&lt;br&gt;
&lt;strong&gt;Average&lt;/strong&gt; is the average time of a set of results.&lt;br&gt;
&lt;strong&gt;Min&lt;/strong&gt; is the shortest time for the samples with the same label.&lt;br&gt;
&lt;strong&gt;Max&lt;/strong&gt; is the longest time for the samples with the same label.&lt;br&gt;
&lt;strong&gt;Throughput&lt;/strong&gt; is measured in requests per second/minute/hour. The time unit is chosen so that the displayed rate is at least 1.0. When the throughput is saved to a CSV file, it is expressed in requests/second, i.e. 30.0 requests/minute is saved as 0.5.&lt;br&gt;
&lt;strong&gt;Received KB/sec&lt;/strong&gt; is throughput measured in kilobytes per second. Time is in milliseconds.&lt;br&gt;
&lt;strong&gt;Standard Deviation&lt;/strong&gt; is a measure of the variability of a data set. JMeter calculates the population standard deviation (analogous to the STDEVP function).&lt;br&gt;
&lt;strong&gt;Avg Bytes&lt;/strong&gt; is the arithmetic mean of response bytes for the samples with the same label.&lt;/p&gt;

&lt;p&gt;As you can see from the metrics above, Golang performs better than Spring Boot. The Standard Deviation shows that the Spring Boot proxy's response times vary far more.&lt;/p&gt;

&lt;p&gt;Let's look at the &lt;strong&gt;CPU&lt;/strong&gt; and &lt;strong&gt;memory&lt;/strong&gt; usage of the Golang proxy and Spring Boot proxy.&lt;/p&gt;

&lt;p&gt;On my hexa-core machine, I ran the product service written in Spring Boot. Then I launched the Spring Boot proxy and collected the metrics; after that, I ran the Go proxy and did the same. The results are as follows:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Proxy type&lt;/th&gt;
&lt;th&gt;% CPU&lt;/th&gt;
&lt;th&gt;Number of OS threads&lt;/th&gt;
&lt;th&gt;Memory (MB)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Spring Boot proxy&lt;/td&gt;
&lt;td&gt;7.2&lt;/td&gt;
&lt;td&gt;46&lt;/td&gt;
&lt;td&gt;124.2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Go Proxy&lt;/td&gt;
&lt;td&gt;0.1&lt;/td&gt;
&lt;td&gt;26&lt;/td&gt;
&lt;td&gt;7.8&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;So... Golang really did use less CPU and memory compared to the Spring Boot proxy. Based on the earlier theoretical discussion, this is expected. &lt;/p&gt;

&lt;p&gt;So, it can be said that when a proxy is deployed to a server, Golang performs better than Spring Boot in terms of resources like CPU and memory. When it comes to horizontal scaling, it will take more Spring Boot proxy instances to handle a huge amount of load, and fewer Go proxy instances, because an auto-scaling group usually scales on the CPU and memory usage of the instances.&lt;/p&gt;

&lt;p&gt;So, that's it for today. Hope you enjoyed it.&lt;/p&gt;

&lt;p&gt;See you in another article. Till then enjoy life.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Reference&lt;/em&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://tpaschalis.github.io/goroutines-size/" rel="noopener noreferrer"&gt;https://tpaschalis.github.io/goroutines-size/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://codeburst.io/why-goroutines-are-not-lightweight-threads-7c460c1f155f" rel="noopener noreferrer"&gt;https://codeburst.io/why-goroutines-are-not-lightweight-threads-7c460c1f155f&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;3.&lt;a href="https://docs.google.com/document/d/1TTj4T2JO42uD5ID9e89oa0sLKhJYD0Y_kqxDv3I3XMw/edit#heading=h.mmq8lm48qfcw" rel="noopener noreferrer"&gt;https://docs.google.com/document/d/1TTj4T2JO42uD5ID9e89oa0sLKhJYD0Y_kqxDv3I3XMw/edit#heading=h.mmq8lm48qfcw&lt;/a&gt;&lt;/p&gt;

</description>
      <category>proxy</category>
      <category>go</category>
      <category>java</category>
      <category>springboot</category>
    </item>
    <item>
      <title>Fantastic Microservice and the Sorcerer's Circuit Breaker</title>
      <dc:creator>SajidSamsad</dc:creator>
      <pubDate>Thu, 14 Jan 2021 07:31:02 +0000</pubDate>
      <link>https://forem.com/samsadsajid/fantastic-microservice-and-the-sorcerer-s-circuit-breaker-3p38</link>
      <guid>https://forem.com/samsadsajid/fantastic-microservice-and-the-sorcerer-s-circuit-breaker-3p38</guid>
      <description>&lt;p&gt;&lt;em&gt;In this article I'll try to explain what is a circuit breaker pattern.&lt;/em&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Fail-fast system
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;p&gt;a fail-fast system is one that immediately reports at its interface any condition that is likely to indicate a failure. - Wikipedia&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So if parts of a system fail, a fail-fast system will know this and will stop operating in its normal way. When something fails, a fail-fast system can behave in a way defined for that failure, like falling back when something goes wrong. A fail-fast system often checks the system's state at regular intervals, so failures can be detected early.&lt;/p&gt;

&lt;p&gt;A wise saying in system design is&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If a system fails, then let it fail fast and fail in a safe state.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h1&gt;
  
  
  Fault-tolerant system
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;p&gt;Fault tolerance is the property that enables a system to continue operating properly in the event of the failure of some of its components. - Wikipedia&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If you've followed along this far, can you relate a fail-fast system to a fault-tolerant system?&lt;/p&gt;

&lt;p&gt;Well, a fault-tolerant system continues operating properly, rather than crashing or refusing requests, when some components fail. That means a fault-tolerant system can detect a failure and, when one occurs, knows what to do, just like a fail-fast system.&lt;/p&gt;

&lt;p&gt;So, a fault-tolerant system has the fail-fast property of monitoring the system's state and detecting failures when they occur. But when it detects failures, it does not stop operating. Rather, it behaves in a way specifically designed for those failure events, such as falling back. &lt;/p&gt;

&lt;p&gt;And this is what a circuit breaker is.&lt;/p&gt;

&lt;h1&gt;
  
  
  Circuit Breaker
&lt;/h1&gt;

&lt;p&gt;A circuit is a path through which electricity flows. A circuit breaker is a component in the circuit that monitors how much electricity flows through it.&lt;/p&gt;

&lt;p&gt;You've probably seen one in your house.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F19304394%2F104129714-887d1d00-5397-11eb-92f6-28c98882c6d2.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F19304394%2F104129714-887d1d00-5397-11eb-92f6-28c98882c6d2.jpg" alt="Siemens-5SL4306-6-30137424-01"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is a circuit breaker. If the current in the circuit exceeds the threshold, it breaks the circuit and leaves it open. Then no electricity can pass through the circuit, and the excess current cannot harm the electrical components.&lt;/p&gt;

&lt;h1&gt;
  
  
  Circuit breaker pattern
&lt;/h1&gt;

&lt;p&gt;The circuit breaker is a very popular pattern for making a system resilient and fault-tolerant. In the microservice world, one service can depend on many others. And as microservices are deployed by separate teams, any service can experience downtime for any reason: maintenance is going on, the server or pods crashed, and so on. This downtime can cause a ripple effect on other services. In this article, I will describe the circuit breaker pattern and how introducing it makes a service resilient.&lt;/p&gt;

&lt;p&gt;Say there are two microservices, &lt;code&gt;Service A --- Service B&lt;/code&gt;, and Service B is down. Then Service A will see that all the calls it makes to Service B are failing: a connection timeout, a request timeout, or anything else. If Service B is down for 20 minutes, Service A will experience this for 20 minutes or so.&lt;/p&gt;

&lt;p&gt;What could Service A do if it were intelligent? Well, it could monitor the requests it sends to Service B. So it could think, &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Hey... I have sent 10 requests to Service B and all of them failed. I think Service B is down. So, you know what, I will not send any request to Service B rather I will have a fallback response and I will send it to the client who is requesting me.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Well, as you can see, the circuit breaker monitors the state of the system. In our example, it monitors the success rate of the requests and, based on that, decides that some component is not working. So it can detect the failure like a fail-fast system. But it does not stop operating: it still serves every request that comes in, but rather than the normal response it sends a fallback response. This is what makes the whole system fault-tolerant.&lt;/p&gt;

&lt;h1&gt;
  
  
  Deep Dive
&lt;/h1&gt;

&lt;p&gt;The circuit in the circuit breaker has three states:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Closed&lt;/li&gt;
&lt;li&gt;Open&lt;/li&gt;
&lt;li&gt;Half Open&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Closed Circuit
&lt;/h1&gt;

&lt;p&gt;When the two services are up and communicating, requests are always allowed through. The circuit is closed, so the path is established, like the following:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Service A ------ Service B&lt;/code&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Open Circuit
&lt;/h1&gt;

&lt;p&gt;When Service B is down, the circuit is opened and no requests are passed to Service B, like the following:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Service A XXXXXXXXX Service B&lt;/code&gt;&lt;br&gt;
In this case, Service A answers all the requests with the fallback response.&lt;/p&gt;

&lt;h1&gt;
  
  
  Half Open
&lt;/h1&gt;

&lt;p&gt;In this state, a small number of requests are let through to Service B. Usually, when your circuit is in the &lt;strong&gt;Open State&lt;/strong&gt;, you don't want to wait in that state forever. Rather, you should wait for some time and then check again whether the downstream service is up. But while checking, you also don't want to let all of your requests through to the downstream service; you should send only a small number of requests to check whether it is &lt;strong&gt;UP&lt;/strong&gt;. While you are doing this, the circuit is in the &lt;strong&gt;Half Open&lt;/strong&gt; state.&lt;/p&gt;

&lt;p&gt;Let's look at a diagram to get a clearer idea.&lt;/p&gt;

&lt;h1&gt;
  
  
  Circuit diagram
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F19304394%2F104554811-f743ca00-5666-11eb-8b1c-5a8426722bbc.png" class="article-body-image-wrapper"&gt;&lt;img alt="Screenshot 2021-01-14 at 12 48 35 PM" src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F19304394%2F104554811-f743ca00-5666-11eb-8b1c-5a8426722bbc.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  When State is Changed
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;The state of the CircuitBreaker changes from &lt;strong&gt;CLOSED&lt;/strong&gt; to &lt;strong&gt;OPEN&lt;/strong&gt; when the failure rate (or percentage of slow calls) is equal to or greater than a configurable threshold, for example when more than 50% of the recorded calls have failed.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;The failure rate and slow call rate can be calculated only if a minimum number of calls were recorded. For example, if the minimum number of required calls is 10, then at least 10 calls must be recorded before the failure rate can be calculated. If only 9 calls have been evaluated the CircuitBreaker will not trip open even if all 9 calls have failed.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;After a &lt;strong&gt;wait time&lt;/strong&gt; duration has elapsed, the CircuitBreaker state changes from &lt;strong&gt;OPEN&lt;/strong&gt; to &lt;strong&gt;HALF OPEN&lt;/strong&gt; and permits a configurable number of calls to see if the downstream service is still unavailable or has become available again.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If the failure rate or slow call rate is equal or greater than the configured threshold, the state changes back to &lt;strong&gt;OPEN&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;If the failure rate and slow call rate are below the threshold, the state changes back to &lt;strong&gt;CLOSED&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;&lt;p&gt;When Circuit is in &lt;strong&gt;OPEN&lt;/strong&gt; state all calls are rejected by throwing an exception.&lt;/p&gt;&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;And this is the concept of the Circuit Breaker.&lt;/p&gt;

&lt;p&gt;Now, you can of course implement this. But there are some really cool circuit breaker libraries that take care of this for us.&lt;/p&gt;

&lt;p&gt;Netflix has its own circuit breaker library called &lt;a href="https://github.com/Netflix/Hystrix" rel="noopener noreferrer"&gt;Hystrix&lt;/a&gt;. The description of the library says,&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Hystrix is a latency and fault tolerance library designed to isolate points of access to remote systems, services, and 3rd party libraries, stop cascading failure and enable resilience in complex distributed systems where failure is inevitable.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Netflix uses this in its ecosystem to ensure fault tolerance, which is really necessary when you have hundreds or even more microservices.&lt;/p&gt;

&lt;p&gt;But the problem with Hystrix is that it's no longer in active development and is currently in maintenance mode.&lt;/p&gt;

&lt;p&gt;An alternative to Hystrix is &lt;a href="https://github.com/resilience4j/resilience4j" rel="noopener noreferrer"&gt;resilience4j&lt;/a&gt;. This is another fantastic library. The description says,&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Resilience4j is a fault tolerance library designed for Java8 and functional programming&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It has really cool features, and the documentation is rich with examples showing how to configure it.&lt;/p&gt;

&lt;p&gt;So, I hope this article helps you and I hope you enjoyed it. If you have any questions or any confusion, please feel free to comment and we can have a nice discussion.&lt;/p&gt;

&lt;p&gt;See you in another article. Till then, let's enjoy life :D &lt;/p&gt;

</description>
      <category>resiliency</category>
      <category>circuitbreaker</category>
      <category>faulttolerance</category>
    </item>
    <item>
      <title>Performance analysis of gRPC and Protobuf</title>
      <dc:creator>SajidSamsad</dc:creator>
      <pubDate>Sun, 25 Oct 2020 08:19:02 +0000</pubDate>
      <link>https://forem.com/samsadsajid/performance-analysis-of-grpc-and-protobuf-1ohm</link>
      <guid>https://forem.com/samsadsajid/performance-analysis-of-grpc-and-protobuf-1ohm</guid>
      <description>&lt;p&gt;So... I've come to know about gRPC and Protobuf lately and was curious about them. I googled and went on reading blogs and articles and watching talks on Youtube about what people think about gRPC and Protobuf and the impacts on them in the micro-service world. I found out these two things are amusing.&lt;/p&gt;

&lt;p&gt;We typically use the REST pattern for micro-services to talk to each other, which runs over &lt;code&gt;http/1.1&lt;/code&gt;. &lt;code&gt;gRPC&lt;/code&gt;, on the other hand, uses &lt;code&gt;http/2&lt;/code&gt;, which brings nice features like header compression and request multiplexing. With gRPC, a remote call can be placed to a separate micro-service almost as if it were a local function call. And because Protobuf serializes the payload into a compact binary format, the payload size decreases, so the services consume less bandwidth. That, without any doubt, is a great improvement.&lt;/p&gt;

&lt;p&gt;REST is basically a request-response model. gRPC, on the contrary, takes full advantage of &lt;code&gt;http/2&lt;/code&gt; and offers streaming features: client streaming, server streaming, and bi-directional streaming (client and server streaming at the same time).&lt;/p&gt;
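&lt;p&gt;The call types above can be sketched in a Protobuf service definition; this is a hypothetical example (the service and message names are made up), where the &lt;code&gt;stream&lt;/code&gt; keyword is what turns a unary RPC into a streaming one:&lt;/p&gt;

```proto
syntax = "proto3";

// Hypothetical order service illustrating the gRPC call types.
service OrderService {
  // Unary: classic request-response, like REST.
  rpc GetOrder (OrderRequest) returns (OrderReply);
  // Server streaming: the server pushes a stream of updates over one HTTP/2 connection.
  rpc WatchOrder (OrderRequest) returns (stream OrderReply);
  // Bi-directional streaming: both sides stream independently.
  rpc SyncOrders (stream OrderRequest) returns (stream OrderReply);
}

message OrderRequest { string order_id = 1; }
message OrderReply  { string status = 1; }
```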

&lt;p&gt;There are many more cool features in gRPC, and to test whether it really performs better than JSON-based REST, I did a POC. In the GitHub repo linked below, you can find the stats of the POC. I observed first-hand that gRPC with Protobuf performs better than JSON-based REST.&lt;/p&gt;

&lt;p&gt;To conduct the POC, I built two services: an order service and a payment service. In v1, the two services talk to each other with gRPC and Protobuf. I load-tested them to figure out the average requests/sec and the average read size. The result was great: &lt;strong&gt;1000K requests in 329.12s, 318 MB read&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Then I created v2 of the system, this time making the services talk to each other over JSON-based REST, and load-tested them. This time: &lt;strong&gt;976796 2xx responses, 7465 non-2xx responses, 984k requests in 2449.86s, 312 MB read, 16k timeouts.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I was amazed to see that the gRPC-based system managed to handle all &lt;strong&gt;1000K&lt;/strong&gt; requests without a single timeout. The &lt;strong&gt;"read"&lt;/strong&gt; size in &lt;strong&gt;MB&lt;/strong&gt; may look almost the same in both versions, but that's because my payload was fairly small. As the payload size grows, the difference becomes more visible, and in those cases gRPC services save a lot of bandwidth on the servers/pods.&lt;/p&gt;

&lt;p&gt;The Github repository can be found here:&lt;br&gt;
&lt;a href="https://github.com/SamsadSajid/order-payment-microservice-poc"&gt;gRPC + Protobuf POC&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let me know what you think about this, whether you have observed the same, or if you're interested in talking about gRPC and Protobuf.&lt;/p&gt;

</description>
      <category>grpc</category>
      <category>protobuf</category>
      <category>microservices</category>
    </item>
    <item>
      <title>ORM for Cassandra</title>
      <dc:creator>SajidSamsad</dc:creator>
      <pubDate>Wed, 30 Sep 2020 15:39:33 +0000</pubDate>
      <link>https://forem.com/samsadsajid/orm-for-cassandra-2eeo</link>
      <guid>https://forem.com/samsadsajid/orm-for-cassandra-2eeo</guid>
      <description>&lt;p&gt;So... I was wondering what is the best ORM for Django and Spring Boot to interact to Cassandra?&lt;/p&gt;

&lt;p&gt;What I want is to avoid writing raw queries while still having rich model and serialization support.&lt;/p&gt;

&lt;p&gt;One extra question.&lt;br&gt;
Is it good to use Cassandra with TypeScript? Is there any good support? &lt;/p&gt;

</description>
      <category>help</category>
      <category>cassandra</category>
      <category>django</category>
      <category>springboot</category>
    </item>
    <item>
      <title>Removing duplicate code from the codebase</title>
      <dc:creator>SajidSamsad</dc:creator>
      <pubDate>Wed, 23 Sep 2020 07:43:21 +0000</pubDate>
      <link>https://forem.com/samsadsajid/removing-duplicate-code-from-the-codebase-3hdi</link>
      <guid>https://forem.com/samsadsajid/removing-duplicate-code-from-the-codebase-3hdi</guid>
      <description>&lt;p&gt;So... when there're features development going on, we developed the feature and published it. But at some point, new features arrive and it may so happen that we can't support it with the existing desing so we introduce design patterns i.e. factory, strategy, etc and incorporate those changes. &lt;/p&gt;

&lt;p&gt;Say, for example, a factory pattern is introduced, and currently there are two different classes, A and B.&lt;br&gt;
The factory method is responsible for generating an instance of A or B based on some logic.&lt;/p&gt;

&lt;p&gt;Later, say for new requirements, there's a new class C. A, B, and C each have 4 functions to implement, based on an interface that abstracts the core logic.&lt;/p&gt;

&lt;p&gt;Now, two functions of C are identical to B's, but the other two are different. Since each class implements the interface independently, we end up with duplicate code in our codebase.&lt;/p&gt;

&lt;p&gt;The two functions of B and C that do the same thing are duplicated in the codebase.&lt;/p&gt;

&lt;p&gt;In such scenarios, what are the best practices to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Implement these in such a way that duplication does not arise in the first place?&lt;/li&gt;
&lt;li&gt;If no. 1 is not possible, how do we remove duplicate code from the codebase?&lt;/li&gt;
&lt;/ol&gt;
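&lt;p&gt;One common approach to the scenario above is to pull the two shared functions into an abstract base class (or interface default methods) that both B and C extend, so the common logic lives in exactly one place. Here is a hypothetical Java sketch; the class names mirror the A/B/C example above, and the function names and return strings are made up for illustration:&lt;/p&gt;

```java
// Sketch: B and C share two behaviours, so the shared logic lives in an
// abstract base class instead of being copy-pasted into both implementations.
interface Handler {
    String validate();
    String persist();
    String notifyUser();
    String audit();
}

// Shared skeleton: the two functions B and C have in common are implemented once.
abstract class BaseHandler implements Handler {
    public String validate() { return "common validation"; }
    public String persist()  { return "common persistence"; }
}

class B extends BaseHandler {
    public String notifyUser() { return "B notifies by email"; }
    public String audit()      { return "B audits to file"; }
}

class C extends BaseHandler {
    public String notifyUser() { return "C notifies by SMS"; }
    public String audit()      { return "C audits to database"; }
}

// The factory still decides which concrete implementation to hand out;
// callers only ever see the Handler interface.
class HandlerFactory {
    static Handler create(String type) {
        if (type.equals("C")) return new C();
        return new B(); // default to B for any other type
    }
}
```

&lt;p&gt;The factory and its callers are untouched by the refactoring, since they only depend on the interface; only the class hierarchy behind it changes.&lt;/p&gt;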

</description>
      <category>help</category>
    </item>
  </channel>
</rss>
