<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Hitesh Pattanayak</title>
    <description>The latest articles on Forem by Hitesh Pattanayak (@hiteshrepo).</description>
    <link>https://forem.com/hiteshrepo</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1008730%2F9b4d3cc5-0799-431e-a45f-551bb2dcf53d.jpeg</url>
      <title>Forem: Hitesh Pattanayak</title>
      <link>https://forem.com/hiteshrepo</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/hiteshrepo"/>
    <language>en</language>
    <item>
      <title>gRPC dynamic loadbalancing</title>
      <dc:creator>Hitesh Pattanayak</dc:creator>
      <pubDate>Wed, 24 May 2023 16:34:26 +0000</pubDate>
      <link>https://forem.com/hiteshrepo/grpc-dynamic-loadbalancing-51jd</link>
      <guid>https://forem.com/hiteshrepo/grpc-dynamic-loadbalancing-51jd</guid>
      <description>&lt;h2&gt;
  
  
  gRPC
&lt;/h2&gt;

&lt;p&gt;gRPC has many benefits, like:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;It multiplexes many requests over a single connection.&lt;/li&gt;
&lt;li&gt;It supports the typical client-server request-response pattern as well as duplex streaming.&lt;/li&gt;
&lt;li&gt;It uses a fast, lightweight binary protocol with structured data as the communication medium between services.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://www.infracloud.io/blogs/understanding-grpc-concepts-best-practices/"&gt;More about gRPC&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;All of the above make gRPC a very attractive choice, but it comes with one notable consideration: load balancing.&lt;/p&gt;

&lt;h2&gt;
  
  
  The issue
&lt;/h2&gt;

&lt;p&gt;Let's delve into the issue.&lt;/p&gt;

&lt;p&gt;For this we need a setup, consisting of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a gRPC server, which we call &lt;code&gt;Greet Server&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;a client that acts as a REST gateway and is internally a gRPC client as well; we call it &lt;code&gt;Greet Client&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We also use Kubernetes for the demonstration, hence there are a handful of YAML manifest files. Let me explain them below:&lt;/p&gt;

&lt;p&gt;greetserver-deploy.yaml &lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;apps/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deployment&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;greetserver-deploy&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;replicas&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;greetserver&lt;/span&gt;
  &lt;span class="na"&gt;template&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;greetserver&lt;/span&gt;
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;hiteshpattanayak/greet-server:1.0&lt;/span&gt;
          &lt;span class="na"&gt;imagePullPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;IfNotPresent&lt;/span&gt;
          &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;greetserver&lt;/span&gt;
          &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;containerPort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;50051&lt;/span&gt;
          &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;POD_IP&lt;/span&gt;
              &lt;span class="na"&gt;valueFrom&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                &lt;span class="na"&gt;fieldRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                  &lt;span class="na"&gt;fieldPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;status.podIP&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;POD_NAME&lt;/span&gt;
              &lt;span class="na"&gt;valueFrom&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                &lt;span class="na"&gt;fieldRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                  &lt;span class="na"&gt;fieldPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;metadata.name&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;The above is the deployment manifest of &lt;code&gt;Greet Server&lt;/code&gt;; it spins up 3 replicas of &lt;code&gt;Greet Server&lt;/code&gt;.&lt;br&gt;
The &lt;code&gt;Greet Server&lt;/code&gt; uses the &lt;code&gt;hiteshpattanayak/greet-server:1.0&lt;/code&gt; image.&lt;br&gt;
Each pod of the deployment exposes port &lt;code&gt;50051&lt;/code&gt;.&lt;br&gt;
The environment variables &lt;code&gt;POD_IP&lt;/code&gt; and &lt;code&gt;POD_NAME&lt;/code&gt; are injected into the pods via the downward API.&lt;/p&gt;

&lt;p&gt;What does each pod of the above deployment do?&lt;/p&gt;

&lt;p&gt;They expose an &lt;code&gt;rpc&lt;/code&gt; that expects a &lt;code&gt;first_name&lt;/code&gt; and a &lt;code&gt;last_name&lt;/code&gt; and, in response, returns a message in this format:&lt;br&gt;
&lt;code&gt;reponse from Greet rpc: Hello, &amp;lt;first_name&amp;gt; &amp;lt;last_name&amp;gt; from pod: name(&amp;lt;pod_name&amp;gt;), ip(&amp;lt;pod_ip&amp;gt;).&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;From the response, we can deduce which pod our request landed on.&lt;/p&gt;
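&lt;p&gt;The handler behind this &lt;code&gt;rpc&lt;/code&gt; can be imagined as a small formatting function. The sketch below is not the actual server code (that lives in the image, and the function name and parameters here are assumptions); it only reproduces the described response format, with the pod identity passed in the way the injected &lt;code&gt;POD_NAME&lt;/code&gt; and &lt;code&gt;POD_IP&lt;/code&gt; variables would supply it.&lt;/p&gt;

```go
package main

import "fmt"

// greet mirrors the response format described above. podName and podIP
// stand in for the values the downward API injects into each pod.
// (The "reponse" spelling matches the server's actual output.)
func greet(firstName, lastName, podName, podIP string) string {
	return fmt.Sprintf(
		"reponse from Greet rpc: Hello, %s %s from pod: name(%s), ip(%s).",
		firstName, lastName, podName, podIP)
}

func main() {
	fmt.Println(greet("Hitesh", "Pattanayak",
		"greetserver-deploy-7595ccbdd5-l8kmv", "172.17.0.2"))
}
```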

&lt;p&gt;greet.svc.yaml&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Service&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;greetserver&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;greetserver&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;grpc&lt;/span&gt;
      &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;50051&lt;/span&gt;
      &lt;span class="na"&gt;protocol&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;TCP&lt;/span&gt;
      &lt;span class="na"&gt;targetPort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;50051&lt;/span&gt;
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;greetserver&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The above is the service manifest for &lt;code&gt;Greet Server&lt;/code&gt;. It essentially acts as a proxy in front of the &lt;code&gt;Greet Server&lt;/code&gt; pods above.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;selector&lt;/code&gt; section of the service matches with the &lt;code&gt;labels&lt;/code&gt; section of each pod.&lt;/p&gt;

&lt;p&gt;greetclient-deploy.yaml&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;apps/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deployment&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;greetclient-deploy&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;replicas&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;greetclient&lt;/span&gt;
  &lt;span class="na"&gt;template&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;greetclient&lt;/span&gt;
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;hiteshpattanayak/greet-client:4.0&lt;/span&gt;
          &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;greetclient&lt;/span&gt;
          &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;containerPort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;9091&lt;/span&gt;
          &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;GRPC_SERVER_HOST&lt;/span&gt;
              &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;greetserver.default.svc.cluster.local&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;GRPC_SVC&lt;/span&gt;
              &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;greetserver&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;POD_NAMESPACE&lt;/span&gt;
              &lt;span class="na"&gt;valueFrom&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                &lt;span class="na"&gt;fieldRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                  &lt;span class="na"&gt;fieldPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;metadata.namespace&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The above is the deployment manifest of &lt;code&gt;Greet Client&lt;/code&gt;; it spins up 1 replica of &lt;code&gt;Greet Client&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;As mentioned above, the pod runs an application that acts as a REST gateway and reaches out to &lt;code&gt;Greet Server&lt;/code&gt; to process requests.&lt;/p&gt;

&lt;p&gt;This deployment is using &lt;code&gt;hiteshpattanayak/greet-client:4.0&lt;/code&gt; image.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;4.0&lt;/code&gt; tagged image has the load balancing issue.&lt;/p&gt;

&lt;p&gt;Also the pod(s) expose port &lt;code&gt;9091&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;greetclient-svc.yaml&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Service&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;greetclient&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;greetclient&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;restgateway&lt;/span&gt;
      &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;9091&lt;/span&gt;
      &lt;span class="na"&gt;protocol&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;TCP&lt;/span&gt;
      &lt;span class="na"&gt;targetPort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;9091&lt;/span&gt;
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;greetclient&lt;/span&gt;
  &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;LoadBalancer&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The above service is just to redirect traffic to the &lt;code&gt;Greet Client&lt;/code&gt; pods.&lt;/p&gt;

&lt;p&gt;greet-ingress.yaml&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;networking.k8s.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Ingress&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;greet-ingress&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
  &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;nginx.ingress.kubernetes.io/ssl-redirect&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;false"&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;greet.com&lt;/span&gt;
      &lt;span class="na"&gt;http&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;paths&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/&lt;/span&gt;
            &lt;span class="na"&gt;pathType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Prefix&lt;/span&gt;
            &lt;span class="na"&gt;backend&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;greetclient&lt;/span&gt;
                &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;restgateway&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The above ingress is to expose &lt;code&gt;Greet Client Service&lt;/code&gt; to outside of the cluster.&lt;/p&gt;

&lt;p&gt;Note:&lt;br&gt;
&lt;code&gt;minikube&lt;/code&gt; does not have the ingress addon enabled by default:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;check whether it is enabled: &lt;code&gt;minikube addons list&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;enable ingress addon: &lt;code&gt;minikube addons enable ingress&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;greet-clusterrole.yaml&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ClusterRole&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rbac.authorization.k8s.io/v1&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;service-reader&lt;/span&gt;
&lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;apiGroups&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;services"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;verbs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;get"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;watch"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;list"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;greet-clusterrolebinding.yaml&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rbac.authorization.k8s.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ClusterRoleBinding&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;service-reader-binding&lt;/span&gt;
&lt;span class="na"&gt;roleRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;apiGroup&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rbac.authorization.k8s.io&lt;/span&gt;
  &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ClusterRole&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;service-reader&lt;/span&gt;
&lt;span class="na"&gt;subjects&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ServiceAccount&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
    &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The cluster role and cluster role binding are required because the &lt;code&gt;default&lt;/code&gt; service account does not have permission to fetch service details.&lt;br&gt;
The &lt;code&gt;Greet Client&lt;/code&gt; pod internally fetches service details, hence the binding is required.&lt;/p&gt;

&lt;p&gt;Create the setup in the following sequence:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;
kubectl create &lt;span class="nt"&gt;-f&lt;/span&gt; greet-clusterrole.yaml

kubectl create &lt;span class="nt"&gt;-f&lt;/span&gt; greet-clusterrolebinding.yaml

kubectl create &lt;span class="nt"&gt;-f&lt;/span&gt; greetserver-deploy.yaml

kubectl get po &lt;span class="nt"&gt;-l&lt;/span&gt; &lt;span class="s1"&gt;'run=greetserver'&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; wide
&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;com&lt;/span&gt;&lt;span class="sh"&gt;
NAME                                  READY   STATUS    RESTARTS   AGE   IP           NODE       NOMINATED NODE   READINESS GATES
greetserver-deploy-7595ccbdd5-67bmd   1/1     Running   0          91s   172.17.0.4   minikube   &amp;lt;none&amp;gt;           &amp;lt;none&amp;gt;
greetserver-deploy-7595ccbdd5-k6zbl   1/1     Running   0          91s   172.17.0.3   minikube   &amp;lt;none&amp;gt;           &amp;lt;none&amp;gt;
greetserver-deploy-7595ccbdd5-l8kmv   1/1     Running   0          91s   172.17.0.2   minikube   &amp;lt;none&amp;gt;           &amp;lt;none&amp;gt;
&lt;/span&gt;&lt;span class="no"&gt;com

&lt;/span&gt;kubectl create &lt;span class="nt"&gt;-f&lt;/span&gt; greet.svc.yaml
kubectl get svc
&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;com&lt;/span&gt;&lt;span class="sh"&gt;
NAME          TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)     AGE
greetserver   ClusterIP   None         &amp;lt;none&amp;gt;        50051/TCP   77s
&lt;/span&gt;&lt;span class="no"&gt;com

&lt;/span&gt;kubectl create &lt;span class="nt"&gt;-f&lt;/span&gt; greetclient-deploy.yaml
kubectl get po &lt;span class="nt"&gt;-l&lt;/span&gt; &lt;span class="s1"&gt;'run=greetclient'&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; wide
&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;com&lt;/span&gt;&lt;span class="sh"&gt;
NAME                                 READY   STATUS    RESTARTS   AGE   IP           NODE       NOMINATED NODE   READINESS GATES
greetclient-deploy-6bddb94df-jwr25   1/1     Running   0          35s   172.17.0.6   minikube   &amp;lt;none&amp;gt;           &amp;lt;none&amp;gt;
&lt;/span&gt;&lt;span class="no"&gt;com

&lt;/span&gt;kubectl create &lt;span class="nt"&gt;-f&lt;/span&gt; greet-client.svc.yaml
kubectl get svc
&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;com&lt;/span&gt;&lt;span class="sh"&gt;
NAME          TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
greetclient   LoadBalancer   10.110.255.115   &amp;lt;pending&amp;gt;     9091:32713/TCP   22s
greetserver   ClusterIP      None             &amp;lt;none&amp;gt;        50051/TCP        5m14s
&lt;/span&gt;&lt;span class="no"&gt;com

&lt;/span&gt;kubectl create &lt;span class="nt"&gt;-f&lt;/span&gt; greet-ingress.yaml
kubectl get ingress
&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;com&lt;/span&gt;&lt;span class="sh"&gt;
NAME            CLASS   HOSTS       ADDRESS        PORTS   AGE
greet-ingress   nginx   greet.com   192.168.49.2   80      32s
&lt;/span&gt;&lt;span class="no"&gt;com
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Since we have exposed the &lt;code&gt;Greet Client&lt;/code&gt; outside the cluster via &lt;code&gt;greet-ingress&lt;/code&gt;, the endpoint can be accessed at &lt;code&gt;http://greet.com/greet&lt;/code&gt;.&lt;br&gt;
So when we make a curl request:&lt;/p&gt;

&lt;p&gt;Request#1&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;--request&lt;/span&gt; POST &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--url&lt;/span&gt; http://greet.com/greet &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--header&lt;/span&gt; &lt;span class="s1"&gt;'Content-Type: application/json'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--data&lt;/span&gt; &lt;span class="s1"&gt;'{
    "first_name": "Hitesh",
    "last_name": "Pattanayak"
}'&lt;/span&gt;

&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;com&lt;/span&gt;&lt;span class="sh"&gt;
Response

reponse from Greet rpc: Hello, Hitesh Pattanayak from pod: name(greetserver-deploy-7595ccbdd5-l8kmv), ip(172.17.0.2).
&lt;/span&gt;&lt;span class="no"&gt;com
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Request#2&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;--request&lt;/span&gt; POST &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--url&lt;/span&gt; http://greet.com/greet &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--header&lt;/span&gt; &lt;span class="s1"&gt;'Content-Type: application/json'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--data&lt;/span&gt; &lt;span class="s1"&gt;'{
    "first_name": "Hitesh",
    "last_name": "Pattanayak"
}'&lt;/span&gt;

&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;com&lt;/span&gt;&lt;span class="sh"&gt;
Response

reponse from Greet rpc: Hello, Hitesh Pattanayak from pod: name(greetserver-deploy-7595ccbdd5-l8kmv), ip(172.17.0.2).
&lt;/span&gt;&lt;span class="no"&gt;com
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Request#3&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;--request&lt;/span&gt; POST &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--url&lt;/span&gt; http://greet.com/greet &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--header&lt;/span&gt; &lt;span class="s1"&gt;'Content-Type: application/json'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--data&lt;/span&gt; &lt;span class="s1"&gt;'{
    "first_name": "Hitesh",
    "last_name": "Pattanayak"
}'&lt;/span&gt;

&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;com&lt;/span&gt;&lt;span class="sh"&gt;
Response

reponse from Greet rpc: Hello, Hitesh Pattanayak from pod: name(greetserver-deploy-7595ccbdd5-l8kmv), ip(172.17.0.2).
&lt;/span&gt;&lt;span class="no"&gt;com
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So the ISSUE is: no matter how many requests I make, they all land on the same server pod. This happens because of the sticky nature of HTTP/2: the connection is established once, and every subsequent request is multiplexed over it.&lt;br&gt;
The advantage of gRPC becomes its own peril.&lt;/p&gt;
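&lt;p&gt;A toy model of that stickiness (the names here are assumptions, not the actual client code): a gRPC channel behaves like one long-lived HTTP/2 connection, so a backend is chosen once at dial time and every call afterwards rides the same stream.&lt;/p&gt;

```go
package main

import "fmt"

// stickyConn models a gRPC channel: one long-lived HTTP/2 connection.
// A backend is picked once, when the TCP connection is established,
// and every multiplexed request afterwards uses that same stream.
type stickyConn struct {
	backend string
}

// dial models connection establishment: balancing happens only here,
// once per connection, never per request.
func dial(backends []string) stickyConn {
	return stickyConn{backend: backends[0]}
}

func (c stickyConn) call() string { return c.backend }

func main() {
	pods := []string{"172.17.0.2", "172.17.0.3", "172.17.0.4"}
	conn := dial(pods)
	for i := 0; i != 3; i++ {
		fmt.Println("request served by pod", conn.call())
	}
}
```

Every call returns the same pod IP, which is exactly the behavior the three curl requests above demonstrated.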

&lt;p&gt;The codebase to replicate the issue can be found &lt;a href="https://github.com/HiteshRepo/grpc-loadbalancing/commit/dd31d2628d4ee1e47b07b5737ff51cfc43c76d4e"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;
  
  
  gRPC Client side load balancing
&lt;/h2&gt;

&lt;p&gt;We discussed earlier one of the challenges with gRPC, which is load balancing.&lt;/p&gt;

&lt;p&gt;That happens due to the sticky nature of gRPC connections.&lt;/p&gt;

&lt;p&gt;Now we shall discuss how to resolve the issue.&lt;/p&gt;

&lt;p&gt;This particular solution is quite simple.&lt;/p&gt;

&lt;p&gt;The onus to load balance falls on the client itself.&lt;/p&gt;

&lt;p&gt;To be precise, the client here does not mean the end user. gRPC servers are typically fronted by a REST gateway that end users talk to.&lt;/p&gt;

&lt;p&gt;This is because browsers do not expose the HTTP/2 capabilities that gRPC relies on (such as trailers), so gRPC cannot be called directly from a browser.&lt;/p&gt;

&lt;p&gt;Hence the REST gateway acts as a gRPC client to the gRPC servers. And that's why gRPC is mostly used for internal service-to-service communication.&lt;/p&gt;
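&lt;p&gt;The essence of client-side load balancing is that the client resolves all backend addresses and rotates across them per request rather than per connection. A minimal round-robin picker could look like the sketch below (illustrative only; grpc-go ships this behavior as its built-in &lt;code&gt;round_robin&lt;/code&gt; policy):&lt;/p&gt;

```go
package main

import "fmt"

// roundRobin rotates across all resolved backend addresses so that
// each request, not each connection, picks the next server.
type roundRobin struct {
	addrs []string
	next  int
}

func (r *roundRobin) pick() string {
	addr := r.addrs[r.next%len(r.addrs)]
	r.next++
	return addr
}

func main() {
	rr := roundRobin{addrs: []string{
		"172.17.0.2:50051", "172.17.0.3:50051", "172.17.0.4:50051",
	}}
	for i := 0; i != 6; i++ {
		fmt.Println("request", i, "goes to", rr.pick())
	}
}
```

Six requests cycle through the three pod addresses twice, which is the spread we were missing above.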

&lt;p&gt;Earlier we used the &lt;code&gt;hiteshpattanayak/greet-client:4.0&lt;/code&gt; image for &lt;code&gt;Greet Client&lt;/code&gt;, which had the plain gRPC setup without client-side load balancing.&lt;br&gt;
The code can be found &lt;a href="https://github.com/HiteshRepo/grpc-loadbalancing/commit/dd31d2628d4ee1e47b07b5737ff51cfc43c76d4e"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;
  
  
  Changes
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Code changes
&lt;/h3&gt;

&lt;p&gt;For this solution we use the &lt;code&gt;hiteshpattanayak/greet-client:11.0&lt;/code&gt; image. The &lt;a href="https://github.com/HiteshRepo/grpc-loadbalancing/pull/1/files"&gt;codebase&lt;/a&gt; has the following changes:&lt;/p&gt;

&lt;p&gt;Updated client deployment manifest:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;apps/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deployment&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;greetclient-deploy&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;replicas&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;greetclient&lt;/span&gt;
  &lt;span class="na"&gt;template&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;greetclient&lt;/span&gt;
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;hiteshpattanayak/greet-client:11.0&lt;/span&gt;
          &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;greetclient&lt;/span&gt;
          &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;containerPort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;9091&lt;/span&gt;
          &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;GRPC_SVC&lt;/span&gt;
              &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;greetserver&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;POD_NAMESPACE&lt;/span&gt;
              &lt;span class="na"&gt;valueFrom&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                &lt;span class="na"&gt;fieldRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                  &lt;span class="na"&gt;fieldPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;metadata.namespace&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Configured the load-balancing policy while dialing the server.&lt;/li&gt;
&lt;li&gt;Configured the dial to block until the underlying connection is established (&lt;code&gt;grpc.WithBlock&lt;/code&gt;).
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Dial&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;servAddr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithDefaultServiceConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;`{"loadBalancingPolicy":"round_robin"}`&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithBlock&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="n"&gt;opts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;The server address used while dialing needs to be the DNS address of the service.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;serverHost&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;host&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;kubernetes&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GetServiceDnsName&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"GRPC_SVC"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"POD_NAMESPACE"&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;serverHost&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;host&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;servAddr&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Sprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"%s:%s"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;serverHost&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;serverPort&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Headless service
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Earlier, while replicating the issue, the service (&lt;code&gt;greetserver&lt;/code&gt;) we created for the &lt;code&gt;Greet server pods&lt;/code&gt; was a normal &lt;code&gt;ClusterIP&lt;/code&gt; service. This solution requires a headless &lt;code&gt;ClusterIP&lt;/code&gt; service instead.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Service&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;greetserver&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;greetserver&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;grpc&lt;/span&gt;
      &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;50051&lt;/span&gt;
      &lt;span class="na"&gt;protocol&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;TCP&lt;/span&gt;
      &lt;span class="na"&gt;targetPort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;50051&lt;/span&gt;
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;greetserver&lt;/span&gt;
  &lt;span class="na"&gt;clusterIP&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;None&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One significant thing to notice here is that this is a special type of &lt;code&gt;ClusterIP&lt;/code&gt; service called a &lt;code&gt;headless&lt;/code&gt; service.&lt;/p&gt;

&lt;p&gt;In this &lt;code&gt;Service&lt;/code&gt; manifest, the type of service is not specified, so it defaults to &lt;code&gt;ClusterIP&lt;/code&gt;, which means the service is reachable only from within the cluster.&lt;/p&gt;

&lt;p&gt;You can set &lt;code&gt;.spec.clusterIP&lt;/code&gt;, if you already have an existing DNS entry that you wish to reuse.&lt;/p&gt;

&lt;p&gt;In case you set &lt;code&gt;.spec.clusterIP&lt;/code&gt; to &lt;code&gt;None&lt;/code&gt;, the service becomes &lt;code&gt;headless&lt;/code&gt;: when a client sends a request to a headless Service, it gets back the list of all the Pods that the Service represents (in this case, the ones with the label &lt;code&gt;run: greetserver&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;Kubernetes allows clients to discover pod IPs through DNS lookups. Usually, when you perform a DNS lookup for a service, the DNS server returns a single IP: the service's cluster IP. But if you tell Kubernetes you do not need a cluster IP for your service (by setting the &lt;code&gt;clusterIP&lt;/code&gt; field to &lt;code&gt;None&lt;/code&gt; in the service specification), the DNS server returns multiple A records for the service, each pointing to the IP of an individual pod backing the service at that moment. Clients can therefore do a simple DNS A record lookup, get the IPs of all the pods that are part of the service, and use that information to connect to one, many, or all of them.&lt;/p&gt;

&lt;p&gt;Basically, the Service now lets the client decide on how it wants to connect to the Pods.&lt;/p&gt;
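&lt;p&gt;To illustrate that idea, a client inside the cluster could resolve the headless service and pick among the returned pod IPs itself. Below is a minimal sketch, not code from the article's repository: the &lt;code&gt;pickBackend&lt;/code&gt; helper and the atomic request counter are hypothetical.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// reqCount counts requests so each one can be mapped to a backend.
var reqCount atomic.Uint64

// pickBackend rotates over the resolved pod IPs using the request
// counter, mimicking a simple round-robin choice on the client side.
func pickBackend(addrs []string) string {
	if len(addrs) == 0 {
		return ""
	}
	n := reqCount.Add(1)
	return addrs[(n-1)%uint64(len(addrs))]
}

func main() {
	// In-cluster, the addresses would come from a DNS lookup of the
	// headless service, e.g.:
	//   addrs, err := net.LookupHost("greetserver.default.svc.cluster.local")
	addrs := []string{"172.17.0.2", "172.17.0.3", "172.17.0.4"}
	for i := 0; i != 4; i++ {
		// cycles through the pod IPs in order, wrapping around
		fmt.Println(pickBackend(addrs))
	}
}
```

&lt;p&gt;This is essentially what the &lt;code&gt;round_robin&lt;/code&gt; policy in grpc-go does for us once the resolver hands it the full list of addresses.&lt;/p&gt;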

&lt;h4&gt;
  
  
  Verify headless service DNS lookup
&lt;/h4&gt;

&lt;p&gt;Create the headless service:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl create &lt;span class="nt"&gt;-f&lt;/span&gt; greet.svc.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create a utility pod:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl run dnsutils &lt;span class="nt"&gt;--image&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;tutum/dnsutils &lt;span class="nt"&gt;--command&lt;/span&gt; &lt;span class="nt"&gt;--&lt;/span&gt; &lt;span class="nb"&gt;sleep &lt;/span&gt;infinity
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Verify by running the &lt;code&gt;nslookup&lt;/code&gt; command in the pod:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl &lt;span class="nb"&gt;exec &lt;/span&gt;dnsutils &lt;span class="nt"&gt;--&lt;/span&gt;  nslookup greetserver

&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;com&lt;/span&gt;&lt;span class="sh"&gt;
Result

Server:         10.96.0.10
Address:        10.96.0.10#53
Name:   greetserver.default.svc.cluster.local
Address: 172.17.0.4
Name:   greetserver.default.svc.cluster.local
Address: 172.17.0.3
Name:   greetserver.default.svc.cluster.local
Address: 172.17.0.2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As you can see, the headless service resolves to the IP addresses of all the pods backing the service.&lt;/p&gt;

&lt;p&gt;Contrast this with the output returned for a non-headless service.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl &lt;span class="nb"&gt;exec &lt;/span&gt;dnsutils &lt;span class="nt"&gt;--&lt;/span&gt;  nslookup greetclient

&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;com&lt;/span&gt;&lt;span class="sh"&gt;
Server:     10.96.0.10
Address:    10.96.0.10#53

Name:   greetclient.default.svc.cluster.local
Address: 10.110.255.115
&lt;/span&gt;&lt;span class="no"&gt;com
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now let's test the changes by making curl requests to the exposed ingress.&lt;/p&gt;

&lt;p&gt;Request#1&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;--request&lt;/span&gt; POST &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--url&lt;/span&gt; http://greet.com/greet &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--header&lt;/span&gt; &lt;span class="s1"&gt;'Content-Type: application/json'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--data&lt;/span&gt; &lt;span class="s1"&gt;'{
    "first_name": "Hitesh",
    "last_name": "Pattanayak"
}'&lt;/span&gt;

&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;com&lt;/span&gt;&lt;span class="sh"&gt;
Response

reponse from Greet rpc: Hello, Hitesh Pattanayak from pod: name(greetserver-deploy-7595ccbdd5-k6zbl), ip(172.17.0.3).
&lt;/span&gt;&lt;span class="no"&gt;com
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Request#2&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;--request&lt;/span&gt; POST &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--url&lt;/span&gt; http://greet.com/greet &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--header&lt;/span&gt; &lt;span class="s1"&gt;'Content-Type: application/json'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--data&lt;/span&gt; &lt;span class="s1"&gt;'{
    "first_name": "Hitesh",
    "last_name": "Pattanayak"
}'&lt;/span&gt;

&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;com&lt;/span&gt;&lt;span class="sh"&gt;
Response

reponse from Greet rpc: Hello, Hitesh Pattanayak from pod: name(greetserver-deploy-7595ccbdd5-67bmd), ip(172.17.0.4).
&lt;/span&gt;&lt;span class="no"&gt;com
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Request#3&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;--request&lt;/span&gt; POST &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--url&lt;/span&gt; http://greet.com/greet &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--header&lt;/span&gt; &lt;span class="s1"&gt;'Content-Type: application/json'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--data&lt;/span&gt; &lt;span class="s1"&gt;'{
    "first_name": "Hitesh",
    "last_name": "Pattanayak"
}'&lt;/span&gt;

&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;com&lt;/span&gt;&lt;span class="sh"&gt;
Response

reponse from Greet rpc: Hello, Hitesh Pattanayak from pod: name(greetserver-deploy-7595ccbdd5-l8kmv), ip(172.17.0.2).
&lt;/span&gt;&lt;span class="no"&gt;com
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The issue no longer exists.&lt;/p&gt;

&lt;p&gt;But what we lose here is the capability of gRPC to retain connections for a longer period of time and multiplex several requests through them, thereby reducing latency.&lt;/p&gt;

&lt;h2&gt;
  
  
  gRPC lookaside load balancing
&lt;/h2&gt;

&lt;p&gt;Earlier we discussed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The load-balancing challenge with gRPC&lt;/li&gt;
&lt;li&gt;How to address that challenge via client-side load balancing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Even though we were able to resolve the load-balancing issue, we traded off one of the major advantages of gRPC: long-lived connections.&lt;/p&gt;

&lt;p&gt;So in this post we are going to achieve load balancing (still client side) without trading off the above-mentioned advantage of gRPC.&lt;/p&gt;

&lt;p&gt;I would like to reiterate that when I say the onus of load balancing falls on the &lt;code&gt;client side&lt;/code&gt;, the client does not mean the end user. All the gRPC servers have a REST gateway that is used by end users; gRPC services are not exposed directly because of the lack of browser support.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lookaside load balancer
&lt;/h2&gt;

&lt;p&gt;The purpose of this load balancer is to resolve which gRPC server to connect to.&lt;/p&gt;

&lt;p&gt;At the moment this load balancer supports two routing strategies: round robin and random.&lt;/p&gt;

&lt;p&gt;The load balancer itself is gRPC based, and since the load on it is not going to be much, a single pod suffices.&lt;/p&gt;

&lt;p&gt;It exposes a service called &lt;code&gt;lookaside&lt;/code&gt; and an RPC called &lt;code&gt;Resolve&lt;/code&gt;, which expects the routing type along with some details about the gRPC servers, such as the Kubernetes service name and the namespace they exist in.&lt;/p&gt;

&lt;p&gt;Using the service name and namespace, it fetches the Kubernetes Endpoints object associated with the service, from which the server pod IPs can be found.&lt;br&gt;
Those IPs are stored in memory and refreshed at a configured interval. For every request to resolve an IP, it rotates through the IPs based on the routing type in the request.&lt;/p&gt;
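&lt;p&gt;The rotation at the heart of the resolver can be sketched as follows. This is an illustrative sketch, not the actual implementation from the linked repository; the &lt;code&gt;resolver&lt;/code&gt; type and method names are hypothetical.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
)

// resolver keeps the refreshed pod IPs in memory and hands one out
// per Resolve request, either round-robin or at random.
type resolver struct {
	mu   sync.Mutex
	ips  []string
	next int
}

// refresh replaces the cached IPs; in the real balancer these would be
// fetched periodically from the service's Endpoints object.
func (r *resolver) refresh(ips []string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.ips = ips
	r.next = 0
}

// resolve returns one backend IP according to the requested routing type.
func (r *resolver) resolve(routing string) string {
	r.mu.Lock()
	defer r.mu.Unlock()
	if len(r.ips) == 0 {
		return ""
	}
	if routing == "random" {
		return r.ips[rand.Intn(len(r.ips))]
	}
	// default: round robin
	ip := r.ips[r.next]
	r.next = (r.next + 1) % len(r.ips)
	return ip
}

func main() {
	r := resolver{}
	r.refresh([]string{"172.17.0.2", "172.17.0.3", "172.17.0.4"})
	for i := 0; i != 4; i++ {
		fmt.Println(r.resolve("round_robin"))
	}
}
```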

&lt;p&gt;Code for lookaside load balancer can be found &lt;a href="https://github.com/HiteshRepo/grpc-loadbalancing/tree/lookaside/internal/app/lookaside"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We are using the image &lt;code&gt;hiteshpattanayak/lookaside:9.0&lt;/code&gt; for the lookaside pod.&lt;/p&gt;

&lt;p&gt;The pod manifest looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Pod&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;lookaside&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;lookaside&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;hiteshpattanayak/lookaside:9.0&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;lookaside&lt;/span&gt;
      &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;containerPort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;50055&lt;/span&gt;
      &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;LB_PORT&lt;/span&gt;
          &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;50055"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Since it, too, is a gRPC server, the exposed port is &lt;code&gt;50055&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The service manifest that exposes the pod is shown below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Service&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;lookaside-svc&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;lookaside-svc&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;50055&lt;/span&gt;
      &lt;span class="na"&gt;protocol&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;TCP&lt;/span&gt;
      &lt;span class="na"&gt;targetPort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;50055&lt;/span&gt;
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;lookaside&lt;/span&gt;
  &lt;span class="na"&gt;clusterIP&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;None&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I chose a &lt;code&gt;headless&lt;/code&gt; service for this as well, although there is no real need for it here.&lt;/p&gt;

&lt;p&gt;Updated the &lt;code&gt;ClusterRole&lt;/code&gt; to include the ability to fetch &lt;code&gt;endpoints&lt;/code&gt; and &lt;code&gt;pod&lt;/code&gt; details:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ClusterRole&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rbac.authorization.k8s.io/v1&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;service-reader&lt;/span&gt;
&lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;apiGroups&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;services"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pods"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;endpoints"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;verbs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;get"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;watch"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;list"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Changes with Greet Client
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;Greet Client&lt;/code&gt; is now &lt;a href="https://github.com/HiteshRepo/grpc-loadbalancing/blob/177c0fdccad06a76d7d6ce221ee267a47244dc43/internal/app/greetclient/app.go#L38"&gt;integrated&lt;/a&gt; with the lookaside load balancer.&lt;/p&gt;

&lt;p&gt;The client is &lt;a href="https://github.com/HiteshRepo/grpc-loadbalancing/blob/177c0fdccad06a76d7d6ce221ee267a47244dc43/internal/app/greetclient/app.go#L131"&gt;set&lt;/a&gt; to use the &lt;code&gt;RoundRobin&lt;/code&gt; routing type, but this can be made configurable via a ConfigMap or environment variables.&lt;/p&gt;

&lt;p&gt;Removed the &lt;code&gt;load-balancing&lt;/code&gt; policy setting and the blocking &lt;code&gt;WithBlock&lt;/code&gt; dial option.&lt;/p&gt;

&lt;p&gt;from&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Dial&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;servAddr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithDefaultServiceConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;`{"loadBalancingPolicy":"round_robin"}`&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithBlock&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;opts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;to&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Dial&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;servAddr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;opts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So how does this solve the earlier load-balancing problem, where we traded off long-lived connections for the sake of load balancing?&lt;/p&gt;

&lt;p&gt;What we do is store the previously established connections to the servers and reuse them, rotating among them for each request.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;greetClients&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;servAddr&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Sprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"%s:%s"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;serverPort&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"dialing greet server"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;servAddr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Dial&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;servAddr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;opts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"could not connect greet server: %v"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;

    &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;currentGreetClient&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;proto&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewGreetServiceClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;greetClients&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;currentGreetClient&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;currentGreetClient&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;gRPC is a great solution for internal microservice communication because of its efficiency, speed and parity. But long-lived connections, though an advantage, make load balancing tricky. In this article we explored ways to handle it.&lt;/p&gt;

&lt;p&gt;There are also ways to handle this via service meshes like Linkerd and Istio, but it is handy to have solutions in case a service mesh is not set up.&lt;/p&gt;

&lt;p&gt;Folks, if you like my content, would you consider following me on &lt;a href="https://www.linkedin.com/in/hitesh-pattanayak-52290b160/"&gt;LinkedIn&lt;/a&gt;?&lt;/p&gt;

</description>
      <category>go</category>
      <category>kubernetes</category>
      <category>grpc</category>
      <category>loadbalancing</category>
    </item>
    <item>
      <title>Blue Green Deployment with Kubernetes</title>
      <dc:creator>Hitesh Pattanayak</dc:creator>
      <pubDate>Sun, 02 Apr 2023 07:52:34 +0000</pubDate>
      <link>https://forem.com/hiteshrepo/blue-green-deployment-with-kubernetes-34a9</link>
      <guid>https://forem.com/hiteshrepo/blue-green-deployment-with-kubernetes-34a9</guid>
      <description>&lt;p&gt;Blue-green deployment is a software deployment strategy that allows the release of new features or updates without any downtime or service disruption. In this approach, two identical environments are created, one being the active or live environment (blue) and the other being the idle environment (green) where the new changes are deployed and tested before switching the traffic to the new environment.&lt;/p&gt;

&lt;p&gt;The process of blue-green deployment involves the following steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The initial production environment (blue) is running and actively serving customer traffic.&lt;/li&gt;
&lt;li&gt;A new environment (green) is created with the updated codebase or application version.&lt;/li&gt;
&lt;li&gt;The new environment is thoroughly tested to ensure that everything is working correctly and that the application is stable.&lt;/li&gt;
&lt;li&gt;Once the new environment is deemed stable, traffic is switched from the old environment to the new environment, and the old environment is decommissioned.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The main advantages of blue-green deployment are:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Reduced downtime: Since the new environment is tested thoroughly before deploying to the live environment, there is minimal to no downtime during the actual deployment process.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Risk mitigation: In case of any issues or bugs that might have been missed during testing, the old environment can be quickly switched back to, reducing the risk of downtime or service disruption.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Fast rollback: If any problems occur after the new environment is deployed, the switch back to the old environment can be done instantly, ensuring fast rollback with minimal impact to the end-users.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Increased reliability: Blue-green deployment enables a reliable and consistent approach to application updates, ensuring that there is minimal disruption to the end-users.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Testing in production: Blue-green deployment also allows for testing in production, which is an efficient way of testing new changes while avoiding any impact on the live environment.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
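&lt;p&gt;Step 4 above is, in Kubernetes terms, just a change of the Service selector. A minimal sketch of the cutover, assuming a Service named &lt;code&gt;nginx&lt;/code&gt; that selects pods by an &lt;code&gt;app&lt;/code&gt; label (as in the manifests later in this article):&lt;/p&gt;

```shell
# Build the selector patch for a target colour (pure string work;
# "nginx" and the label keys are assumptions matching this article's demo).
target=green
patch="{\"spec\":{\"selector\":{\"name\":\"nginx\",\"app\":\"${target}\"}}}"
echo "$patch"
# The actual switch would then be:
#   kubectl patch svc nginx -p "$patch"
# Rolling back is the same command with target=blue.
```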

&lt;h3&gt;
  
  
  Blue-Green vs Canary
&lt;/h3&gt;

&lt;p&gt;There is another popular deployment strategy called '&lt;a href="https://dev.to/hiteshrepo/canary-deployment-with-kubernetes-22c8"&gt;Canary Deployment&lt;/a&gt;'.&lt;/p&gt;

&lt;p&gt;Although they share some similarities, there are significant differences between the two.&lt;/p&gt;

&lt;p&gt;Canary deployment is a technique that allows you to deploy new changes to a small subset of users or servers, often referred to as the "canary group," and then gradually roll out the changes to the rest of the users or servers. This approach allows you to test the new changes on a small scale before making them available to the entire user base.&lt;/p&gt;

&lt;p&gt;One of the main differences between the two is that in blue-green deployment, both the old and new versions of the application are running at the same time, while in canary deployment, only one version of the application is running at a time. Another difference is that blue-green deployment typically involves switching the entire traffic from one environment to another, while canary deployment gradually increases the traffic to the new version.&lt;/p&gt;

&lt;p&gt;Both deployment strategies have their advantages and disadvantages, and the choice between the two depends on various factors such as the size of the user base, the criticality of the application, and the available resources. Blue-green deployment is preferred when there are a large number of users, and the application needs to be highly available, while canary deployment is a good option when you want to test new changes on a small scale before rolling them out to the entire user base.&lt;/p&gt;

&lt;h3&gt;
  
  
  Steps to demonstrate blue-green deployment
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Create the blue deployment&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;with pod label &lt;code&gt;app: blue&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;with a busybox initContainer
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;initContainers:
    - name: &lt;span class="nb"&gt;install
    &lt;/span&gt;image: busybox:1.28
    &lt;span class="nb"&gt;command&lt;/span&gt;:
    - /bin/sh
    - &lt;span class="nt"&gt;-c&lt;/span&gt;
    - &lt;span class="s2"&gt;"echo blue &amp;gt; /work-dir/index.html"&lt;/span&gt;
    volumeMounts:
    - name: workdir
        mountPath: &lt;span class="s2"&gt;"/work-dir"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Final blue deployment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;apiVersion: apps/v1
kind: Deployment
metadata:
  name: blue-1.10
spec:
  replicas: 3
  selector:
    matchLabels:
      name: nginx
      app: blue
  template:
    metadata:
      labels:
        name: nginx
        app: blue
    spec:
      initContainers:
      - name: &lt;span class="nb"&gt;install
        &lt;/span&gt;image: busybox:1.28
        &lt;span class="nb"&gt;command&lt;/span&gt;:
        - /bin/sh
        - &lt;span class="nt"&gt;-c&lt;/span&gt;
        - &lt;span class="s2"&gt;"echo blue &amp;gt; /work-dir/index.html"&lt;/span&gt;
        volumeMounts:
        - name: workdir
          mountPath: &lt;span class="s2"&gt;"/work-dir"&lt;/span&gt;
      containers: 
        - name: nginx
          image: nginx:1.10
          ports:
          - name: http
            containerPort: 80
          volumeMounts:
          - name: workdir
            mountPath: /usr/share/nginx/html
      volumes:
      - name: workdir
        emptyDir: &lt;span class="o"&gt;{}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Create a service to interact with the blue deployment&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;apiVersion: v1
kind: Service
metadata: 
  name: nginx
  labels: 
    name: nginx
spec:
  ports:
    - name: http
      port: 80
      targetPort: 80
  selector: 
    name: nginx
    app: blue
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Create both the deployment and the service&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl create &lt;span class="nt"&gt;-f&lt;/span&gt; blue-deploy.yaml
kubectl create &lt;span class="nt"&gt;-f&lt;/span&gt; blue-green-svc.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Validate both objects&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get deploy

NAME         READY   UP-TO-DATE   AVAILABLE   AGE
blue-1.10   3/3     3            3           4m37s


kubectl get svc

NAME         TYPE        CLUSTER-IP       EXTERNAL-IP   PORT&lt;span class="o"&gt;(&lt;/span&gt;S&lt;span class="o"&gt;)&lt;/span&gt;   AGE
nginx        ClusterIP   10.109.140.77    &amp;lt;none&amp;gt;        80/TCP    57s

kubectl get po

NAME                          READY   STATUS    RESTARTS   AGE
blue-1.10-579857b89c-jvthn   1/1     Running   0          5m51s
blue-1.10-579857b89c-thq2z   1/1     Running   0          5m51s
blue-1.10-579857b89c-vhrvc   1/1     Running   0          5m51s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Test the deployment&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl run &lt;span class="nt"&gt;-it&lt;/span&gt; &lt;span class="nt"&gt;--rm&lt;/span&gt; &lt;span class="nt"&gt;--restart&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;Never busybox &lt;span class="nt"&gt;--image&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;gcr.io/google-containers/busybox &lt;span class="nt"&gt;--command&lt;/span&gt; &lt;span class="nt"&gt;--&lt;/span&gt; wget &lt;span class="nt"&gt;-qO-&lt;/span&gt; nginx

&lt;span class="c"&gt;# o/p&lt;/span&gt;
blue
pod &lt;span class="s2"&gt;"busybox"&lt;/span&gt; deleted
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Create the green deployment&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The green deployment is identical to blue except for the label &lt;code&gt;app=green&lt;/code&gt; and the init command echoing &lt;code&gt;green&lt;/code&gt; instead of &lt;code&gt;blue&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;apiVersion: apps/v1
kind: Deployment
metadata:
  name: green-1.10
spec:
  replicas: 3
  selector:
    matchLabels:
      name: nginx
      app: green
  template:
    metadata:
      labels:
        name: nginx
        app: green
    spec:
      initContainers:
      - name: &lt;span class="nb"&gt;install
        &lt;/span&gt;image: busybox:1.28
        &lt;span class="nb"&gt;command&lt;/span&gt;:
        - /bin/sh
        - &lt;span class="nt"&gt;-c&lt;/span&gt;
        - &lt;span class="s2"&gt;"echo green &amp;gt; /work-dir/index.html"&lt;/span&gt;
        volumeMounts:
        - name: workdir
          mountPath: &lt;span class="s2"&gt;"/work-dir"&lt;/span&gt;
      containers: 
        - name: nginx
          image: nginx:1.10
          ports:
          - name: http
            containerPort: 80
          volumeMounts:
          - name: workdir
            mountPath: /usr/share/nginx/html
      volumes:
      - name: workdir
        emptyDir: &lt;span class="o"&gt;{}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Create the green deployment&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl create &lt;span class="nt"&gt;-f&lt;/span&gt; green-deploy.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Validate the green deployment&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get deployments

NAME         READY   UP-TO-DATE   AVAILABLE   AGE
blue-1.10    3/3     3            3           3m33s
green-1.10   3/3     3            3           2m29s

kubectl get po

NAME                          READY   STATUS    RESTARTS   AGE
blue-1.10-6c6ff57655-9p5pf    1/1     Running   0          3m52s
blue-1.10-6c6ff57655-knpf5    1/1     Running   0          3m52s
blue-1.10-6c6ff57655-wpgqp    1/1     Running   0          3m52s
green-1.10-7f754b7675-kfj89   1/1     Running   0          2m48s
green-1.10-7f754b7675-nd749   1/1     Running   0          2m48s
green-1.10-7f754b7675-nrd9m   1/1     Running   0          2m48s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Update the service selector&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Change the selector of the &lt;code&gt;nginx&lt;/code&gt; service from &lt;code&gt;app: blue&lt;/code&gt; to &lt;code&gt;app: green&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;apiVersion: v1
kind: Service
metadata: 
  name: nginx
  labels: 
    name: nginx
spec:
  ports:
    - name: http
      port: 80
      targetPort: 80
  selector: 
    name: nginx
    app: green
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Re-apply the service object&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; blue-green-svc.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Test the deployment&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl run &lt;span class="nt"&gt;-it&lt;/span&gt; &lt;span class="nt"&gt;--rm&lt;/span&gt; &lt;span class="nt"&gt;--restart&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;Never busybox &lt;span class="nt"&gt;--image&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;gcr.io/google-containers/busybox &lt;span class="nt"&gt;--command&lt;/span&gt; &lt;span class="nt"&gt;--&lt;/span&gt; wget &lt;span class="nt"&gt;-qO-&lt;/span&gt; nginx

&lt;span class="c"&gt;# o/p&lt;/span&gt;
green
pod &lt;span class="s2"&gt;"busybox"&lt;/span&gt; deleted
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  A simple bash script to automate the blue-to-green switch
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;

&lt;span class="c"&gt;# bg-deploy.sh &amp;lt;service-name&amp;gt; &amp;lt;blue-deployment-name&amp;gt; &amp;lt;green-deployment-yaml-path&amp;gt; &amp;lt;green-deployment-name&amp;gt;&lt;/span&gt;

&lt;span class="nv"&gt;SERVICE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$1&lt;/span&gt;
&lt;span class="nv"&gt;BLUEDEPLOYMENTNAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$2&lt;/span&gt;
&lt;span class="nv"&gt;GREENDEPLOYMENTFILE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$3&lt;/span&gt;
&lt;span class="nv"&gt;GREENDEPLOYMENTNAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$2&lt;/span&gt;

kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; &lt;span class="nv"&gt;$GREENDEPLOYMENTFILE&lt;/span&gt;

&lt;span class="c"&gt;# Wait until the Green Deployment is ready by checking the MinimumReplicasAvailable condition.&lt;/span&gt;
&lt;span class="nv"&gt;READY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;kubectl get deploy &lt;span class="nv"&gt;$GREENDEPLOYMENTNAME&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; json | jq &lt;span class="s1"&gt;'.status.conditions[] | select(.reason == "MinimumReplicasAvailable") | .status'&lt;/span&gt; | &lt;span class="nb"&gt;tr&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'"'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="o"&gt;[[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$READY&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="s2"&gt;"True"&lt;/span&gt; &lt;span class="o"&gt;]]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
    &lt;/span&gt;&lt;span class="nv"&gt;READY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;kubectl get deploy &lt;span class="nv"&gt;$GREENDEPLOYMENTNAME&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; json | jq &lt;span class="s1"&gt;'.status.conditions[] | select(.reason == "MinimumReplicasAvailable") | .status'&lt;/span&gt; | &lt;span class="nb"&gt;tr&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'"'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
    &lt;span class="nb"&gt;sleep &lt;/span&gt;5
&lt;span class="k"&gt;done&lt;/span&gt;

&lt;span class="c"&gt;# Update the service selector with the new version&lt;/span&gt;
kubectl patch svc &lt;span class="nv"&gt;$SERVICE&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="s2"&gt;"{&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;spec&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;:{&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;selector&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;: {&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;name&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;SERVICE&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;, &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;app&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;green&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;}}}"&lt;/span&gt;

&lt;span class="c"&gt;# Delete blue deploy [optional]&lt;/span&gt;
kubectl delete deploy &lt;span class="nv"&gt;$BLUEDEPLOYMENTNAME&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Done."&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
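&lt;p&gt;A hypothetical invocation of the script above, using the names from this demo (the script is assumed to be saved as &lt;code&gt;bg-deploy.sh&lt;/code&gt;):&lt;/p&gt;

```shell
# Arguments, in order: service name, blue deployment name,
# green deployment YAML path, green deployment name.
#   ./bg-deploy.sh nginx blue-1.10 green-deploy.yaml green-1.10
# The same positional mapping, shown with plain bash:
set -- nginx blue-1.10 green-deploy.yaml green-1.10
echo "SERVICE=$1 BLUE=$2 FILE=$3 GREEN=$4"
# prints: SERVICE=nginx BLUE=blue-1.10 FILE=green-deploy.yaml GREEN=green-1.10
```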



&lt;p&gt;In conclusion, blue-green deployment is an efficient and reliable approach to software deployment, allowing for the release of new features or updates with minimal to no downtime. With its ability to test in production, fast rollback, and reduced risk of service disruption, blue-green deployment is a preferred choice for applications that need to be highly available. While canary deployment shares some similarities with blue-green deployment, it differs in its approach to gradual rollout and testing on a small scale. The choice between the two deployment strategies depends on various factors, including the size of the user base, criticality of the application, and available resources. Overall, blue-green deployment is an excellent option for large applications that require high availability and reliability.&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>bluegreendeployment</category>
      <category>deploymentstrategy</category>
      <category>deploy</category>
    </item>
    <item>
      <title>Canary Deployment with Kubernetes</title>
      <dc:creator>Hitesh Pattanayak</dc:creator>
      <pubDate>Sat, 01 Apr 2023 10:43:39 +0000</pubDate>
      <link>https://forem.com/hiteshrepo/canary-deployment-with-kubernetes-22c8</link>
      <guid>https://forem.com/hiteshrepo/canary-deployment-with-kubernetes-22c8</guid>
      <description>&lt;p&gt;In this post, we will learn how to do canary deployment using Kubernetes. In a canary deployment, a small set of users or requests are directed to a new version of the software while the majority of the traffic is still being handled by the old version. This allows us to test new versions of software in production without risking the entire system.&lt;/p&gt;

&lt;h3&gt;
  
  
  Advantage of Canary Deployment
&lt;/h3&gt;

&lt;p&gt;Canary deployment is a deployment strategy where a new version of an application is gradually rolled out to a small subset of users or servers before it is released to the entire user base. The advantages of canary deployment include:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Early detection of issues: Canary deployment allows you to test new features or changes on a small scale before rolling them out to your entire user base. This helps you to detect and fix any issues or bugs early on, minimizing the impact on your users.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Reduced risk: With canary deployment, you are reducing the risk of deploying new features or changes by limiting the scope of the rollout. This makes it easier to recover from any issues that may arise.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Better user experience: By gradually rolling out changes to a small subset of users, you can gather feedback and make adjustments before releasing the changes to your entire user base. This ensures a better user experience for your customers.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Improved performance: Canary deployment can improve the performance of your application by allowing you to test and optimize new features or changes on a small scale before rolling them out to your entire user base.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Increased agility: Canary deployment enables you to be more agile in your development process by allowing you to release new features or changes more frequently and with less risk. This can help you to stay ahead of the competition and meet the changing needs of your users.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Steps to demonstrate Canary Deployment
&lt;/h3&gt;

&lt;p&gt;Let’s start by creating two nginx deployments, labelled &lt;code&gt;app=v1&lt;/code&gt; and &lt;code&gt;app=v2&lt;/code&gt; to differentiate them. We will call our first deployment nginx-app-1 and use the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl create deploy nginx-app-1 &lt;span class="nt"&gt;--image&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;nginx &lt;span class="nt"&gt;--replicas&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;3 &lt;span class="nt"&gt;--dry-run&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;client &lt;span class="nt"&gt;-o&lt;/span&gt; yaml &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; deploy-1.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, we will edit the &lt;code&gt;deploy-1.yaml&lt;/code&gt; file and add a label &lt;code&gt;app=v1&lt;/code&gt; to the &lt;code&gt;metadata.labels&lt;/code&gt; section. We will also add an &lt;code&gt;initContainers&lt;/code&gt; section for the busybox pod:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;initContainers:
  - name: &lt;span class="nb"&gt;install
    &lt;/span&gt;image: busybox:1.28
    &lt;span class="nb"&gt;command&lt;/span&gt;:
      - /bin/sh
      - &lt;span class="nt"&gt;-c&lt;/span&gt;
      - &lt;span class="s2"&gt;"echo version-1 &amp;gt; /work-dir/index.html"&lt;/span&gt;
    volumeMounts:
      - name: workdir
        mountPath: &lt;span class="s2"&gt;"/work-dir"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, let’s create a service for the deployment to be accessible:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;apiVersion: v1
kind: Service
metadata:
  name: nginx-app-svc
  labels:
    app: nginx-app
spec:
  &lt;span class="nb"&gt;type&lt;/span&gt;: ClusterIP
  ports:
    - name: http
      port: 80
      targetPort: 80
  selector:
    app: nginx-app
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can test the deployment by running:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl run &lt;span class="nt"&gt;-it&lt;/span&gt; &lt;span class="nt"&gt;--rm&lt;/span&gt; &lt;span class="nt"&gt;--restart&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;Never busybox &lt;span class="nt"&gt;--image&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;gcr.io/google-containers/busybox &lt;span class="nt"&gt;--command&lt;/span&gt; &lt;span class="nt"&gt;--&lt;/span&gt; wget &lt;span class="nt"&gt;-qO-&lt;/span&gt; nginx-app-svc
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, let’s create another similar deployment with one replica, but with a label value of &lt;code&gt;app=v2&lt;/code&gt;. We will call this deployment &lt;code&gt;nginx-app-2&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl create deploy nginx-app-2 &lt;span class="nt"&gt;--image&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;nginx &lt;span class="nt"&gt;--replicas&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1 &lt;span class="nt"&gt;--dry-run&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;client &lt;span class="nt"&gt;-o&lt;/span&gt; yaml &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; deploy-2.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We will also edit the &lt;code&gt;deploy-2.yaml&lt;/code&gt; file and add a label &lt;code&gt;app=v2&lt;/code&gt; to the &lt;code&gt;metadata.labels&lt;/code&gt; section. We will deploy this new version of the software by running:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; deploy-2.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can continuously call the service to see how the load balancer diverts the traffic between the two versions. To do this, we can run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl run &lt;span class="nt"&gt;-it&lt;/span&gt; &lt;span class="nt"&gt;--rm&lt;/span&gt; &lt;span class="nt"&gt;--restart&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;Never busybox &lt;span class="nt"&gt;--image&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;gcr.io/google-containers/busybox &lt;span class="nt"&gt;--&lt;/span&gt; /bin/sh &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s1"&gt;'while sleep 1; do wget -qO- nginx-app-svc; done'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
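&lt;p&gt;Because the Service spreads requests roughly evenly across all matching pods, the replica counts set the split: with three v1 replicas and one v2 replica, about a quarter of the responses should come from v2. A quick sanity check of that arithmetic:&lt;/p&gt;

```shell
# Expected canary share, given the replica counts used in this demo.
old=3; new=1
total=$((old + new))
echo "approx $((100 * new / total))% of requests hit the canary"
# prints: approx 25% of requests hit the canary
```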



&lt;p&gt;Once we determine that &lt;code&gt;nginx-app-2&lt;/code&gt; is stable and we would like to deprecate &lt;code&gt;nginx-app-1&lt;/code&gt;, we can delete the old deployment by running:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl delete deploy nginx-app-1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All traffic will then be directed to nginx-app-2. We can also scale nginx-app-2 to four replicas by running:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl scale &lt;span class="nt"&gt;--replicas&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;4 deploy nginx-app-2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To check the traffic, we can run the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="k"&gt;while &lt;/span&gt;&lt;span class="nb"&gt;sleep &lt;/span&gt;0.1&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do &lt;/span&gt;curl &lt;span class="si"&gt;$(&lt;/span&gt;kubectl get svc nginx-app-svc &lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="nv"&gt;jsonpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"{.spec.clusterIP}"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In summary, canary deployment is a powerful tool that can help you to deploy new features or changes with more confidence and less risk. By gradually rolling out changes and gathering feedback, you can improve the user experience, performance, and agility of your application.&lt;/p&gt;

</description>
      <category>canarydeployment</category>
      <category>deployment</category>
      <category>kubernetes</category>
      <category>k8s</category>
    </item>
    <item>
      <title>Managing Kubernetes Resources with Resource Quotas</title>
      <dc:creator>Hitesh Pattanayak</dc:creator>
      <pubDate>Fri, 31 Mar 2023 10:49:41 +0000</pubDate>
      <link>https://forem.com/hiteshrepo/managing-kubernetes-resources-with-resource-quotas-2pe4</link>
      <guid>https://forem.com/hiteshrepo/managing-kubernetes-resources-with-resource-quotas-2pe4</guid>
      <description>&lt;h2&gt;
  
  
  Resource quota
&lt;/h2&gt;

&lt;p&gt;When several users or teams share a cluster with a fixed number of nodes, there is a concern that one team could use more than its fair share of resources.&lt;/p&gt;

&lt;p&gt;Resource quotas are a tool for administrators to address this concern.&lt;/p&gt;

&lt;p&gt;A resource quota, defined by a ResourceQuota object, provides constraints that limit aggregate resource consumption per namespace. It can limit the quantity of objects that can be created in a namespace by type, as well as the total amount of compute resources that may be consumed by resources in that namespace.&lt;/p&gt;

&lt;p&gt;Resource quotas work like this:&lt;/p&gt;

&lt;p&gt;Different teams work in different namespaces. This can be enforced with RBAC.&lt;/p&gt;

&lt;p&gt;The administrator creates one ResourceQuota for each namespace.&lt;/p&gt;

&lt;p&gt;Users create resources (pods, services, etc.) in the namespace, and the quota system tracks usage to ensure it does not exceed hard resource limits defined in a ResourceQuota.&lt;/p&gt;

&lt;p&gt;If creating or updating a resource violates a quota constraint, the request will fail with HTTP status code 403 FORBIDDEN with a message explaining the constraint that would have been violated.&lt;/p&gt;

&lt;p&gt;If quota is enabled in a namespace for compute resources like cpu and memory, users must specify requests or limits for those values; otherwise, the quota system may reject pod creation. Hint: Use the LimitRanger admission controller to force defaults for pods that make no compute resource requirements.&lt;/p&gt;

&lt;h3&gt;
  
  
  Explanation of resource_quota.yaml file
&lt;/h3&gt;

&lt;p&gt;This is a YAML file containing two Kubernetes objects.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;apiVersion: v1
kind: Namespace
metadata:
  name: mynamespace

&lt;span class="nt"&gt;---&lt;/span&gt;
apiVersion: v1
kind: List
items:
  - apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: user-compute-quota
      namespace: mynamespace
    spec:
      hard:
        requests.cpu: &lt;span class="s2"&gt;"1"&lt;/span&gt;
        requests.memory: 1Gi
        limits.cpu: &lt;span class="s2"&gt;"2"&lt;/span&gt;
        limits.memory: 2Gi
  - apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: user-object-quota
      namespace: mynamespace
    spec:
      hard:
        configmaps: &lt;span class="s2"&gt;"10"&lt;/span&gt;
        persistentvolumeclaims: &lt;span class="s2"&gt;"4"&lt;/span&gt;
        replicationcontrollers: &lt;span class="s2"&gt;"20"&lt;/span&gt;
        secrets: &lt;span class="s2"&gt;"10"&lt;/span&gt;
        services: &lt;span class="s2"&gt;"10"&lt;/span&gt;
        services.loadbalancers: &lt;span class="s2"&gt;"2"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The first object is of kind "Namespace" and has a metadata field with the name "mynamespace". This is creating a namespace named "mynamespace" in Kubernetes.&lt;/p&gt;

&lt;p&gt;The second object is of kind "List" and contains a list of two "ResourceQuota" objects. The first "ResourceQuota" object has a metadata field with the name "user-compute-quota" and namespace set to "mynamespace". The "spec" field defines resource usage limits for the namespace, such as CPU and memory requests and limits. Specifically, this quota allows for a maximum of 1 CPU and 1Gi of memory for requests and a maximum of 2 CPUs and 2Gi of memory for limits.&lt;/p&gt;

&lt;p&gt;The second "ResourceQuota" object has a metadata field with the name "user-object-quota" and namespace set to "mynamespace". The "spec" field defines limits on the number of objects of different types that can be created in the namespace. Specifically, this quota allows for a maximum of 10 configmaps, 4 persistentvolumeclaims, 20 replicationcontrollers, 10 secrets, 10 services, and 2 load balancer services.&lt;/p&gt;

&lt;h3&gt;
  
  
  Steps to create ResourceQuota objects and observe how they work
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Create resource quota objects:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; resource_quota.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="2"&gt;
&lt;li&gt;Verify created resource quota objects:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt; kubectl get quota &lt;span class="nt"&gt;-n&lt;/span&gt; mynamespace 

NAME                 AGE   REQUEST                                                                                                                                   LIMIT
user-compute-quota   79s   requests.cpu: 0/1, requests.memory: 0/1Gi                                                                                                 limits.cpu: 0/2, limits.memory: 0/2Gi
user-object-quota    44s   configmaps: 1/10, persistentvolumeclaims: 0/4, replicationcontrollers: 0/20, secrets: 0/10, services: 0/10, services.loadbalancers: 0/2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="3"&gt;
&lt;li&gt;Create deployment (dep_without_quota.yaml)
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  namespace: mynamespace
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: k8s-demo
          image: nginx
          ports:
            - name: nginx-port
              containerPort: 8080

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl create &lt;span class="nt"&gt;-f&lt;/span&gt; dep_without_quota.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="4"&gt;
&lt;li&gt;Verify deployment
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get deploy &lt;span class="nt"&gt;-n&lt;/span&gt; mynamespace

NAME               READY   UP-TO-DATE   AVAILABLE   AGE
nginx-deployment   0/3     0            0           31s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;None of the replicas are in the ready state.&lt;/p&gt;

&lt;ol start="5"&gt;
&lt;li&gt;Explore reason
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get rs &lt;span class="nt"&gt;-n&lt;/span&gt; mynamespace 

NAME                          DESIRED   CURRENT   READY   AGE
nginx-deployment-64ff9dbcdf   3         0         0       2m51s


kubectl describe rs nginx-deployment-64ff9dbcdf

...
Events:
  Type     Reason        Age                 From                   Message
  &lt;span class="nt"&gt;----&lt;/span&gt;     &lt;span class="nt"&gt;------&lt;/span&gt;        &lt;span class="nt"&gt;----&lt;/span&gt;                &lt;span class="nt"&gt;----&lt;/span&gt;                   &lt;span class="nt"&gt;-------&lt;/span&gt;
  Warning  FailedCreate  3m9s                replicaset-controller  Error creating: pods &lt;span class="s2"&gt;"nginx-deployment-64ff9dbcdf-wkr8s"&lt;/span&gt; is forbidden: failed quota: user-compute-quota: must specify limits.cpu &lt;span class="k"&gt;for&lt;/span&gt;: k8s-demo&lt;span class="p"&gt;;&lt;/span&gt; limits.memory &lt;span class="k"&gt;for&lt;/span&gt;: k8s-demo&lt;span class="p"&gt;;&lt;/span&gt; requests.cpu &lt;span class="k"&gt;for&lt;/span&gt;: k8s-demo&lt;span class="p"&gt;;&lt;/span&gt; requests.memory &lt;span class="k"&gt;for&lt;/span&gt;: k8s-demo
  Warning  FailedCreate  3m9s                replicaset-controller  Error creating: pods &lt;span class="s2"&gt;"nginx-deployment-64ff9dbcdf-mhn6c"&lt;/span&gt; is forbidden: failed quota: user-compute-quota: must specify limits.cpu &lt;span class="k"&gt;for&lt;/span&gt;: k8s-demo&lt;span class="p"&gt;;&lt;/span&gt; limits.memory &lt;span class="k"&gt;for&lt;/span&gt;: k8s-demo&lt;span class="p"&gt;;&lt;/span&gt; requests.cpu &lt;span class="k"&gt;for&lt;/span&gt;: k8s-demo&lt;span class="p"&gt;;&lt;/span&gt; requests.memory &lt;span class="k"&gt;for&lt;/span&gt;: k8s-demo
  Warning  FailedCreate  3m9s                replicaset-controller  Error creating: pods &lt;span class="s2"&gt;"nginx-deployment-64ff9dbcdf-77n7m"&lt;/span&gt; is forbidden: failed quota: user-compute-quota: must specify limits.cpu &lt;span class="k"&gt;for&lt;/span&gt;: k8s-demo&lt;span class="p"&gt;;&lt;/span&gt; limits.memory &lt;span class="k"&gt;for&lt;/span&gt;: k8s-demo&lt;span class="p"&gt;;&lt;/span&gt; requests.cpu &lt;span class="k"&gt;for&lt;/span&gt;: k8s-demo&lt;span class="p"&gt;;&lt;/span&gt; requests.memory &lt;span class="k"&gt;for&lt;/span&gt;: k8s-demo
  Warning  FailedCreate  3m9s                replicaset-controller  Error creating: pods &lt;span class="s2"&gt;"nginx-deployment-64ff9dbcdf-t5m7q"&lt;/span&gt; is forbidden: failed quota: user-compute-quota: must specify limits.cpu &lt;span class="k"&gt;for&lt;/span&gt;: k8s-demo&lt;span class="p"&gt;;&lt;/span&gt; limits.memory &lt;span class="k"&gt;for&lt;/span&gt;: k8s-demo&lt;span class="p"&gt;;&lt;/span&gt; requests.cpu &lt;span class="k"&gt;for&lt;/span&gt;: k8s-demo&lt;span class="p"&gt;;&lt;/span&gt; requests.memory &lt;span class="k"&gt;for&lt;/span&gt;: k8s-demo
  Warning  FailedCreate  3m9s                replicaset-controller  Error creating: pods &lt;span class="s2"&gt;"nginx-deployment-64ff9dbcdf-j6974"&lt;/span&gt; is forbidden: failed quota: user-compute-quota: must specify limits.cpu &lt;span class="k"&gt;for&lt;/span&gt;: k8s-demo&lt;span class="p"&gt;;&lt;/span&gt; limits.memory &lt;span class="k"&gt;for&lt;/span&gt;: k8s-demo&lt;span class="p"&gt;;&lt;/span&gt; requests.cpu &lt;span class="k"&gt;for&lt;/span&gt;: k8s-demo&lt;span class="p"&gt;;&lt;/span&gt; requests.memory &lt;span class="k"&gt;for&lt;/span&gt;: k8s-demo
  Warning  FailedCreate  3m9s                replicaset-controller  Error creating: pods &lt;span class="s2"&gt;"nginx-deployment-64ff9dbcdf-ldtx8"&lt;/span&gt; is forbidden: failed quota: user-compute-quota: must specify limits.cpu &lt;span class="k"&gt;for&lt;/span&gt;: k8s-demo&lt;span class="p"&gt;;&lt;/span&gt; limits.memory &lt;span class="k"&gt;for&lt;/span&gt;: k8s-demo&lt;span class="p"&gt;;&lt;/span&gt; requests.cpu &lt;span class="k"&gt;for&lt;/span&gt;: k8s-demo&lt;span class="p"&gt;;&lt;/span&gt; requests.memory &lt;span class="k"&gt;for&lt;/span&gt;: k8s-demo
  Warning  FailedCreate  3m9s                replicaset-controller  Error creating: pods &lt;span class="s2"&gt;"nginx-deployment-64ff9dbcdf-jzfqg"&lt;/span&gt; is forbidden: failed quota: user-compute-quota: must specify limits.cpu &lt;span class="k"&gt;for&lt;/span&gt;: k8s-demo&lt;span class="p"&gt;;&lt;/span&gt; limits.memory &lt;span class="k"&gt;for&lt;/span&gt;: k8s-demo&lt;span class="p"&gt;;&lt;/span&gt; requests.cpu &lt;span class="k"&gt;for&lt;/span&gt;: k8s-demo&lt;span class="p"&gt;;&lt;/span&gt; requests.memory &lt;span class="k"&gt;for&lt;/span&gt;: k8s-demo
  Warning  FailedCreate  3m8s                replicaset-controller  Error creating: pods &lt;span class="s2"&gt;"nginx-deployment-64ff9dbcdf-xrp62"&lt;/span&gt; is forbidden: failed quota: user-compute-quota: must specify limits.cpu &lt;span class="k"&gt;for&lt;/span&gt;: k8s-demo&lt;span class="p"&gt;;&lt;/span&gt; limits.memory &lt;span class="k"&gt;for&lt;/span&gt;: k8s-demo&lt;span class="p"&gt;;&lt;/span&gt; requests.cpu &lt;span class="k"&gt;for&lt;/span&gt;: k8s-demo&lt;span class="p"&gt;;&lt;/span&gt; requests.memory &lt;span class="k"&gt;for&lt;/span&gt;: k8s-demo
  Warning  FailedCreate  3m8s                replicaset-controller  Error creating: pods &lt;span class="s2"&gt;"nginx-deployment-64ff9dbcdf-s29ht"&lt;/span&gt; is forbidden: failed quota: user-compute-quota: must specify limits.cpu &lt;span class="k"&gt;for&lt;/span&gt;: k8s-demo&lt;span class="p"&gt;;&lt;/span&gt; limits.memory &lt;span class="k"&gt;for&lt;/span&gt;: k8s-demo&lt;span class="p"&gt;;&lt;/span&gt; requests.cpu &lt;span class="k"&gt;for&lt;/span&gt;: k8s-demo&lt;span class="p"&gt;;&lt;/span&gt; requests.memory &lt;span class="k"&gt;for&lt;/span&gt;: k8s-demo
  Warning  FailedCreate  25s &lt;span class="o"&gt;(&lt;/span&gt;x7 over 3m6s&lt;span class="o"&gt;)&lt;/span&gt;  replicaset-controller  &lt;span class="o"&gt;(&lt;/span&gt;combined from similar events&lt;span class="o"&gt;)&lt;/span&gt;: Error creating: pods &lt;span class="s2"&gt;"nginx-deployment-64ff9dbcdf-9gxvc"&lt;/span&gt; is forbidden: failed quota: user-compute-quota: must specify limits.cpu &lt;span class="k"&gt;for&lt;/span&gt;: k8s-demo&lt;span class="p"&gt;;&lt;/span&gt; limits.memory &lt;span class="k"&gt;for&lt;/span&gt;: k8s-demo&lt;span class="p"&gt;;&lt;/span&gt; requests.cpu &lt;span class="k"&gt;for&lt;/span&gt;: k8s-demo&lt;span class="p"&gt;;&lt;/span&gt; requests.memory &lt;span class="k"&gt;for&lt;/span&gt;: k8s-demo
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Since the deployment file does not specify resource requests and limits, while the namespace enforces user-compute-quota, the replicaset controller fails to provision the pods.&lt;/p&gt;
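&lt;p&gt;The quota admission rule can be pictured with a toy sketch (hypothetical helper names, not the real Kubernetes code): when a ResourceQuota covers compute resources, every container must declare requests and limits, which mirrors the FailedCreate events above.&lt;/p&gt;

```python
# Toy model of the quota admission rule (hypothetical helper, NOT the real
# Kubernetes implementation): a ResourceQuota covering compute resources forces
# every container to declare requests and limits, else pod creation is rejected.

REQUIRED_FIELDS = ["limits.cpu", "limits.memory", "requests.cpu", "requests.memory"]

def admit(container_resources):
    """Reject the pod if any quota-covered resource field is missing."""
    missing = [f for f in REQUIRED_FIELDS if f not in container_resources]
    if missing:
        return False, "must specify " + "; ".join(missing)
    return True, "admitted"

# The nginx-deployment container declares no resources at all:
print(admit({}))
# (False, 'must specify limits.cpu; limits.memory; requests.cpu; requests.memory')
```

This is why the events above list all four fields: the quota covers all of them, and the container declared none.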

&lt;ol start="6"&gt;
&lt;li&gt;Create deployment (dep_with_quota.yaml)
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;apiVersion: apps/v1
kind: Deployment
metadata:
  name: helloworld-deployment
  namespace: mynamespace
  labels:
    app: nginx-ssl
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx-ssl
  template:
    metadata:
      labels:
        app: nginx-ssl
    spec:
      containers:
        - name: k8s-demo
          image: nginx:1.16
          ports:
            - name: nginxssl-port
              containerPort: 8081
          resources:
            requests:
              cpu: 200m
              memory: 0.5Gi
            limits:
              cpu: 400m
              memory: 1Gi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl create &lt;span class="nt"&gt;-f&lt;/span&gt; dep_with_quota.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="7"&gt;
&lt;li&gt;Verify deployment
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get rs &lt;span class="nt"&gt;-n&lt;/span&gt; mynamespace 

NAME                               DESIRED   CURRENT   READY   AGE
helloworld-deployment-7986cbf64d   3         2         2       45s
nginx-deployment-64ff9dbcdf        3         0         0       8m52s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Only 2 out of 3 replicas are in the ready state.&lt;/p&gt;

&lt;ol start="8"&gt;
&lt;li&gt;Explore reason
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get rs &lt;span class="nt"&gt;-n&lt;/span&gt; mynamespace 

NAME                               DESIRED   CURRENT   READY   AGE
helloworld-deployment-7986cbf64d   3         2         2       45s
nginx-deployment-64ff9dbcdf        3         0         0       8m52s

kubectl describe rs helloworld-deployment-7986cbf64d &lt;span class="nt"&gt;-n&lt;/span&gt; mynamespace

Events:
  Type     Reason            Age                From                   Message
  &lt;span class="nt"&gt;----&lt;/span&gt;     &lt;span class="nt"&gt;------&lt;/span&gt;            &lt;span class="nt"&gt;----&lt;/span&gt;               &lt;span class="nt"&gt;----&lt;/span&gt;                   &lt;span class="nt"&gt;-------&lt;/span&gt;
  Normal   SuccessfulCreate  59s                replicaset-controller  Created pod: helloworld-deployment-7986cbf64d-k46hw
  Warning  FailedCreate      59s                replicaset-controller  Error creating: pods &lt;span class="s2"&gt;"helloworld-deployment-7986cbf64d-cxjqx"&lt;/span&gt; is forbidden: exceeded quota: user-compute-quota, requested: limits.memory&lt;span class="o"&gt;=&lt;/span&gt;1Gi,requests.memory&lt;span class="o"&gt;=&lt;/span&gt;512Mi, used: limits.memory&lt;span class="o"&gt;=&lt;/span&gt;2Gi,requests.memory&lt;span class="o"&gt;=&lt;/span&gt;1Gi, limited: limits.memory&lt;span class="o"&gt;=&lt;/span&gt;2Gi,requests.memory&lt;span class="o"&gt;=&lt;/span&gt;1Gi
  Normal   SuccessfulCreate  59s                replicaset-controller  Created pod: helloworld-deployment-7986cbf64d-r68vf
  Warning  FailedCreate      59s                replicaset-controller  Error creating: pods &lt;span class="s2"&gt;"helloworld-deployment-7986cbf64d-sx2vj"&lt;/span&gt; is forbidden: exceeded quota: user-compute-quota, requested: limits.memory&lt;span class="o"&gt;=&lt;/span&gt;1Gi,requests.memory&lt;span class="o"&gt;=&lt;/span&gt;512Mi, used: limits.memory&lt;span class="o"&gt;=&lt;/span&gt;2Gi,requests.memory&lt;span class="o"&gt;=&lt;/span&gt;1Gi, limited: limits.memory&lt;span class="o"&gt;=&lt;/span&gt;2Gi,requests.memory&lt;span class="o"&gt;=&lt;/span&gt;1Gi
  Warning  FailedCreate      59s                replicaset-controller  Error creating: pods &lt;span class="s2"&gt;"helloworld-deployment-7986cbf64d-zx6pq"&lt;/span&gt; is forbidden: exceeded quota: user-compute-quota, requested: limits.memory&lt;span class="o"&gt;=&lt;/span&gt;1Gi,requests.memory&lt;span class="o"&gt;=&lt;/span&gt;512Mi, used: limits.memory&lt;span class="o"&gt;=&lt;/span&gt;2Gi,requests.memory&lt;span class="o"&gt;=&lt;/span&gt;1Gi, limited: limits.memory&lt;span class="o"&gt;=&lt;/span&gt;2Gi,requests.memory&lt;span class="o"&gt;=&lt;/span&gt;1Gi
  Warning  FailedCreate      59s                replicaset-controller  Error creating: pods &lt;span class="s2"&gt;"helloworld-deployment-7986cbf64d-khchm"&lt;/span&gt; is forbidden: exceeded quota: user-compute-quota, requested: limits.memory&lt;span class="o"&gt;=&lt;/span&gt;1Gi,requests.memory&lt;span class="o"&gt;=&lt;/span&gt;512Mi, used: limits.memory&lt;span class="o"&gt;=&lt;/span&gt;2Gi,requests.memory&lt;span class="o"&gt;=&lt;/span&gt;1Gi, limited: limits.memory&lt;span class="o"&gt;=&lt;/span&gt;2Gi,requests.memory&lt;span class="o"&gt;=&lt;/span&gt;1Gi
  Warning  FailedCreate      59s                replicaset-controller  Error creating: pods &lt;span class="s2"&gt;"helloworld-deployment-7986cbf64d-z8j4s"&lt;/span&gt; is forbidden: exceeded quota: user-compute-quota, requested: limits.memory&lt;span class="o"&gt;=&lt;/span&gt;1Gi,requests.memory&lt;span class="o"&gt;=&lt;/span&gt;512Mi, used: limits.memory&lt;span class="o"&gt;=&lt;/span&gt;2Gi,requests.memory&lt;span class="o"&gt;=&lt;/span&gt;1Gi, limited: limits.memory&lt;span class="o"&gt;=&lt;/span&gt;2Gi,requests.memory&lt;span class="o"&gt;=&lt;/span&gt;1Gi
  Warning  FailedCreate      59s                replicaset-controller  Error creating: pods &lt;span class="s2"&gt;"helloworld-deployment-7986cbf64d-7rqj4"&lt;/span&gt; is forbidden: exceeded quota: user-compute-quota, requested: limits.memory&lt;span class="o"&gt;=&lt;/span&gt;1Gi,requests.memory&lt;span class="o"&gt;=&lt;/span&gt;512Mi, used: limits.memory&lt;span class="o"&gt;=&lt;/span&gt;2Gi,requests.memory&lt;span class="o"&gt;=&lt;/span&gt;1Gi, limited: limits.memory&lt;span class="o"&gt;=&lt;/span&gt;2Gi,requests.memory&lt;span class="o"&gt;=&lt;/span&gt;1Gi
  Warning  FailedCreate      59s                replicaset-controller  Error creating: pods &lt;span class="s2"&gt;"helloworld-deployment-7986cbf64d-sb8ml"&lt;/span&gt; is forbidden: exceeded quota: user-compute-quota, requested: limits.memory&lt;span class="o"&gt;=&lt;/span&gt;1Gi,requests.memory&lt;span class="o"&gt;=&lt;/span&gt;512Mi, used: limits.memory&lt;span class="o"&gt;=&lt;/span&gt;2Gi,requests.memory&lt;span class="o"&gt;=&lt;/span&gt;1Gi, limited: limits.memory&lt;span class="o"&gt;=&lt;/span&gt;2Gi,requests.memory&lt;span class="o"&gt;=&lt;/span&gt;1Gi
  Warning  FailedCreate      59s                replicaset-controller  Error creating: pods &lt;span class="s2"&gt;"helloworld-deployment-7986cbf64d-pxsdv"&lt;/span&gt; is forbidden: exceeded quota: user-compute-quota, requested: limits.memory&lt;span class="o"&gt;=&lt;/span&gt;1Gi,requests.memory&lt;span class="o"&gt;=&lt;/span&gt;512Mi, used: limits.memory&lt;span class="o"&gt;=&lt;/span&gt;2Gi,requests.memory&lt;span class="o"&gt;=&lt;/span&gt;1Gi, limited: limits.memory&lt;span class="o"&gt;=&lt;/span&gt;2Gi,requests.memory&lt;span class="o"&gt;=&lt;/span&gt;1Gi
  Warning  FailedCreate      58s                replicaset-controller  Error creating: pods &lt;span class="s2"&gt;"helloworld-deployment-7986cbf64d-jdnfp"&lt;/span&gt; is forbidden: exceeded quota: user-compute-quota, requested: limits.memory&lt;span class="o"&gt;=&lt;/span&gt;1Gi,requests.memory&lt;span class="o"&gt;=&lt;/span&gt;512Mi, used: limits.memory&lt;span class="o"&gt;=&lt;/span&gt;2Gi,requests.memory&lt;span class="o"&gt;=&lt;/span&gt;1Gi, limited: limits.memory&lt;span class="o"&gt;=&lt;/span&gt;2Gi,requests.memory&lt;span class="o"&gt;=&lt;/span&gt;1Gi
  Warning  FailedCreate      39s &lt;span class="o"&gt;(&lt;/span&gt;x8 over 57s&lt;span class="o"&gt;)&lt;/span&gt;  replicaset-controller  &lt;span class="o"&gt;(&lt;/span&gt;combined from similar events&lt;span class="o"&gt;)&lt;/span&gt;: Error creating: pods &lt;span class="s2"&gt;"helloworld-deployment-7986cbf64d-pjms2"&lt;/span&gt; is forbidden: exceeded quota: user-compute-quota, requested: limits.memory&lt;span class="o"&gt;=&lt;/span&gt;1Gi,requests.memory&lt;span class="o"&gt;=&lt;/span&gt;512Mi, used: limits.memory&lt;span class="o"&gt;=&lt;/span&gt;2Gi,requests.memory&lt;span class="o"&gt;=&lt;/span&gt;1Gi, limited: limits.memory&lt;span class="o"&gt;=&lt;/span&gt;2Gi,requests.memory&lt;span class="o"&gt;=&lt;/span&gt;1Gi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Provisioning the third pod failed because it would have exceeded the memory limits of user-compute-quota.&lt;/p&gt;

&lt;ol start="9"&gt;
&lt;li&gt;Check how much of the quota is exhausted:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl describe quota user-compute-quota &lt;span class="nt"&gt;-n&lt;/span&gt; mynamespace 

Name:            user-compute-quota
Namespace:       mynamespace
Resource         Used  Hard
&lt;span class="nt"&gt;--------&lt;/span&gt;         &lt;span class="nt"&gt;----&lt;/span&gt;  &lt;span class="nt"&gt;----&lt;/span&gt;
limits.cpu       800m  2
limits.memory    2Gi   2Gi
requests.cpu     400m  1
requests.memory  1Gi   1Gi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The two running pods consumed resources exactly as specified in the resources spec of the helloworld deployment.&lt;/p&gt;
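&lt;p&gt;The numbers line up with a quick back-of-the-envelope check (a toy sketch with hypothetical names, not a Kubernetes API): two pods consume exactly twice the per-pod spec, and a third pod would push limits.memory past the 2Gi hard cap.&lt;/p&gt;

```python
# Per-pod spec from dep_with_quota.yaml (CPU in millicores, memory in Mi)
POD = {"requests.cpu": 200, "requests.memory": 512,
       "limits.cpu": 400, "limits.memory": 1024}
# Hard caps from user-compute-quota (1 CPU = 1000m, 1Gi = 1024Mi)
HARD = {"requests.cpu": 1000, "requests.memory": 1024,
        "limits.cpu": 2000, "limits.memory": 2048}

def usage(n_pods):
    """Total consumption of n identical pods."""
    return {k: v * n_pods for k, v in POD.items()}

def fits(n_pods):
    """True if n pods stay within every hard cap."""
    u = usage(n_pods)
    return not any(u[k] > HARD[k] for k in HARD)

print(usage(2))          # 400m/1Gi requests, 800m/2Gi limits -- matches 'kubectl describe quota'
print(fits(2), fits(3))  # True False -> the third replica is rejected
```

Two pods already sit exactly at the requests.memory and limits.memory caps, so the third replica has no headroom at all.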

&lt;ol start="10"&gt;
&lt;li&gt;Create a LimitRange object:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;apiVersion: v1
kind: LimitRange
metadata:
  name: limits-quota
  namespace: mynamespace
spec:
  limits:
    - default:
        cpu: 200m
        memory: 512Mi
      defaultRequest:
        cpu: 100m
        memory: 256Mi
      &lt;span class="nb"&gt;type&lt;/span&gt;: Container
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl create &lt;span class="nt"&gt;-f&lt;/span&gt; limit_range.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="11"&gt;
&lt;li&gt;Now retry dep_without_quota.yaml
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl create &lt;span class="nt"&gt;-f&lt;/span&gt; dep_without_quota.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="12"&gt;
&lt;li&gt;Verify the dep_without_quota replicas/pods again:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get deploy &lt;span class="nt"&gt;-n&lt;/span&gt; mynamespace 

NAME               READY   UP-TO-DATE   AVAILABLE   AGE
nginx-deployment   3/3     3            3           13s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All 3 replicas are in the ready state.&lt;/p&gt;

&lt;ol start="13"&gt;
&lt;li&gt;Finally check the consumption of quota
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl describe quota user-compute-quota &lt;span class="nt"&gt;-n&lt;/span&gt; mynamespace 

Name:            user-compute-quota
Namespace:       mynamespace
Resource         Used  Hard
&lt;span class="nt"&gt;--------&lt;/span&gt;         &lt;span class="nt"&gt;----&lt;/span&gt;  &lt;span class="nt"&gt;----&lt;/span&gt;
limits.cpu       600m    2
limits.memory    1536Mi  2Gi
requests.cpu     300m    1
requests.memory  768Mi   1Gi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All 3 pods got scheduled using the default limits specified in limit_range.yaml. This is because the deployment ‘nginx-deployment’ does not specify a resources spec, so the pods created from it use the default values from the LimitRange object.&lt;/p&gt;
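&lt;p&gt;The describe output above is just the LimitRange defaults multiplied out, which a short sketch can confirm (toy arithmetic, not a Kubernetes API):&lt;/p&gt;

```python
# Defaults injected per container by limit_range.yaml (millicores / Mi)
DEFAULTS = {"requests.cpu": 100, "requests.memory": 256,
            "limits.cpu": 200, "limits.memory": 512}

# 3 replicas of nginx-deployment, each picking up the defaults:
total = {k: v * 3 for k, v in DEFAULTS.items()}
print(total)  # 300m/768Mi requests, 600m/1536Mi limits -- matching the quota usage
```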

&lt;p&gt;Folks, if you like my content, consider following me on LinkedIn at: &lt;a href="https://www.linkedin.com/in/hitesh-pattanayak-52290b160/"&gt;https://www.linkedin.com/in/hitesh-pattanayak-52290b160/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>k8s</category>
      <category>resourcequota</category>
      <category>deployment</category>
    </item>
    <item>
      <title>Understanding gRPC Concepts, Use Cases &amp; Best Practices</title>
      <dc:creator>Hitesh Pattanayak</dc:creator>
      <pubDate>Sun, 15 Jan 2023 17:03:40 +0000</pubDate>
      <link>https://forem.com/hiteshrepo/understanding-grpc-concepts-use-cases-best-practices-2npk</link>
      <guid>https://forem.com/hiteshrepo/understanding-grpc-concepts-use-cases-best-practices-2npk</guid>
      <description>&lt;p&gt;&lt;a href="https://www.infracloud.io/blogs/understanding-grpc-concepts-best-practices/" rel="noopener noreferrer"&gt;Original blog post&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As we progress with application development, among various things, there is one primary thing we are less worried about: computing power. With the advent of cloud providers, we are less worried about managing data centers. Everything is available within seconds, and that too on-demand. This, however, leads to a new set of concerns.&lt;br&gt;
With the increase in the size of data, we have activities like serializing, deserializing, and transportation costs added to it. Though we are no longer worried about computing resources, latency becomes an overhead, so we need to cut down on transportation. A lot of messaging protocols have been developed in the past to address this: SOAP was bulky, and REST is a trimmed-down version, but we need an even more efficient framework. That’s where Remote Procedure Calls (RPC) come in.&lt;/p&gt;

&lt;p&gt;In this blog post, we will understand what RPC is and the various implementations of RPC with a focus on gRPC, which is Google's implementation of RPC. We'll also compare REST with RPC and understand various aspects of gRPC, including security, tooling, and much more. So, let's get started!&lt;/p&gt;
&lt;h2&gt;
  
  
  What is RPC?
&lt;/h2&gt;

&lt;p&gt;RPC stands for ‘Remote Procedure Calls’. The definition is in the name itself. Procedure calls simply mean function/method calls; it's the ‘Remote’ word that makes all the difference. What if we can make a function call remotely? &lt;/p&gt;

&lt;p&gt;Simply put, if a function resides on a ‘server’, could invoking it from the ‘client’ side be made as simple as a local method/function call? Essentially, what an RPC does is give the client the ‘illusion’ that it is invoking a local method, when in reality it invokes a method on a remote machine while abstracting away the network-layer tasks. The beauty of this is that the contract is kept very strict and transparent (we will discuss this later in the article).&lt;/p&gt;

&lt;p&gt;Steps involved in an RPC call:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7cf1yt5yrwi5ekc03c2y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7cf1yt5yrwi5ekc03c2y.png" alt="RPC Sequence Flow" width="800" height="484"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is what a typical REST process looks like:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fceou7lcpooap5o2q09t7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fceou7lcpooap5o2q09t7.png" alt="Rest Flow" width="750" height="987"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;RPCs boil the process down to this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flbq0ife51x44n180n3cb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flbq0ife51x44n180n3cb.png" alt="GRPC Flow" width="780" height="681"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is because all the complications associated with making a request are now abstracted from us (we will discuss this in code-generation). All we need to worry about is the data and logic.&lt;/p&gt;
&lt;h2&gt;
  
  
  gRPC - what, why, and how of it
&lt;/h2&gt;

&lt;p&gt;So far, we discussed RPC, which essentially means making function/method calls remotely, thereby giving us benefits like ‘strict contract definition’, ‘abstracting transmission and conversion of data’, ‘reducing latency’, etc., which we will discuss as we proceed with this post. What we would really like to dive deep into is one particular implementation of RPC: RPC is a concept, and gRPC is a framework based on it.&lt;/p&gt;

&lt;p&gt;There are various implementations of RPCs. They are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;gRPC (Google)&lt;/li&gt;
&lt;li&gt;Thrift (Facebook)&lt;/li&gt;
&lt;li&gt;Finagle (Twitter)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Google’s version of RPC is referred to as gRPC. It was introduced in 2015 and has been gaining traction since, becoming one of the most widely chosen communication mechanisms in microservice architectures.&lt;/p&gt;

&lt;p&gt;gRPC uses &lt;a href="https://developers.google.com/protocol-buffers" rel="noopener noreferrer"&gt;protocol buffers&lt;/a&gt; (an open source message format) as the default method of communication between client and server. Also, gRPC uses HTTP/2 as the default protocol. &lt;/p&gt;

&lt;p&gt;gRPC supports four types of communication:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://grpc.io/docs/what-is-grpc/core-concepts/#unary-rpc" rel="noopener noreferrer"&gt;Unary&lt;/a&gt; (typical client and server communication)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://grpc.io/docs/what-is-grpc/core-concepts/#client-streaming-rpc" rel="noopener noreferrer"&gt;Client side streaming&lt;/a&gt; &lt;/li&gt;
&lt;li&gt;&lt;a href="https://grpc.io/docs/what-is-grpc/core-concepts/#server-streaming-rpc" rel="noopener noreferrer"&gt;Server side streaming&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://grpc.io/docs/what-is-grpc/core-concepts/#bidirectional-streaming-rpc" rel="noopener noreferrer"&gt;Bidirectional streaming&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Coming to the message format that is widely used in gRPC: protocol buffers, a.k.a. protobufs. A protobuf message looks something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight protobuf"&gt;&lt;code&gt;&lt;span class="kd"&gt;message&lt;/span&gt; &lt;span class="nc"&gt;Person&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="na"&gt;email&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, &lt;code&gt;Person&lt;/code&gt; is the message we would like to transfer (as part of a request/response), which has the fields &lt;code&gt;name&lt;/code&gt; (string type), &lt;code&gt;id&lt;/code&gt; (string type) and &lt;code&gt;email&lt;/code&gt; (string type). The numbers 1, 2, 3 are field numbers that identify &lt;code&gt;name&lt;/code&gt;, &lt;code&gt;id&lt;/code&gt;, and &lt;code&gt;email&lt;/code&gt; when the message is serialized to the binary format. &lt;/p&gt;

&lt;p&gt;Once the developer has created the Protocol Buffer file(s) with all messages, we can use a ‘protocol buffer compiler’ (a binary) to compile the written protocol buffer file, which will generate all the utility classes and methods which are needed to work with the message. For example, as shown in the above &lt;code&gt;Person&lt;/code&gt; message, depending on the chosen language, the &lt;a href="https://github.com/infracloudio/grpc-blog/blob/master/proto/example/person.pb.go" rel="noopener noreferrer"&gt;generated code will look like this&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do we define services?
&lt;/h3&gt;

&lt;p&gt;We need to define services that use the above messages to be sent/received.&lt;/p&gt;

&lt;p&gt;After writing the necessary request and response message types, the next step is to write the service itself.&lt;br&gt;
gRPC services are also defined in Protocol Buffers and they use the ‘service’ and ‘rpc’ keywords to define a service.&lt;/p&gt;

&lt;p&gt;Take a look at the contents of the proto file below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight protobuf"&gt;&lt;code&gt;&lt;span class="kd"&gt;message&lt;/span&gt; &lt;span class="nc"&gt;HelloRequest&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="na"&gt;description&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kt"&gt;int32&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="kd"&gt;message&lt;/span&gt; &lt;span class="nc"&gt;HelloResponse&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="na"&gt;processedMessage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="kd"&gt;service&lt;/span&gt; &lt;span class="n"&gt;HelloService&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;rpc&lt;/span&gt; &lt;span class="n"&gt;SayHello&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;HelloRequest&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;returns&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;HelloResponse&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, &lt;code&gt;HelloRequest&lt;/code&gt; and &lt;code&gt;HelloResponse&lt;/code&gt; are the messages and &lt;code&gt;HelloService&lt;/code&gt; is exposing one unary RPC called &lt;code&gt;SayHello&lt;/code&gt; which takes &lt;code&gt;HelloRequest&lt;/code&gt; as input and gives &lt;code&gt;HelloResponse&lt;/code&gt; as output.&lt;/p&gt;

&lt;p&gt;As mentioned, &lt;code&gt;HelloService&lt;/code&gt; currently contains a single unary RPC. But it could contain more than one RPC, and it can contain a variety of RPC kinds (unary, client-side streaming, server-side streaming, bidirectional).&lt;/p&gt;

&lt;p&gt;In order to define a streaming RPC, all you have to do is prefix the request/response argument with ‘stream’; see the &lt;a href="https://github.com/infracloudio/grpc-blog/tree/master/proto/streaming" rel="noopener noreferrer"&gt;streaming RPC proto definitions and generated code&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In the above code-base link:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/infracloudio/grpc-blog/blob/master/proto/streaming/streaming.proto" rel="noopener noreferrer"&gt;streaming.proto&lt;/a&gt;: this file is user defined&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/infracloudio/grpc-blog/blob/master/proto/streaming/streaming.pb.go" rel="noopener noreferrer"&gt;streaming.pb.go&lt;/a&gt; &amp;amp; &lt;a href="https://github.com/infracloudio/grpc-blog/blob/master/proto/streaming/streaming_grpc.pb.go" rel="noopener noreferrer"&gt;streaming_grpc.pb.go&lt;/a&gt;: these files are auto-generated on running &lt;a href="https://github.com/infracloudio/grpc-blog/blob/883e25e207b8e7d3fdf8384b98fb0828a982d5b3/proto/Taskfile.yaml#L18" rel="noopener noreferrer"&gt;proto compiler command&lt;/a&gt; command.&lt;/li&gt;
&lt;/ul&gt;
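&lt;p&gt;For illustration, a hypothetical service mixing all four RPC kinds could be defined like this (the names below are made up for this sketch and are not from the linked repository):&lt;/p&gt;

```protobuf
syntax = "proto3";

message Msg {
  string text = 1;
}

message Ack {
  bool ok = 1;
}

service ChatService {
  // Unary: one request, one response
  rpc Send (Msg) returns (Ack);
  // Server-side streaming: one request, a stream of responses
  rpc Subscribe (Msg) returns (stream Msg);
  // Client-side streaming: a stream of requests, one response
  rpc Upload (stream Msg) returns (Ack);
  // Bidirectional streaming: both sides stream independently
  rpc Chat (stream Msg) returns (stream Msg);
}
```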

&lt;h2&gt;
  
  
  gRPC vs REST
&lt;/h2&gt;

&lt;p&gt;We have talked about gRPC a fair bit, and REST got a mention too. What we missed was the difference between them: when we already have a well-established, lightweight communication framework in the form of REST, why was there a need to look for another one? Let us understand gRPC with respect to REST, along with the pros and cons of each.&lt;/p&gt;

&lt;p&gt;A fair comparison needs parameters, so let’s break the comparison down into the ones below:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Message format: protocol buffers vs JSON&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Serialization and deserialization speed is way better in the case of protocol buffers across all data sizes (small/medium/large). &lt;a href="https://github.com/infracloudio/grpc-blog/blob/master/proto/test.out" rel="noopener noreferrer"&gt;Benchmark-Test-Results&lt;/a&gt;. &lt;/li&gt;
&lt;li&gt;After serialization, JSON is human-readable while protobuf’s binary format is not. Whether that is a disadvantage depends on context: sometimes you want to inspect request details in the browser’s developer tools or in Kafka topics, and with protobuf you can’t make out anything. &lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Communication protocol: HTTP 1.1 vs HTTP/2&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;REST is based on HTTP 1.1; communication between a REST client and server requires an established TCP connection, which in turn involves a 3-way handshake. Once the client receives the server’s response, the TCP connection ceases to exist, so a new one must be spun up to process the next request. Establishing a TCP connection for each and every request adds to the latency.&lt;/li&gt;
&lt;li&gt;gRPC, which is based on HTTP/2, overcomes this challenge with a persistent connection. We must remember that persistent connections in HTTP/2 are different from those in WebSockets, where the TCP connection is hijacked and the data transfer is unmonitored. In a gRPC connection, once a TCP connection is established, it is reused for several requests: all requests from the same client-server pair are multiplexed onto the same TCP connection.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Just worrying about data and logic: Code generation being a first-class citizen&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Code generation features are native to gRPC via its in-built protoc compiler. With REST APIs, it’s necessary to use a third-party tool such as Swagger to auto-generate the code for API calls in various languages.&lt;/li&gt;
&lt;li&gt;gRPC abstracts away the process of marshaling/unmarshaling, setting up a connection, and sending/receiving messages; all we need to worry about is the data we want to send or receive and the logic.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Transmission speed&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Since the binary format is much lighter than JSON, transmission in the case of gRPC is roughly 7 to 10 times faster than that of REST.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
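&lt;p&gt;The payoff of a single persistent connection can be sketched with a toy Go program: many concurrent “RPCs” share one dialed connection instead of paying a handshake per request. This is a hand-rolled stand-in using only the standard library, not the grpc package; &lt;code&gt;conn&lt;/code&gt;, &lt;code&gt;dial&lt;/code&gt;, and &lt;code&gt;call&lt;/code&gt; are made-up names for illustration:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// conn stands in for one persistent HTTP/2 connection; dials counts how many
// times a TCP connection (and its 3-way handshake) had to be established.
type conn struct{ dials int32 }

// dial stands in for establishing the single long-lived connection.
func dial() *conn {
	c := &conn{}
	atomic.AddInt32(&c.dials, 1) // one handshake, paid once
	return c
}

// call stands in for a unary RPC multiplexed onto the shared connection.
func (c *conn) call(req string) string { return "ok:" + req }

func main() {
	cc := dial() // single shared connection, like one grpc.ClientConn

	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func(i int) { // concurrent requests all share cc
			defer wg.Done()
			_ = cc.call(fmt.Sprintf("req-%d", i))
		}(i)
	}
	wg.Wait()
	fmt.Println("requests served over", cc.dials, "connection(s)") // 1
}
```

&lt;p&gt;Ten requests, one handshake; with HTTP 1.1 the same loop would have paid ten.&lt;/p&gt;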

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Feature&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;REST&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;gRPC&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Communication Protocol&lt;/td&gt;
&lt;td&gt;Follows request-response model. It can work with either HTTP version but is typically used with HTTP 1.1&lt;/td&gt;
&lt;td&gt;Follows a client-server model with streaming support and is based on HTTP/2. Some servers have workarounds to make it work with HTTP 1.1 (via REST gateways)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Browser support&lt;/td&gt;
&lt;td&gt;Works everywhere&lt;/td&gt;
&lt;td&gt;Limited support. Need to use &lt;a href="https://github.com/grpc/grpc-web" rel="noopener noreferrer"&gt;gRPC-Web&lt;/a&gt;, which is an extension for the web and is based on HTTP 1.1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Payload data structure&lt;/td&gt;
&lt;td&gt;Mostly uses JSON and XML-based payloads to transmit data&lt;/td&gt;
&lt;td&gt;Uses protocol buffers by default to transmit payloads&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Code generation&lt;/td&gt;
&lt;td&gt;Need to use third-party tools like Swagger to generate client code&lt;/td&gt;
&lt;td&gt;gRPC has native support for code generation for various &lt;a href="https://grpc.io/docs/languages/" rel="noopener noreferrer"&gt;languages&lt;/a&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Request caching&lt;/td&gt;
&lt;td&gt;Easy to cache requests on the client and server sides. Most clients/servers natively support it (for example via cookies)&lt;/td&gt;
&lt;td&gt;Does not support request/response caching by default&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Again, for the time being gRPC lacks full browser support, since most UI frameworks still have limited or no support for it. Although gRPC is an automatic choice in most cases for internal microservices communication, it is not the same for external communication that requires UI integration.&lt;/p&gt;

&lt;p&gt;Now that we have compared the two frameworks, gRPC and REST: which one should you use, and when?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In a microservice architecture with multiple lightweight microservices, where the efficiency of data transmission is paramount, gRPC would be an ideal choice.&lt;/li&gt;
&lt;li&gt;If code generation with multiple language support is a requirement, gRPC should be the go-to framework.&lt;/li&gt;
&lt;li&gt;With gRPC’s streaming capabilities, real-time apps like trading or OTT would benefit from it rather than polling using REST.&lt;/li&gt;
&lt;li&gt;If bandwidth is a constraint, gRPC would provide much lower latency and higher throughput.&lt;/li&gt;
&lt;li&gt;If quicker development and high-speed iteration is a requirement, REST should be a go-to option.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  gRPC Concepts
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Load balancing
&lt;/h3&gt;

&lt;p&gt;Even though the persistent connection solves the latency issue, it introduces another challenge in the form of load balancing. Since gRPC (or HTTP/2) creates persistent connections, even in the presence of a load balancer the client forms a persistent connection with one server behind the load balancer. This is analogous to a sticky session.&lt;/p&gt;

&lt;p&gt;We can understand the challenge via a demo; the code and deployment files for it are &lt;a href="https://github.com/infracloudio/grpc-blog/tree/master/grpc-loadbalancing" rel="noopener noreferrer"&gt;in this repository&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;From the above demo code base, we can see that the onus of load balancing falls on the client. This means that one advantage of gRPC, the single persistent connection, no longer holds with this change. But gRPC can still be used for its other benefits.&lt;/p&gt;

&lt;p&gt;Read more about &lt;a href="https://grpc.io/blog/grpc-load-balancing/" rel="noopener noreferrer"&gt;load balancing in gRPC&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The above demo code base only uses/showcases the ‘round-robin’ load balancing strategy, but gRPC supports another client-side load balancing strategy out of the box called ‘pick-first’.&lt;/p&gt;

&lt;p&gt;Furthermore, &lt;a href="https://learn.microsoft.com/en-us/aspnet/core/grpc/loadbalancing?view=aspnetcore-6.0" rel="noopener noreferrer"&gt;custom client-side&lt;/a&gt; load balancing is also supported.&lt;/p&gt;

&lt;h3&gt;
  
  
  Clean contract
&lt;/h3&gt;

&lt;p&gt;In REST, the contract between the client and server is documented but not strict. Going back even further to SOAP, contracts were exposed via WSDL files; in REST we expose contracts via Swagger and similar provisions. But strictness is lacking: we cannot know for sure whether the contract has changed on the server’s side while the client code is being developed.&lt;/p&gt;

&lt;p&gt;With gRPC, the contract is shared with both the client and server, either directly via proto files or via stubs generated from proto files. This is like making a function call, but remotely. And since we are making a function call, we know exactly what we need to send and what we expect as a response. The complexity of making connections, taking care of security, serialization/deserialization, etc. is abstracted away. All we care about is the data.&lt;/p&gt;

&lt;p&gt;Let’s consider the code base for the &lt;a href="https://github.com/infracloudio/grpc-blog/tree/master/greet_app" rel="noopener noreferrer"&gt;Greet App&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The client uses the &lt;a href="https://github.com/infracloudio/grpc-blog/blob/883e25e207b8e7d3fdf8384b98fb0828a982d5b3/greet_app/internal/app/client/client.go#L6" rel="noopener noreferrer"&gt;stub&lt;/a&gt; (code generated from the proto file) to create a client object and invoke the remote function call:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="n"&gt;greetpb&lt;/span&gt; &lt;span class="s"&gt;"github.com/infracloudio/grpc-blog/greet_app/internal/pkg/proto"&lt;/span&gt;
&lt;span class="n"&gt;cc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Dial&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"&amp;lt;server-address&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;opts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatalf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"could not connect: %v"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;greetpb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewGreetServiceClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cc&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Greet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Background&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatalf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"error while calling greet rpc : %v"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Similarly, the server uses the same &lt;a href="https://github.com/infracloudio/grpc-blog/blob/883e25e207b8e7d3fdf8384b98fb0828a982d5b3/greet_app/internal/app/server/server.go#L6" rel="noopener noreferrer"&gt;stub&lt;/a&gt; (code generated from the proto file) to receive the request object and create the response object:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="n"&gt;greetpb&lt;/span&gt; &lt;span class="s"&gt;"github.com/infracloudio/grpc-blog/greet_app/internal/pkg/proto"&lt;/span&gt;
&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;server&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;Greet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;greetpb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GreetingRequest&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;greetpb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GreetingResponse&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

  &lt;span class="c"&gt;// do something with 'req'&lt;/span&gt;

   &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;greetpb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GreetingResponse&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Result&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both of them are using the same stub generated from the proto file &lt;a href="https://github.com/infracloudio/grpc-blog/blob/master/greet_app/internal/pkg/proto/greet.proto" rel="noopener noreferrer"&gt;greet.proto&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The stub was generated using the ‘proto’ compiler; the command used to generate it is &lt;a href="https://github.com/infracloudio/grpc-blog/blob/883e25e207b8e7d3fdf8384b98fb0828a982d5b3/greet_app/Taskfile.yaml#L10" rel="noopener noreferrer"&gt;this one&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;protoc &lt;span class="nt"&gt;--go_out&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="nt"&gt;--go_opt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;paths&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;source_relative &lt;span class="nt"&gt;--go-grpc_out&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="nt"&gt;--go-grpc_opt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;paths&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;source_relative internal/pkg/proto/&lt;span class="k"&gt;*&lt;/span&gt;.proto
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Security
&lt;/h3&gt;

&lt;p&gt;gRPC authentication and authorization work on two levels:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Call-level authentication/authorization is usually handled through tokens that are applied in metadata when the call is made. &lt;a href="https://github.com/infracloudio/grpc-blog/compare/master...secure_token" rel="noopener noreferrer"&gt;Token based authentication example&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Channel-level authentication uses a client certificate that's applied at the connection level. It can also include call-level authentication/authorization credentials to be applied to every call on the channel automatically. &lt;a href="https://github.com/infracloudio/grpc-blog/compare/secure_grpc" rel="noopener noreferrer"&gt;Certificate based authentication example&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Either or both of these mechanisms can be used to help secure services.&lt;/p&gt;
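&lt;p&gt;In Go, call-level credentials ride in metadata, which is essentially a case-insensitive map of header key/value pairs attached to the call. The sketch below hand-rolls that map with the standard library to show the shape of the exchange; in real code the client would use the &lt;code&gt;google.golang.org/grpc/metadata&lt;/code&gt; package, and the token and function names here are made up:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"strings"
)

// metadata stands in for gRPC metadata: lowercase keys, multiple values.
type metadata map[string][]string

// attachToken mimics what a client does before each call: put a bearer
// token into the call's metadata under the "authorization" key.
func attachToken(md metadata, token string) {
	md["authorization"] = append(md["authorization"], "Bearer "+token)
}

// extractToken mimics the check inside a server-side auth interceptor.
func extractToken(md metadata) (string, bool) {
	vals := md["authorization"]
	if len(vals) == 0 || !strings.HasPrefix(vals[0], "Bearer ") {
		return "", false
	}
	return strings.TrimPrefix(vals[0], "Bearer "), true
}

func main() {
	md := metadata{}
	attachToken(md, "my-secret-token") // hypothetical token
	if tok, ok := extractToken(md); ok {
		fmt.Println("authenticated with token:", tok)
	}
}
```

&lt;p&gt;The real client-side equivalent is &lt;code&gt;metadata.AppendToOutgoingContext&lt;/code&gt;, with the server reading it back via &lt;code&gt;metadata.FromIncomingContext&lt;/code&gt;.&lt;/p&gt;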

&lt;h3&gt;
  
  
  Middlewares
&lt;/h3&gt;

&lt;p&gt;In REST, we use middlewares for various purposes like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Rate limiting&lt;/li&gt;
&lt;li&gt;Pre/Post request/response validation&lt;/li&gt;
&lt;li&gt;Addressing security threats&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We can achieve the same with gRPC as well. The terminology is different in gRPC, where they are referred to as ‘interceptors’, but they perform similar activities.&lt;/p&gt;
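&lt;p&gt;An interceptor is just a function that wraps the handler for a call and can run code before and after it. To stay self-contained, the sketch below hand-rolls the signatures instead of importing the grpc package (whose &lt;code&gt;UnaryServerInterceptor&lt;/code&gt; type has the same overall shape); all names here are illustrative:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"time"
)

// handler stands in for grpc.UnaryHandler: it processes one request.
type handler func(req string) (string, error)

// interceptor wraps a handler, mirroring grpc.UnaryServerInterceptor.
type interceptor func(method string, next handler) handler

// logging runs before and after the wrapped handler, like logging middleware.
func logging(method string, next handler) handler {
	return func(req string) (string, error) {
		start := time.Now()
		resp, err := next(req)
		fmt.Printf("method=%s duration=%s err=%v\n", method, time.Since(start), err)
		return resp, err
	}
}

// chain applies interceptors so the first one listed runs outermost.
func chain(method string, h handler, ics ...interceptor) handler {
	for i := len(ics) - 1; i >= 0; i-- {
		h = ics[i](method, h)
	}
	return h
}

func main() {
	greet := func(req string) (string, error) { return "Hello, " + req, nil }
	wrapped := chain("/greet.GreetService/Greet", greet, logging)
	resp, _ := wrapped("World")
	fmt.Println(resp) // Hello, World
}
```

&lt;p&gt;In the real library, &lt;code&gt;grpc.ChainUnaryInterceptor&lt;/code&gt; performs this same chaining when the server is constructed.&lt;/p&gt;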

&lt;p&gt;In &lt;a href="https://github.com/infracloudio/grpc-blog/tree/middlewares/greet_app/internal/app" rel="noopener noreferrer"&gt;the middlewares branch&lt;/a&gt; of the &lt;code&gt;greet_app&lt;/code&gt; code base, we have integrated logger and Prometheus interceptors. &lt;/p&gt;

&lt;p&gt;Look at how the interceptors are configured to use the Prometheus and logging packages in &lt;a href="https://github.com/infracloudio/grpc-blog/blob/7700323e1e488eb8777a06ca762e4d29602d2424/greet_app/internal/pkg/middleware/middleware.go#L29" rel="noopener noreferrer"&gt;middleware.go&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;    &lt;span class="c"&gt;// add middleware&lt;/span&gt;
    &lt;span class="n"&gt;AddLogging&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;zap&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Logger&lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;uInterceptors&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;sInterceptors&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;AddPrometheus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;uInterceptors&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;sInterceptors&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But we can integrate other packages into interceptors for purposes like panic recovery (to handle exceptions), tracing, and even authentication.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/grpc-ecosystem/go-grpc-middleware" rel="noopener noreferrer"&gt;Supported middlewares by gRPC framework&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Packaging, versioning and code practices of proto files
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Packaging
&lt;/h4&gt;

&lt;p&gt;Let's follow &lt;a href="https://github.com/infracloudio/grpc-blog/blob/packaging/proto/packaging/processor.proto" rel="noopener noreferrer"&gt;the packaging branch&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;First, start with &lt;code&gt;Taskfile.yaml&lt;/code&gt;: the task &lt;code&gt;gen-pkg&lt;/code&gt; runs &lt;code&gt;protoc --proto_path=packaging packaging/*.proto --go_out=packaging&lt;/code&gt;. This means &lt;code&gt;protoc&lt;/code&gt; (the compiler) will convert all files matching &lt;code&gt;packaging/*.proto&lt;/code&gt; into their equivalent Go files, placed in the &lt;code&gt;packaging&lt;/code&gt; directory itself as denoted by the flag &lt;code&gt;--go_out=packaging&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Secondly, in the ‘processor.proto’ file, two messages have been defined, namely ‘CPU’ and ‘GPU’. While CPU is a simple message with 3 fields of built-in data types, the GPU message additionally has a field of a custom data type called ‘Memory’ alongside the same built-in data types as CPU. ‘Memory’ is a separate message, defined in a different file altogether.&lt;br&gt;
So how do you use the ‘Memory’ message in the ‘processor.proto’ file? By using an &lt;a href="https://github.com/infracloudio/grpc-blog/blob/436d84358868f463ea7929eb14120eb80801fde1/proto/packaging/processor.proto#L6" rel="noopener noreferrer"&gt;import&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight protobuf"&gt;&lt;code&gt;&lt;span class="na"&gt;syntax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"proto3"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;package&lt;/span&gt; &lt;span class="nn"&gt;laptop_pkg&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;option&lt;/span&gt; &lt;span class="na"&gt;go_package&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"/pb"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="s"&gt;"memory.proto"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;message&lt;/span&gt; &lt;span class="nc"&gt;CPU&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="na"&gt;brand&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kt"&gt;uint32&lt;/span&gt; &lt;span class="na"&gt;cores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="kd"&gt;message&lt;/span&gt; &lt;span class="nc"&gt;GPU&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="na"&gt;brand&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kt"&gt;uint32&lt;/span&gt; &lt;span class="na"&gt;cores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="n"&gt;Memory&lt;/span&gt; &lt;span class="na"&gt;memory&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight protobuf"&gt;&lt;code&gt;&lt;span class="na"&gt;syntax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"proto3"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;package&lt;/span&gt; &lt;span class="nn"&gt;laptop_pkg&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;option&lt;/span&gt; &lt;span class="na"&gt;go_package&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"/pb"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;message&lt;/span&gt; &lt;span class="nc"&gt;Memory&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;enum&lt;/span&gt; &lt;span class="n"&gt;Unit&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;UNKNOWN&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="na"&gt;BIT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="na"&gt;BYTE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="na"&gt;KILOBYTE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="na"&gt;MEGABYTE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="na"&gt;GIGABYTE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="kt"&gt;uint64&lt;/span&gt; &lt;span class="na"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="n"&gt;Unit&lt;/span&gt; &lt;span class="na"&gt;unit&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Even after adding the import, running the task &lt;code&gt;gen-pkg&lt;/code&gt; will throw an error, because by default &lt;code&gt;protoc&lt;/code&gt; assumes the two files &lt;code&gt;memory.proto&lt;/code&gt; and &lt;code&gt;processor.proto&lt;/code&gt; belong to different packages. So you need to declare the same package name in both files.&lt;br&gt;
The optional &lt;code&gt;go_package&lt;/code&gt; tells the compiler to use &lt;code&gt;pb&lt;/code&gt; as the package name for the generated Go files. If files for any other language were generated from these protos, the package name would be &lt;code&gt;laptop_pkg&lt;/code&gt;.&lt;/p&gt;

&lt;h4&gt;
  
  
  Versioning
&lt;/h4&gt;

&lt;p&gt;There can be two kinds of changes to a gRPC contract, breaking and non-breaking:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Non-breaking changes include adding a new service, adding a new method to a service, adding a field to a request or response proto, and adding a value to an enum&lt;/li&gt;
&lt;li&gt;Breaking changes, such as renaming a field, changing a field’s data type or number, or renaming or removing a package, service, or method, require versioning of services&lt;/li&gt;
&lt;li&gt;In order to distinguish between messages or services with the same name across proto files, &lt;a href="https://developers.google.com/protocol-buffers/docs/proto#packages" rel="noopener noreferrer"&gt;optional packaging&lt;/a&gt; can be implemented.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Code practices
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Request message names must end with the suffix Request, for example &lt;code&gt;CreateUserRequest&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Response message names must end with the suffix Response, for example &lt;code&gt;CreateUserResponse&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;In case the response message is empty, you can either use an empty object such as &lt;code&gt;CreateUserResponse&lt;/code&gt; or use &lt;code&gt;google.protobuf.Empty&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Package name must make sense and must be versioned, for example: package &lt;code&gt;com.ic.internal_api.service1.v1&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
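&lt;p&gt;Putting these conventions together, a hypothetical user service proto could look like this (all names are illustrative):&lt;/p&gt;

```protobuf
syntax = "proto3";

// Versioned, meaningful package name.
package com.ic.internal_api.user.v1;

message CreateUserRequest {
  string name = 1;
}

// Empty for now; fields can be added later without breaking clients.
message CreateUserResponse {}

service UserService {
  rpc CreateUser (CreateUserRequest) returns (CreateUserResponse);
}
```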

&lt;h3&gt;
  
  
  Tooling
&lt;/h3&gt;

&lt;p&gt;The gRPC ecosystem supports an array of tools to make life easier for non-developmental tasks like documentation, REST gateways for gRPC servers, integrating custom validators, linting, etc. Here are some tools that can help:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/grpc-ecosystem/grpc-gateway" rel="noopener noreferrer"&gt;protoc-gen-grpc-gateway&lt;/a&gt; — plugin for creating a gRPC REST API gateway. It allows gRPC endpoints as REST API endpoints and performs the translation from JSON to proto. Basically, you define a gRPC service with some custom annotations and it makes those gRPC methods accessible via REST using JSON requests.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/grpc-ecosystem/grpc-gateway" rel="noopener noreferrer"&gt;protoc-gen-swagger&lt;/a&gt; — a companion plugin for grpc-gateway. It is able to generate swagger.json based on the custom annotations required for gRPC gateway. You can then import that file into your REST client of choice (such as &lt;a href="https://www.postman.com/" rel="noopener noreferrer"&gt;Postman&lt;/a&gt;) and perform REST API calls to the methods you exposed.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/grpc/grpc-web" rel="noopener noreferrer"&gt;protoc-gen-grpc-web&lt;/a&gt; — a plugin that allows our front end to communicate with the backend using gRPC calls. A separate blog post on this coming up in the future.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/mwitkow/go-proto-validators" rel="noopener noreferrer"&gt;protoc-gen-go-validators&lt;/a&gt; — a plugin that allows to define validation rules for proto message fields. It generates a &lt;code&gt;Validate() error&lt;/code&gt; method for proto messages you can call in Go to validate if the message matches your predefined expectations.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/yoheimuta/protolint" rel="noopener noreferrer"&gt;protolint&lt;/a&gt; - a plugin to add lint rules to proto files.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Testing using Postman
&lt;/h2&gt;

&lt;p&gt;Unlike testing REST APIs with Postman or equivalent tools like Insomnia, testing gRPC services is not quite as comfortable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; gRPC services can also be tested from the CLI using tools like &lt;a href="https://github.com/ktr0731/evans" rel="noopener noreferrer"&gt;evans-cli&lt;/a&gt;. But for that, reflection needs to be enabled on the gRPC server (if it is not enabled, the path to the proto file is &lt;a href="https://github.com/infracloudio/grpc-blog/blob/ed390485e12ce6b63fd9fd53f867cf6e818a5407/greet_app/Taskfile.yaml#L82" rel="noopener noreferrer"&gt;required&lt;/a&gt;). This &lt;a href="https://github.com/infracloudio/grpc-blog/compare/evans" rel="noopener noreferrer"&gt;compare link&lt;/a&gt; shows how to enable reflection and how to enter evans-cli’s REPL mode. Once in REPL mode, gRPC services can be tested from the CLI itself; the process is described on the &lt;a href="https://github.com/ktr0731/evans" rel="noopener noreferrer"&gt;evans-cli GitHub page&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Postman offers &lt;a href="https://blog.postman.com/postman-now-supports-grpc/" rel="noopener noreferrer"&gt;beta support&lt;/a&gt; for testing gRPC services.&lt;/p&gt;

&lt;p&gt;Here are the steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Open Postman, go to ‘APIs’ in the left sidebar and click the ‘+’ sign to create a new API. In the popup window, enter ‘Name’, ‘Version’, and ‘Schema Details’ and click on create [unless you need to import from a source like GitHub or Bitbucket].&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv1fivej52t94m1i8cnww.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv1fivej52t94m1i8cnww.png" alt="Create new API" width="800" height="489"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol start="2"&gt;
&lt;li&gt;Once your API is created, go to its definition and enter your proto contract.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhte5kxleam6imok9yk9k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhte5kxleam6imok9yk9k.png" alt="Enter proto contract" width="800" height="236"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol start="3"&gt;
&lt;li&gt;&lt;p&gt;Remember that imports do not work here, so it is better to keep all dependent proto definitions in one place.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The above steps will retain your contracts for future use.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Then click on ‘New’, select ‘gRPC Request’, enter the URI, choose the proto from the list of saved ‘APIs’, and finally enter your request message and hit ‘Invoke’.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5g5ujonnug1f8kjv4o89.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5g5ujonnug1f8kjv4o89.png" alt="Create gRPC request" width="800" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the steps above we walked through testing gRPC APIs via Postman; the process differs from testing REST endpoints. One thing to remember: while creating and saving the proto contract, all proto message and service definitions need to be in one place, since Postman provides no way to reference proto messages across versions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In this post, we built an understanding of RPC, drew parallels with REST as well as discussed their differences, and then explored gRPC, Google's implementation of RPC.&lt;/p&gt;

&lt;p&gt;gRPC as a framework can be crucial, especially for internal communication in a microservice-based architecture. It can be used for external communication as well, but that requires a REST gateway. gRPC shines for streaming and real-time applications.&lt;/p&gt;

&lt;p&gt;Just as Go has proven itself as a server-side language, gRPC is proving itself as a de facto communication framework.&lt;/p&gt;

&lt;p&gt;That's it folks! Feel free to reach out to &lt;a href="https://www.linkedin.com/in/hitesh-pattanayak-52290b160/" rel="noopener noreferrer"&gt;Hitesh&lt;/a&gt;/&lt;a href="https://www.linkedin.com/in/pranoy-kundu-74b179167" rel="noopener noreferrer"&gt;Pranoy&lt;/a&gt; for any feedback and thoughts on this topic.&lt;/p&gt;

&lt;p&gt;Looking for help with building your DevOps strategy or want to outsource DevOps to the experts? Learn why so many startups &amp;amp; enterprises consider us as one of the &lt;a href="https://www.infracloud.io/devops-consulting-services/" rel="noopener noreferrer"&gt;best DevOps consulting &amp;amp; services companies&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Further reads&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://grpc.io/docs/" rel="noopener noreferrer"&gt;gRPC official documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://developers.google.com/protocol-buffers/docs/gotutorial" rel="noopener noreferrer"&gt;Protobuff golang documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/grpc-ecosystem" rel="noopener noreferrer"&gt;gRPC ecosystem&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.baeldung.com/rest-vs-grpc" rel="noopener noreferrer"&gt;REST vs gRPC&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.ibm.com/docs/en/aix/7.1?topic=concepts-remote-procedure-call" rel="noopener noreferrer"&gt;RPC concepts&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>career</category>
      <category>java</category>
      <category>community</category>
      <category>welcome</category>
    </item>
  </channel>
</rss>
