
Serverless Chats

Episode #33: The Frontlines of Serverless with Yan Cui

About Yan Cui:

Yan is an experienced engineer who has run production workloads at scale in AWS for 10 years. He has been an architect and principal engineer in a variety of industries, ranging from banking, e-commerce, and sports streaming to mobile gaming. He has worked extensively with AWS Lambda in production, and has been helping clients around the world adopt AWS and serverless as an independent consultant. Yan is an AWS Serverless Hero and a regular speaker at user groups and conferences internationally. He is also the author of several serverless courses.


Transcript:

Jeremy: Hi, everyone, I'm Jeremy Daly and you are listening to Serverless Chats. This week I'm chatting with Yan Cui. Hi, Yan. Thanks for joining me.

Yan: Hi, Jeremy. Thanks for having me.

Jeremy: So you are a developer advocate at Lumigo. You are an AWS Serverless Hero, you are also an independent consultant, and I think more people know you as the Burning Monk. But why don't you tell us a little bit about yourself and what you've been up to lately?

Yan: Yeah. I'm all those things you just mentioned. I'm doing some work with Lumigo as a developer advocate, where I'm focusing a lot on the open-source tooling and articles, and in my capacity as an independent consultant I also work with a lot of clients directly. A lot of them are based in London, where I used to be based. I've now moved to Amsterdam. I still do a lot of open-source work. I just started a new video course focusing on Lambda best practices. Then I'm also doing some workshops around the world, in Europe and also now looking at the U.S. as well. So doing a lot of different things to keep myself busy.

Jeremy: Awesome. Listen, I can talk to you probably about anything. For anybody who knows or has seen some of the work that you've done, it's quite expansive. It's very impressive. In 2019, I have some numbers here, you did 70 blog posts and had something like 2,200 students in your video courses. You spoke at 31 conferences in 17 cities. But more importantly, you helped 23 clients in 11 different cities. So you are on the front line here in seeing how companies are adopting serverless. And not from one perspective, and I think that's what we get a lot from different companies: there is one perspective of how they adopt serverless and how they are working with that.

You've obviously seen this from multiple perspectives, so just, I want to talk about adoption a little bit. We'll talk about a few other things, but just what are you seeing with companies now? The customers you're working with or the clients you're working with, what are they using serverless for?

Yan: They are using it for all kinds of different things. I think it depends on, I guess, the maturity of the company and the domain they're working in. I've got a lot of clients that are either enterprises or small and medium-sized enterprises, and even some stealth-mode startups as well. And obviously your constraints are completely different. That's one of the things I really enjoy about being a consultant, where I get to see a lot of different perspectives, and what may work for one company may be completely inappropriate because of the constraints a different company would have. So in terms of the adoption patterns, you see a lot of the, I guess, startups that are in that position where they can go all in on serverless.

They are serverless-first going into the game. But then at the same time, you also have lots of, I guess, midsize companies and enterprises. They have so much existing intellectual property that it wouldn't make sense for them to rewrite everything just so that they can run code on Lambda. For all those companies, you see a mix of greenfield projects that are serverless-first, and then at the same time there's some effort to migrate some of the existing projects to work on serverless, at least to some degree, at least gradually. Of course, that depends on a lot of constraints around how much on-premises stuff you have.

Do you have to run everything in Java, in which case cold start performance is a concern? So a lot of those limitations, I guess, affect how quickly and how much you are able to go in on this whole serverless-first mindset that we like to have, and I think that is probably one of the reasons that serverless adoption hasn't been as fast as many people expected a few years ago. Because of the fact that you can't just lift and shift your way in anymore, it means that you have to have a lot more thought process behind it, and planning, and there's also just the risk involved: if you make a big mistake and it's your flagship product, of course that's going to put you in a really difficult position. But we do see that companies of all sizes and all fields and all industries are adopting serverless for lots of different workloads, not just APIs but a lot of data processing, IoT, you name it.

Jeremy: That's actually one of the things I'm curious about too. You mentioned customers in all different industries, which is really interesting. Because we get to this point now where I think every company is a software company. Everybody is building some sort of software now. But so, what are the constraints that these companies are working in?

Yan: A lot of them, I guess again, it depends on the industry you're in. For finance companies you have to be very careful about a lot of the, I guess, regulatory requirements, in terms of how you handle data, and also in some cases having a plan in case you have to move away from a database, for example. That's where some of your vendor lock-in arguments start to kick in. And also, for example, you have enterprises who have millions of lines of Java code that have accumulated over 10 years. It's not possible for them to move everything into Lambda if they're seeing one to three seconds of cold start time on those user-facing APIs. So some of those constraints are being lifted.

At least they are now getting better with new features on the platform, but it's still something that people have to be aware of, and they also have to understand the mitigation strategies. A lot of times that is where the constraint is: a lack of knowledge and know-how. Because if you think of Lambda as the extension to a lot of what AWS offers, then it means that you can't just know EC2 and virtualization; you have to know a lot of different services to take full advantage of serverless.

That's where I think a lot of companies are struggling: they just don't have the skillset available in-house. They're exposing developers to things that they've never had to think about before. And I think that's where you get a lot of benefit from serverless, from having autonomous teams that can be self-sufficient and look after so many different things, but at the same time, a lot of developers are just not used to working that way. They're used to working in silos where they have very few responsibilities: just write your code, someone else will manage running the code in production. They'll manage the infrastructure. But now more of that is your responsibility, which can be a gift, but it can be a challenge to companies that are not used to working that way as well.

Jeremy: Yeah, I totally agree. And I think that, as you mentioned learning all these other services, I think we're at a point now where most of these use cases there is some sort of serverless equivalent or serverless alternative to doing it in a more traditional way. Obviously, we're still missing certain things like, I'd love to have some sort of serverless Elasticsearch for example which would be really nice. Are there certain applications that you see people are trying with serverless or are thinking about serverless and just say, "No, I can't do it." Because the throughput needs to be higher or there is too much latency or something like that?

Yan: Yes. You see cases where, say for example, one of my clients had a very complex microservices environment whereby they have so many API-to-API calls, and the fact that you get cold starts on one function may not be an issue; it may not affect your 90th percentile or whatever SLA is set. But when they start to stack up, that becomes a massive issue. So having more control around the warm-up process, with provisioned concurrency, should help with those things. But at the same time, that is a slow process, having to get the teams educated on what these different features are and how they work. In fact, a lot of questions I get are fairly simple questions around, how do I even do CI/CD? How do I do testing? It's not clear to a lot of newcomers how you do these things.
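
The way cold starts "stack up" across chained API calls can be sketched with a little arithmetic. This is only an illustration, not something from the conversation: it assumes every call in the chain hits a cold start independently and with the same probability, and the function name and the 1% rate are made up for the example.

```python
def chance_of_any_cold_start(per_call_cold_rate: float, calls_in_chain: int) -> float:
    """Probability that at least one call in a chain hits a cold start.

    Assumes (hypothetically) that each call is cold independently
    with the same probability.
    """
    # P(no cold start at all) is (1 - p) multiplied once per call,
    # so P(at least one) is its complement.
    return 1 - (1 - per_call_cold_rate) ** calls_in_chain


if __name__ == "__main__":
    for depth in (1, 5, 10, 20):
        share = chance_of_any_cold_start(0.01, depth)
        print(f"{depth:>2} chained calls: {share:.1%} of requests see a cold start")
```

Under those assumptions, a cold-start rate that is invisible at the 90th percentile for a single function becomes visible once a request fans through ten or twenty of them.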

A lot of what we've been taught has been tethered to the idea that there's going to be a server, I can just run everything locally, I press F5 and I can just run a local HTTP server, and now everything is running in the cloud. A lot of that mindset change needs to happen. That kind of paradigm shift happens gradually because, well, everyone learns at a different pace and you need to have some critical mass in the industry.

Jeremy: Yeah. I like the idea of provisioned concurrency actually, because I do think it does solve a problem for the right types of applications, especially when there are low latency requirements. I think that AWS has been pretty good about addressing those problems. They've come out now with the RDS Proxy, which is helping with connections to relational databases. But I always feel like when that happens, they have to add another service in order for you to make it work. It's not just Lambda functions. It's not, "Hey, we've solved the connection issue with Lambda functions." It's, "We've solved the connection issue with Lambda functions because we've added a new service that now you have to use." And I think those present a number of roadblocks. And you had mentioned education as being one of those. So what are some of those other roadblocks that you see companies running into?

Yan: Well, the biggest one by far, I feel, is just education. Like I said, Lambda itself is getting more and more complicated because of all the different things you can do with it. Other roadblocks include, for example, some organizations still holding onto the way they are used to operating, with centralized operations teams and cloud teams. The feature teams don't necessarily have the autonomy they need to take full advantage of all these different tools that you get and all this power and agility you get with serverless. If your team can build a new feature in a week, but it's going to take them three weeks to get anything they need provisioned and to get access to the resources they need, then again, you're not going to get the full benefit of serverless.

So a lot of that legacy thinking at the organization is still there and is still a prominent problem and roadblock for people to take full advantage of serverless. But in terms of actual adoption, a lot of it is ... In terms of technical roadblocks, there are some. I think the last question you had was around some use cases that just don't fit so well. When you've got a really high throughput system, the cost of serverless can become pretty high. So imagine you've got something that's relatively simple but has to scale massively, like Dropbox: not a super complex system, but it has to scale to a massive extent. So for them it makes perfect sense to move off of S3 and start to build their own hardware so that they can start to optimize for that cost.

For a lot of companies, they do have that concern as well. They may not have a very complicated system that requires a hundred different functions on this massive event-driven architecture; maybe they just have five endpoints. But those five endpoints are running at 10,000 or 50,000 requests per second. So in those cases, the cost of using Lambda and API Gateway would be excruciating, and you'd be much better off paying a team to look after your Kubernetes cluster or your containers cluster than running them on Lambda.

But that's always a tricky balance. Because oftentimes you can get the reverse argument whereby, "Well, Lambda is expensive, so I'm going to just do this myself." But then you're hiring someone for $10,000 a month ...

Jeremy: Exactly.

Yan: ... to look after your infrastructure, and your Lambda bill is going to be, I don't know, $100.
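
Both sides of this trade-off can be put into rough numbers. The sketch below is a back-of-the-envelope estimate rather than a quote: the prices are assumed on-demand list prices (us-east-1, REST API, no free tier and no volume tiers), and the 100 ms duration and 128 MB memory size are hypothetical defaults chosen for illustration.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60

# Assumed list prices; check the current pricing pages before relying on them.
LAMBDA_PER_MILLION_REQUESTS = 0.20
LAMBDA_PER_GB_SECOND = 0.0000166667
API_GATEWAY_PER_MILLION_REQUESTS = 3.50


def monthly_cost(requests_per_second, avg_duration_ms=100, memory_mb=128):
    """Rough monthly bill in USD for a steady request rate."""
    requests = requests_per_second * SECONDS_PER_MONTH
    millions = requests / 1_000_000
    # Lambda compute is billed on memory (in GB) multiplied by run time (in seconds).
    gb_seconds = requests * (avg_duration_ms / 1000) * (memory_mb / 1024)
    lambda_cost = (millions * LAMBDA_PER_MILLION_REQUESTS
                   + gb_seconds * LAMBDA_PER_GB_SECOND)
    api_gateway_cost = millions * API_GATEWAY_PER_MILLION_REQUESTS
    return {
        "lambda": lambda_cost,
        "api_gateway": api_gateway_cost,
        "total": lambda_cost + api_gateway_cost,
    }


if __name__ == "__main__":
    for rps in (5, 10_000):
        cost = monthly_cost(rps)
        print(f"{rps:>6} req/s: about ${cost['total']:,.0f} per month")
```

Under these assumed prices, a sustained 10,000 requests per second comes to roughly $100,000 a month, most of it API Gateway, while a handful of requests per second costs tens of dollars. The first figure is well above the $10,000-a-month hire mentioned above and the second is far below it.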

Jeremy: And you're hiring more than one person too. Then you're still paying for the bandwidth and some of these other things and you're still paying for compute somewhere. So that's really interesting. You made a point a little bit earlier where you said, this idea of the paradigm shift or the mind shift of going from this traditional lift and shift and bringing things into serverless. And so obviously there is a ton that needs to change. We'd like to say, it's programming and you just need to figure out the glue that works there. But you really can't just lift and shift and get those benefits, right?

Yan: Yeah. It's a common pitfall whereby teams try to lift and shift. Initially it looks like it might work, and then pretty soon they find out the hard way that that approach doesn't scale, it doesn't work nicely, and you run into all kinds of different limitations. For example, one client I've worked with illustrates that point: they had this API which used to do lots of different things, including doing some server-side rendering and some API endpoints, some penalties, some requests, and then they just moved the whole thing over as one big fat Lambda function.

Because one of the endpoints has to access some VPC resource, so of course [crosstalk 00:12:59] and now you've got every function, on a cold start, having to initialize React, which is not a very lightweight dependency. Even when you've got an HTTP endpoint that doesn't need it, you have to initialize it, and then you also have to wait for the VPC initialization and all of that. They were getting performance that was so bad that it's just not acceptable for anything that's going to run in production. And of course, unless you know that the reason that's happening is this fat Lambda, and how the whole initialization process works, you won't know to split your function up into one function per endpoint perhaps, or at least to have some separation so that the resource-intensive functions are separate from the other things.

I do find that you've got tools that allow you to take your Express app and just run it inside Lambda. They represent the easy path for people to get some of the benefits in terms of the infrastructure automation and improved scalability and resilience with Lambda. But at the same time, unless there's a way for you to later on move to the idiomatic way of working with Lambda, having single-responsibility functions, then you become a bit locked into the decision that the tool has made for you, and it becomes harder for you to migrate later.

Jeremy: I actually think that's one of the better arguments for moving away from the fat Lambda or the Lambdalith. I think a lot of people have a ton of success with that. I have used them in the past as well. There have been times where it just seems to make sense, but certainly the bootstrap process is an issue. If you're bootstrapping something as part of the warm-up phase of a Lambda function that isn't used by 90% of the code, and is only used by 10% of it, it's a complete waste of time and memory to boot those things up. So, I totally agree with that. I think you and I were talking in the past too where we just said, this adoption pattern here, this is just something that is going to take time. I know you're a big fan of functional programming. I'm actually a big fan of JavaScript functional programming, which I think people think is impossible, but it isn't. But anyway, it's something that is probably just going to take a little bit more time for people to understand, working in this different way.

Yan: If you look at where we are with functional programming, it's as old as OO, but when functional programming is going to hit the mainstream is TBD, to be decided, which to some extent is frustrating because, for many use cases, functional programming is probably a better tool. I'm a big fan of F# and have done a lot of things in the past with [inaudible 00:15:46] and stuff as well. I'm a big fan. But there is a big mind shift, a change in the way you problem-solve, and that mind change doesn't happen overnight. You have to be patient, and you have to give people time to digest and internalize this change and really understand the benefits before they become advocates themselves, or at least become practitioners.

I do think it is happening slowly. Just judging by the amount of interest the community is showing and the number of serverless conferences around the world, the interest is definitely there. But we are still a long way from having enough people who are well equipped to succeed. There are definitely a lot of people. You have your Michael Hart, your Ben K. But we need a lot more of them.

Jeremy: I agree. One other thing on functional programming: I tell people, "Listen, once you write a pure function, you'll never go back to writing something else." Anyway, one more question on the adoption side of things. Because one of the things I see quite a bit, and I really love this use case for serverless, is sort of this peripheral, enhancing-the-DevOps sort of stuff. Do you see a lot of that, where companies are using it to either do auditing or do DevOps automation, things like that?

Yan: Yeah, tons. There actually have been quite a few companies whose main feature teams are not using serverless, but whose infrastructure and DevOps teams are using Lambda very, very heavily. Before, there were just a lot of things they couldn't do, because there was no way to tap into what's happening in your AWS ecosystem. But now with Lambda, everything that's happening in your account gets captured by CloudTrail, and you can use event patterns in EventBridge or CloudWatch Events to trigger your functions to do all kinds of automated detection for changes that you don't expect to happen, security checks and things like that. Or even just basic things like automating some processes, such as cleaning up resources that are no longer necessary.

There's tons of things that a lot of DevOps teams are doing now that they would have been really difficult to do in the past without Lambda. And I do see a lot of adoption in that particular space as well.
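
As a rough illustration of that kind of automation, here is a minimal sketch of a Lambda handler that reacts to CloudTrail API calls delivered through EventBridge. It is not taken from any client setup discussed here: the choice of watched API calls, the handler name, and the shape of the return value are all assumptions for the example, and a real function would likely notify a channel rather than just print.

```python
# Event pattern an EventBridge rule could use to route matching CloudTrail
# records to the function (shown as a Python dict for readability).
EVENT_PATTERN = {
    "source": ["aws.cloudtrail"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["cloudtrail.amazonaws.com"],
        "eventName": ["StopLogging", "DeleteTrail"],
    },
}

# Hypothetical list of calls nobody is expected to make in this account.
UNEXPECTED_CALLS = set(EVENT_PATTERN["detail"]["eventName"])


def handler(event, context):
    """Flag API calls that are not expected to happen."""
    detail = event.get("detail", {})
    event_name = detail.get("eventName")
    if event_name not in UNEXPECTED_CALLS:
        return {"alert": False}

    actor = detail.get("userIdentity", {}).get("arn", "unknown")
    region = detail.get("awsRegion", "unknown")
    # Printed lines end up in CloudWatch Logs; alerting is left out of the sketch.
    print(f"Unexpected {event_name} by {actor} in {region}")
    return {"alert": True, "eventName": event_name, "actor": actor}


if __name__ == "__main__":
    sample = {
        "source": "aws.cloudtrail",
        "detail-type": "AWS API Call via CloudTrail",
        "detail": {
            "eventSource": "cloudtrail.amazonaws.com",
            "eventName": "StopLogging",
            "awsRegion": "eu-west-1",
            "userIdentity": {"arn": "arn:aws:iam::123456789012:user/example"},
        },
    }
    print(handler(sample, None))
```

The rule does the filtering and the function only decides what to do about a match, which keeps the function small and single-purpose.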

Jeremy: Awesome. All right. So I want to move a little bit past use cases, but I think maybe this ties into it. There are people who say, "Well, I can't use Lambda because it only runs for 15 minutes and I have ETL tasks that need to run longer jobs." Or, "I need to do something like that." Or, "I have to have multiple jobs running together." Or something. And this new thing that seems to maybe have been sparked by Google Cloud Run is this idea of serverless containers. I spoke with Bret McGowen about this and just the thinking behind that. And so obviously we have Fargate with AWS. So what are your thoughts on this idea of expanding the term serverless to include things like Fargate and Cloud Run?

Yan: Well, listen, when I think about serverless, I don't think about specific technologies. I think in terms of the characteristics a technology has: the pay-per-use pricing model, not having to worry about the underlying infrastructure and how it's going to scale, and all of that. I think right now, Fargate is serverless in that you don't have to worry about the underlying infrastructure: the EC2 instances that your containers run on, or the cluster, or how to auto-scale them. But I guess what is missing right now is just the event programming model and the fact that this is not pay-per ...

Jeremy: Pay-per use.

Yan: Pay-per use, yeah. But that said, I think you do get a lot of the benefits that we enjoy from serverless technologies with Fargate already, and it does eliminate a lot of the limitations that Lambda has. Also, I just don't think that Lambda is going to be ... we should not see Lambda as the silver bullet. Nothing is ever going to be a silver bullet. So the fact that you've got something else that can allow you to run containerized workloads very easily and minimize the amount of work that you have to do is a good thing. Because, remember, the whole thing about serverless is about removing undifferentiated heavy lifting. And a lot of that is around managing EC2 instances, configuring auto scaling groups and clusters and all of that. And the fact that I can get a lot of that off my plate and onto AWS with Fargate, I think that is a really good direction.

I'm not a purist in terms of the terms. All I care about is, what can I get from a technology? And from that particular standpoint, Fargate is quite close to what we get with other similar services. It just would be nice if you could trigger Fargate with event triggers directly.

Jeremy: That's the big thing too. I think Tim Wagner has said this as well, where it's sort of like Lambda and Fargate are becoming closer and closer. For all intents and purposes, there's no reason why Lambda can only run for 15 minutes other than it's a limit that AWS set. I mean, functions could run for an hour or 10 days if they needed to. And if AWS wanted to allow you to do that, they could add some sort of event triggering or some sort of event-driven approach to Fargate. I mean, you can start Fargate tasks now in a number of different ways. So there's a little bit of event-driven there, just not as clear as the Lambda stuff. As Lambda gets more of these serverful-type features and as Fargate gets more of these serverless features, is there maybe a point where they become the same thing?

Yan: Probably and hopefully. I think at that point, it'd be really confusing for people. But I think that is ultimately where I hope we will get to, whereby a lot of the limitations that we currently have with Lambda are eliminated, and a lot of the benefits that we enjoy from Lambda but that are not available for Fargate become available for Fargate. So it becomes more of a choice in terms of, "Okay, what do I prefer working with? Do I have specific use cases that fit better with a containerized environment where I have more control of the infrastructure itself?" Then I use Fargate versus using Lambda. But in both cases, I can enjoy pretty much the same benefits. I think that would be a really good place to be.

Jeremy: Awesome. All right. So one of the things that I've been talking about a lot at the end of last year, and it's something that I've been thinking about for a while, is this thing that I call abstraction as a service. And it's probably an annoying term, but what I'm thinking of is, Lambda functions themselves are pretty easy. You create a Hello World one, fine, simple. You want to add an API Gateway, you use the Serverless Framework or use SAM, and it makes it very easy for you to get these simple examples up and running. But start adding in SQS queues, or EventBridge and Kinesis streams, and then understanding the sharding of Kinesis streams and how many concurrent Lambdas you might need to have, and then the new ability for you to replicate the stream.

There's just a whole bunch of things that are happening there. And now suddenly your simple serverless "upload a piece of code" is completely dwarfed by the amount of configuration files you have to write and the understanding of all these different best practices. My sort of premise here, or what I'm hoping to see, and I think this is something that Serverless Components are starting to do, and to a degree the Serverless Application Repository is starting to do, is to encapsulate these use cases or these outcomes and put them into something that's much more easily deployed.

Where you don't have to think about the 50 different components you might need to configure under the hood. You just say, "I want to build an API or a webhook that does this and that and whatever." And it's much easier to configure that with sane defaults. So, we can talk about the Serverless Components thing. But really what I want to do is focus on the Serverless Application Repository. Because you've done a bunch of apps. I think you've got 10 of them in there now. What are your thoughts on SAR?

Yan: I think SAR is a good idea. But the execution is still problematic right now, at least from my experience working with SAR both as a consumer and also as a publisher. One of the things that often stands out is that, with SAR, the integration path is not super clear. For example, as a consumer, what I use to bring SAR into my CloudFormation template is not just a normal CloudFormation resource; it's this AWS::Serverless::Application. It's not a native CloudFormation resource type, so you have to bring in the SAM macro even if you're not using SAM. The few times I had to do that with the Serverless Framework, it was just fine. I can bring in the SAM macro, but it becomes a bit weird. And also AWS often talk about this idea that we should be doing least privilege as a default, but then they also want you to just use their prepackaged, predefined policies for your SAR applications.

Which means that your application either doesn't have enough permissions or has too many permissions. It's really hard to size and tailor your permissions to follow least privilege. But when you do the right thing, the discovery in the console punishes you, because someone has to tick a box to find applications that are using custom IAM roles, which are trying to do the right thing to give you least privilege. Also, I find the discoverability itself is not that intuitive to use. When you're trying to search for something, it gives you way more things than you're actually looking for.

If you look at some of the top applications in SAR right now, they're all Hello World or introduction-to-basic-Alexa-Skills examples. There is a lot of example code you can deploy to your account to have a look at how someone else is doing Alexa Skills, as opposed to something that is actually truly useful. What that tells me is that AWS customers just don't really know what they can do with SAR.

Jeremy: Do you think it's a lack of incentive for people to publish those apps?

Yan: Part of it is that. Forrest wrote a blog post a while back where he argued that SAR, being a marketplace of sorts, should be incentivizing companies and publishers to put out something that is not just a toy or example codebase, but something that, as a company, as an enterprise, I can actually have confidence deploying into my real production environment, knowing that it's been looked after. When there are issues, someone would actually be there to fix it and patch it rather than leaving me in a ditch. Because all I need is one experience like that to never want to touch anything in SAR ever again. So you'd have some kind of a scheme where publishers can be financially rewarded based on the resources that get provisioned into my account: AWS bills me for those resources, so some of those earnings can then be passed on to the publisher of the SAR application.

That way you hopefully would encourage more commercial companies to start to publish things that are commercially looked after and adhere to SLAs and guarantees, things that large enterprise customers will be willing and comfortable to actually deploy into their environment. It's the same way that when you look at AWS Marketplace, where I'm buying some software that deploys to an EC2 instance, at least I have confidence that this is a commercial thing. It's not just someone's toy project that they may not look after when they find something more interesting to work on.

There's a bit of an image problem there for SAR in terms of what does it represent to the consumers. And if we want people to have faith in that, then we really need to do something about that. I think commercialization is one step towards that.

Jeremy: I wonder about that too, because I read Forrest Brazeal's post as well, and I thought that made a lot of sense. You have other open marketplaces or other open ecosystems. Just think about NPM for example. People use NPM packages all the time with probably no consideration as to how well some of them are maintained. So you already have people using those and running into certain problems like that. I think maybe because it's so specific to AWS and maybe it just doesn't seem as open source as something like NPM does in a sense. But I totally agree. I just don't know. Do people pay for some of these apps or are they paying more for the support of them?

Yan: I think that's an interesting point about NPM. But what I will say is that the impact that a badly written SAR app can have on your organization is probably far greater than that of an NPM package. Because now you're talking about resources that are provisioned into your AWS environment where they can ... A malicious actor, for example, might be able to gain access to way more things than, say, someone who's published a malicious NPM package, which of course can do a lot of damage as well. We fear those dependencies too. Also, a bad [inaudible 00:29:43] application can also just cost you a lot of money. Imagine someone deployed something to a VPC with a NAT gateway and it starts charging you 4 cents per gigabyte of data transferred, and then those-

Jeremy: Get expensive quickly.

Yan: ... can get very expensive really quickly. I think in terms of the impact, it can be much greater. I will think twice about deploying something from SAR, whereas with NPM it's often just okay.

Jeremy: Maybe it was one of those things too, because I think you're right. You're deploying something that is actually going to cost you money directly. So you have some other costs of auditing and some of those things you might do with NPM packages, but certainly with this, you're deploying things into an account that could rack up serious bills. That might be one of those other things where SAR needs to go down this road of helping people understand exactly what types of resources they're provisioning and maybe cost estimates and things like that that could potentially help ease someone's mind. But I do agree. There needs to be more people flooding that marketplace with good tools that they can use, and without having some sort of backing I think that's kind of tough to achieve.

So speaking of other tools and other things that are available, there's the ecosystem that we have now for serverless frameworks, and not the Serverless Framework, but frameworks for serverless, I should say. The Serverless Framework being one of those, and obviously SAM, Architect, Claudia.js. There are a lot of them now. There are ones for PHP, there are ones for Ruby on Rails, there are all kinds of these frameworks popping up. Pretty much every single one of them is doing the exact same thing.

It's taking some level of abstraction and compiling it down to CloudFormation or making a bunch of API calls to AWS. What are your thoughts? I know you're a big proponent of the Serverless Framework. You've done a ton of Serverless Framework plugins, and I know that you've done a lot of work with SAM as well. So just what are your thoughts on the overall ecosystem? What should people be using?

Yan: Personally I prefer the Serverless Framework, and I'm happy to go into details on what I think the Serverless Framework does well compared to a lot of the other frameworks. I think the biggest strength the Serverless Framework has is the fact that it's got a great ecosystem of plugins that have support from the community. For pretty much anything that you run into, there's probably a plugin that can solve that problem for you or at least make your life a lot easier. Even when that's not the case, it's really easy for you to write a plugin yourself. I guess I'll complain about their documentation on how to write a plugin. I think the only two articles they have there are still from Anna from, I think, three years ago.

But once you learn what a plugin looks like, it's fairly straightforward because you can do so many different things. You can make API calls as part of the deployment cycle. You can transform the CloudFormation template. With SAM, it does a lot of things right out of the box, but the problem I have with SAM is that, when you don't agree with the decisions that SAM has taken, it's really hard for you to do anything about that. One time I was working with a client and we were using SAM and that's when SAM had just introduced the IAM authentication for API Gateway. But they were also changing how the IAM permission was passed through. So as a caller, I needed to have the permission to invoke the function as well as the endpoint, which of course didn't make sense, it breaks abstractions and all of that, but there was no way for me to get out of that.

The only way I found was, I actually wrote a CloudFormation macro, deployed that and then changed the template that SAM generates just to fix that one tiny little thing. This is where having that flexibility matters: the framework gives you good defaults, like everybody else is trying to do as well, but at the same time gives you a nice way out. I guess when it comes to frameworks, there's also this new CDK and [inaudible 00:33:59], which is a whole different paradigm that lets you program with your favorite programming language. I have to say I'm not a fan of this approach. I think I can get the temptation that, "Oh, I like writing stuff in C#, I like writing stuff in JavaScript and now I can use my favorite language to do everything."

But your preference for the language that you want to write in, I don't think that should be very high in the list of criteria for choosing a good deployment framework. Things like the fact that you can get to the same infrastructure in different ways, which you then have to reason about, I think that is quite a dangerous thing. You can end up with arbitrarily complex things that would have been a lot simpler if everyone just wrote something in JSON or YAML. That said, I do wish there were better tools for YAML. I see so many people struggle with just basic indentation problems. It happens all the time.

For me, I came from F# and Python as well. [inaudible 00:35:07] methods. I kind of learned that, but most people haven't. You have to be trained to look out for these kinds of problems, and we do need better tooling support for YAML. That said, I still think YAML or something like that is a better way to define your resource graph compared to writing code to do that. I remember before all these frameworks, I was writing bash scripts to provision everything and now I'm just substituting bash with C# or a prettier looking language. And I don't think that's the right approach.
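The CloudFormation macro workaround Yan mentions earlier in this answer can be sketched roughly as follows. A macro is a Lambda function that receives the template (or a fragment of it) in the event under `fragment` and must return it, possibly modified, together with the original `requestId` and a `status`. The transcript does not say which resource or property Yan actually patched, so `TARGET_TYPE` and `UNWANTED_PROPERTY` below are placeholders chosen purely for illustration.

```python
import copy

# Hypothetical values: the interview does not say what was actually patched.
TARGET_TYPE = "AWS::Lambda::Permission"
UNWANTED_PROPERTY = "ExamplePropertyToRemove"


def patch_template(template):
    """Return a copy of the template with the unwanted property removed."""
    patched = copy.deepcopy(template)
    for resource in patched.get("Resources", {}).values():
        if resource.get("Type") == TARGET_TYPE:
            resource.get("Properties", {}).pop(UNWANTED_PROPERTY, None)
    return patched


def handler(event, context):
    """Lambda entry point for a CloudFormation macro.

    CloudFormation passes the template content in event["fragment"] and
    expects the same requestId back along with a status and the new fragment.
    """
    return {
        "requestId": event["requestId"],
        "status": "success",
        "fragment": patch_template(event["fragment"]),
    }
```

The point of the design is the one Yan makes: the generated template stays the source of truth, and the macro applies one small, explicit correction on top of it instead of replacing the framework.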

Jeremy: I actually agree with you on the CDK stuff. I know some people are huge fans of it and they like the idea that you can build these constructs and then you can build higher level constructs that wrap a bunch of things together. And it is kind of cool that you can encapsulate some of that stuff, but I do feel like there is that black box issue there, and maybe it's like Winston Churchill, who said that democracy is the worst form of government except for every other form of government or something like that. I would probably say YAML is the worst configuration language except for every other configuration language.

All right, so the Serverless Framework, they just came out with the Serverless Framework Pro. And I know you've kind of experimented a little bit with that, but what are your thoughts on that? Now that they've added things like monitoring and CI/CD and some of that other stuff?

Yan: I think it's a nice tool for someone who's new to serverless and just wants to have something that they can use. But certainly from the monitoring perspective, I don't think Serverless Pro holds up compared to other more specialized solutions that offer monitoring and tracing, which you can tell are done by people who have been in this field for a very long time and understand this problem space. What I find with the Serverless Pro offering is that it gives you a lot of basics, but it doesn't do much more beyond what you get with CloudWatch.

So as someone who's got a lot of experience with AWS and has used CloudWatch for many, many years, I don't see a lot of value add for me to invest into Serverless Pro. But at the same time, if I'm new to AWS and new to serverless, having something that comes out of the box with the tool that you need to use for deployment, I can definitely understand the temptation there. A lot of the applications I've done get complicated quickly. You've got some of the functions, lots of event triggers, lots of events flying everywhere. And I'm really interested in the tracing side of things, and that's where I think a lot of the tools that we have today are not quite there yet.

Everything seems to struggle with tracing EventBridge at the moment, and X-Ray, for example, doesn't trace through SQS properly and doesn't trace through Kinesis at all. We get all these fragments of our transaction, but you can see that this space has been evolving really, really quickly. You've looked at the work that Lumigo has done, Epsagon has done and Thundra has done. Everyone has gone through a lot of improvement over the last 12 months at least. And I do see this space getting more mature and more of the, I guess, traditional big monitoring companies getting into this space as well.

And also a shout out to Honeycomb as well. I think they also do a very good job with their product. It's quite a big mind shift for people who are not used to doing event-based monitoring. But once you have that, it's really powerful. Splunk has been there for a long time, but they kind of price everybody out.

Jeremy: Listen, I think you're right about the monitoring component of Serverless Pro. It is good. I played around with it and it does tell you about your invocations and things like that, but this idea of really understanding the tracing and some of that deeper stuff is a little bit more advanced. But I will say, Serverless Framework has been great at developer tooling. That's one thing that they've done really well. And I think the greatest feature of Serverless Pro, at least for me, is the new CI/CD deployment stuff that they've released. They've got something similar to what Seed.run did with being able to use the monorepo.

It's very hard to have multiple repos when you are building serverless apps, especially if your services are relatively small. Sometimes that monorepo makes sense, and being able to just deploy changes from individual directories I think is a pretty interesting thing. But anyways. All right. What about your thoughts on this? Because this is another thing we hear all the time and it kind of drives me nuts: when we hear the term multi-cloud. And that people are trying to ... You actually mentioned it earlier, where you were hedging your bets to say how easy is it for me to move from AWS to some other provider, if that's something that we actually care about. Do you see using a framework like SAM as potentially locking you in even more to AWS or do you think that's a pointless argument?

Yan: I think it's more of a pointless argument. Even the tools that do support multi-cloud, they have different syntax, in the same way that Terraform has got different syntax for different cloud providers. It gives you a consistent tooling experience when you use it with different cloud providers, but you still have to learn the different syntax. You have to learn the cloud itself. What resources are available in AWS versus what resources are available in GCP or your [inaudible 00:40:55]. Serverless Framework, it does support multiple clouds, but at the same time I think it's not as valuable as people probably make it out to be. Because again, how you work with different clouds involves completely different syntax and different resource types.

It does give you a consistent tooling experience in terms of using sls deploy or something like that, but it doesn't remove the fact that that's not what you're going to struggle with when you want to go from one cloud to another. There have been so many different blog posts on this, I've written a few myself as well. And I think this whole multi-cloud thing is an argument about insurance. For example, when I buy an iPhone, I can take out insurance, but how much do I pay for insurance versus the cost of the phone itself? If the insurance itself is going to cost way more than just getting a new phone, then it's not wise to do that. At the same time, you look at some of the vendor lock-in arguments. Well, firstly, it's not lock-in: you can still move things, there's just a cost to moving. There's coupling, so there's a cost of moving.

You either deal with that when that scenario comes up or you try to do a lot of work upfront. So essentially you're investing all the work that you would have to do later at this point, when you don't even know what's going to happen in the future. And the worst thing is, you end up with a lot of complexity that you have to carry all the way, and everything becomes more difficult, everything becomes slower. Your developers have to work so much harder to do everything, as opposed to just making a decision, going with it and knowing in the back of your head that if we ever need to move, these are the things that we need to think about and we need to do. I think Nick from Serverless Inc actually wrote a really good post about the fact that moving compute is always easy. It's the data. Data is incentivized to stay where it is and accumulate as much as possible. So it doesn't matter how easy it is to move your APIs from one container to another in different clouds. You're still going to have to deal with the data, because there's no exact replica of DynamoDB.

There's no exact replica of the high replication data store. I think it's foolish to spend so much effort upfront to prevent something that is probably unlikely to happen. How much I spend on insurance should be proportional to the risk of my phone getting stolen or lost, and also to the cost of the phone itself. The same argument applies here, where a lot of this strategy is just insurance against the stolen phone.
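Yan's insurance analogy amounts to an expected-cost comparison: paying upfront only makes sense when the premium is no larger than the probability of the loss multiplied by what the loss would cost. The sketch below spells that arithmetic out. All the figures are invented for illustration and do not come from the interview, and the function name is likewise made up.

```python
def worth_insuring(premium, loss_probability, replacement_cost):
    """Return True when the upfront premium does not exceed the expected loss.

    expected loss = probability of the bad event * cost of recovering from it
    """
    expected_loss = loss_probability * replacement_cost
    return premium <= expected_loss


# Hypothetical phone: insurance costs 150, a 5% chance of losing a 1000 phone.
phone = worth_insuring(premium=150, loss_probability=0.05, replacement_cost=1000)

# Hypothetical multi-cloud abstraction: 200 engineer-days spent upfront to
# guard against a 2% chance of a migration that would cost 300 engineer-days.
multi_cloud = worth_insuring(premium=200, loss_probability=0.02, replacement_cost=300)

print(phone, multi_cloud)
```

With these made-up numbers neither purchase pays off, which is the shape of the argument in the conversation: the premium is paid with certainty, while the loss it protects against is unlikely.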

Jeremy: I think I need to start hooking my guests up to a blood pressure monitor when I ask them the question about lock-in. Actually it's very funny. I think people now, and I'm the same way when somebody asks me about this. I think I maybe do it just to get a rise out of people, but people get angry now about trying to defend this vendor lock-in thing. Because I think you're absolutely right. And the biggest concern that I have is where people play this vendor lock-in argument. You're locked into everything. You're locked into your iPhone, you're locked into Microsoft Word or whatever, if that's what you choose to devote your time to.

Yan: Anything you use.

Jeremy: You're locked into these things. I look at it and I say, if people are using that as an argument to use the tools or pick the technology that's the lowest common denominator, then they're not choosing the best tool for the job. I think that is something that significantly impacts the ability for people to adopt serverless because they say, "Well, if I write a Lambda function, I can't just easily move that to Azure. I can't easily move that to GCP. I need to work with all of those constraints." But honestly, I think if anything, you're just adding more work for yourself. And you're right, you're insuring yourself against something that is very unlikely to happen. And in the off chance that it does happen, I still think you're going to go through a massive exercise in order to migrate something no matter how low the denominator was that you chose.

Yan: We went through all of that with ORMs. There were a few years where there was a new ORM every single month.

Jeremy: I'm going on the record. I hate ORMs. I hate them.

Yan: Because when we do have to move to another database, it turns out the ORM doesn't really help me. It's just another thing you've got to deal with as part of the migration process.

Jeremy: Something new to learn.

Yan: And also it gets in your way from the start in terms of the complexity, but also when you want to do anything, you've got to understand what happens under the hood and then also how to do it with the ORM.

Jeremy: Exactly.

Yan: It's crazy.

Jeremy: And the optimization isn't there. You don't get the optimizations with an ORM. The biggest thing that drives me nuts about them is that, you write a query and then you have some ORM or whatever that has to run three separate queries in order to merge the data back in the application layer because that's how it was built, where you could have just written a native query and done a join or something like that, and it would have been a thousand times more efficient. But anyways, yeah. You mentioned Terraform as well when you were talking. What are your thoughts on Terraform for serverless deployments?
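The ORM complaint above, several separate queries merged in the application layer versus one native join, can be illustrated with a small sketch. It uses Python's built-in sqlite3 module and an invented three-table schema (customers, products, orders), so the table names and data are assumptions and not anything discussed in the episode. The first function only mimics the query pattern being criticised; it is not a claim about how any particular ORM behaves.

```python
import sqlite3


def setup():
    """Create an in-memory database with a tiny, made-up schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE products (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, product_id INTEGER);
        INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
        INSERT INTO products VALUES (10, 'Keyboard'), (11, 'Monitor');
        INSERT INTO orders VALUES (100, 1, 10), (101, 1, 11), (102, 2, 10);
    """)
    return conn


def merged_in_application(conn):
    """Three separate queries, stitched back together in application code."""
    customers = dict(conn.execute("SELECT id, name FROM customers"))
    products = dict(conn.execute("SELECT id, title FROM products"))
    orders = conn.execute("SELECT id, customer_id, product_id FROM orders ORDER BY id")
    return [(oid, customers[cid], products[pid]) for oid, cid, pid in orders]


def single_join(conn):
    """One native query: the database does the merging."""
    return list(conn.execute("""
        SELECT o.id, c.name, p.title
        FROM orders o
        JOIN customers c ON c.id = o.customer_id
        JOIN products p ON p.id = o.product_id
        ORDER BY o.id
    """))


conn = setup()
print(merged_in_application(conn) == single_join(conn))
```

Both functions return the same rows. The difference is where the work happens: the first makes three round trips and pulls whole tables into memory, while the second lets the database do the merge in one round trip.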

Yan: It's very laborious and painful. I remember at my previous job I convinced the teams to use Serverless Framework, and all I had to do was show them a very simple API Gateway endpoint with a Lambda function. That was three lines of code in the Serverless Framework. It was about 150 lines of Terraform script. And you can see the teams that are using the Serverless Framework, they just go in there and get it done. Get a feature shipped and tested and all that. Other teams would be spending the next two weeks just writing Terraform script. I had engineers coming up to me to describe their job: "We spend about 60% of our time just writing Terraform." When you are talking about serverless being about not doing undifferentiated heavy lifting, something is not quite right when most of your time you're just writing infrastructure.

Jeremy: That's the thing too with Terraform. Terraform is a very good product and there's all [crosstalk 00:47:28] Terraform the enterprise edition has a lot of great things like safeguards and some of those other things. I think it is a very good tool for cloud management but at the same time, I think you're right, not very productive for the serverless developer.

Yan: No. If I'm provisioning VPCs and networking and things like that, I'm very happy to use Terraform. It is a very good tool for that. But when I just want to write a few Lambda functions and hook up a few endpoints and have some event sources like SNS, SQS and so on, I really don't need Terraform. What I need is something that can give me good defaults and allow me to do what I need to do and get out and move on to the next thing, rather than having to get bogged down with the detail of the specifics. That's just not productive. That's not useful. That's just undifferentiated heavy lifting.

Jeremy: You are preaching to the choir. All right, let's move on to, where is this going? Serverless in general. This is one of those things where I think you and I would agree that, and I think you mentioned it earlier, it's like we're making it more complicated. We're adding new features, the learning curve keeps getting steeper and steeper. There are still some use cases that are not necessarily perfect for it. AWS is making advancements in some of those things. Reducing VPC cold starts, adding things like RDS Proxy and Provisioned Concurrency and those sorts of things. But are there other things that are holding serverless back? Does there need to be some other breakthrough before it goes mainstream?

Yan: I don't know about a major breakthrough, but I definitely think more education and more guidance, not just in terms of what these features do, but also when to use them and how to choose between different event triggers. That's a question I get all the time. "How do I decide when to use API Gateway versus ALB? How do I choose between SNS, SQS, Kinesis, DynamoDB Streams, EventBridge, IoT Core?" That's just six application integration services off the top of my head. There's just no guidance around any of that stuff and it's really difficult for someone new coming into this space to understand all the ins and outs and trade-offs between SNS and SQS and Kinesis and so on.

Having more education around that, having more official guidance from AWS around that, that would be really useful. In terms of technology, I think I like the trajectory that AWS has been on. No flashy new things but rather continuously solving those day to day annoyances, the day to day problems that people run into. The whole cold start thing, again, is often overplayed, often underplayed. It's never as good as some people say, it's never as bad as some other people say. But it's good to have some solutions for people who have real problems with cold starts, for various different reasons.

I really like what they've done with Provisioned Concurrency, even if I think the implementation is still, I guess, a version one. So hopefully some of the kinks that they currently have will be solved. Other than that, I'd like to see them do more with the multi-account management side of things. Control Tower is great, but again, there's a lot of clicking stuff in the console to get anything set up, and it's also very easy to rack up a pretty big bill if you're not careful. You can provision a lot of NAT gateways, for example, and things like that.

One of the companies I've been talking to recently as well, a Dutch bank, they are actually building a really cool tool themselves to essentially give you infrastructure as code. Think of it as a CloudFormation extension that allows you to capture your entire org. Imagine I have a resource type that defines my org and the different accounts, and then I can configure CloudTrail, set it up for multi-account, configure GuardDuty and things like that, all within a single template, which looks just like CloudFormation. So some really amazing tooling that those guys have built.

But having something like that from AWS would be pretty amazing as well. Because again, we've seen more and more people getting to the point where they have a very complex ecosystem of lots of different enterprise accounts, managing them and setting up the right things. The SCPs and things like that. It's not easy and we certainly don't want people to be constantly going to the console and clicking things. And that's another annoyance I constantly have with AWS documentation: they keep talking about infrastructure as code, but every single piece of documentation just tells us, go to this console, click this button.

Jeremy: That's how you do it in the console. Exactly.

Yan: What the hell?

Jeremy: Yeah, exactly. I guess one of the things that I try to tell people who ask me how to get into the cloud or to start building stuff in serverless is to sort of do a slow migration pattern. You can't just jump all in, you can't rewrite everything in serverless and do that. Often though that does require rewriting applications. Do you see a potential path to making it easier to move those applications into Lambda or into Fargate? Maybe if there was an easier path to lift and shift, would that be something you think would make sense?

Yan: I think that would make sense. I guess I'll have to wait and see what kind of execution comes from that. Because again, you're making a lot of assumptions about what people are using and what they're doing to be able to do that well. Of course if you do that, it's really easy to do it badly. I kind of think that would be great, but it really depends on the execution.

Jeremy: Awesome. All right, so any other missing pieces in serverless? I think you and I agree we need some sort of Elasticsearch utility.

Yan: Absolutely.

Jeremy: But anything else you can think of that's maybe missing?

Yan: Let's see. Nothing off the top of my head. But definitely some kind of serverless Elasticsearch that would be awesome.

Jeremy: Awesome. All right. So final question here. Because now that I have you and I think that with everything that you write with the courses that you do and you're doing a ton of in-person workshops and things like that and all of your talks, everything you do is very, very good advice. And I think you've been a serverless hero for quite some time. So just maybe we can capture, if people are interested in moving to serverless, what is your one sentence or it can be a little longer. How would you suggest people make that first step into serverless?

Yan: Subscribe to this newsletter that I heard is called something like Off-by-none. It's a really good way to just get regular newsletters about all kinds of different content.

Jeremy: I did not pay you to say that. I just want to make sure that's clear.

Yan: But yeah, definitely. I think one of the dangers of having Lambda being deceptively simple is that there's still a lot of things you have to learn. There's still a lot of things you have to understand too. You can make really bad mistakes. We keep reading on the web about horror stories, but a lot of that is because of the lack of research, and I think Joe Emison said it really well that, if you spend two weeks researching and two days doing work, you're probably going to end up better off than if you do two days of research and two weeks of work.

Jeremy: Yes, I totally agree.

Yan: So that you don't make all these mistakes. But in terms of actual advice, I think: reach out to people in the community. People like you, me, Ben Kehoe, or others, we are all very happy to help. Do some research and if you're stuck, just ask us questions. We're all very keen to see a world where human productivity is not wasted on setting up servers and managing them. We'd be very happy to help you. So we'll help you get started the right way.

Jeremy: Awesome. All right. Well, thank you again so much for joining me and sharing all of this serverless knowledge with everyone in the community and obviously the things that you continuously do to help people learn and educate people on serverless. If people want to find out more about you, how would they do that?

Yan: They can go to theburningmonk.com or follow me on Twitter as @theburningmonk.

Jeremy: And you've got a bunch of courses and open-source projects that you work on. Those are all available on theburningmonk.com.

Yan: Yeah, yeah. A bunch of courses you can find under the courses heading. There's also a bunch of in-person workshops I'm doing this year and also just lots and lots of blog posts.

Jeremy: Awesome. All right, well, we will get all that into the show notes. Thanks again.

Yan: Thank you. Thanks for having me.
