Forem

Serverless Chats

Episode #104: The Rise of Data Services with Patrick McFadin

About Patrick McFadin

Patrick McFadin is the VP of Developer Relations at DataStax, where he leads a team devoted to making users of Apache Cassandra successful. He has also worked as Chief Evangelist for Apache Cassandra and consultant for DataStax, where he helped build some of the largest and exciting deployments in production. Previous to DataStax, he was Chief Architect at Hobsons and an Oracle DBA/Developer for over 15 years.

Twitter: @PatrickMcFadin
LinkedIn: Patrick McFadin
DataStax website: datastax.com
K8ssandra: k8ssandra.io
Stargate: stargate.io
DataStax Astra: Cassandra-as-a-Service

Watch this episode on YouTube: https://youtu.be/-BcIL3VlrjE

This episode sponsored by CBT Nuggets and Fauna.

Transcript
Jeremy: Hi everyone, I'm Jeremy Daly and this is Serverless Chats. Today I'm chatting with Patrick McFadin. Hey Patrick, thanks for joining me.

Patrick: Hi Jeremy. How are you doing today?

Jeremy: I am doing really well. So you are the VP of Developer Relations at DataStax, so I'd love it if you could tell the listeners a little bit about yourself and what DataStax is all about.

Patrick: Sure. Well, I mean mostly I'm just a nerd with a cool job. I get to talk about technology a lot and work with technology. So DataStax, we're a company that was founded around Apache Cassandra, just supporting and making it awesome. And that's really where I came to the company. I've been working with Apache Cassandra for about 10 years now. I've been a part of the project as a contributor.

But yeah, I mean mostly data infrastructure has been my life for most of my career. I did this in the dotcom era, back when it was really crazy when we had dozens of users. And when that washed out, I'm like, oh, then real scale started and during that period of time I worked a lot in just trying to scale infrastructure. It seems like that's been what I've been doing for like 30 years it seems like, 20 years, 20 years, I'm not that old. Yeah. But yeah, right now, I spend a lot of my time just working with developers on what's next in Kubernetes and I'm part of CNCF now, so yeah. I just can't to seem to stay in one place.

Jeremy: Well, so I'm super interested in the work that DataStax is doing because I have had the pleasure/misfortune of managing a Cassandra ring for a start-up that I was at. And it was a very painful process, but once it was set up and it was running, it wasn't too, too bad. I mean, we always had some issues here and there, but this idea of taking a really good database, because Cassandra's great, it's an excellent data store, but managing it is a nightmare and finding people who can manage it is sort of a nightmare, and all that kind of stuff. And so this idea of taking these services and DataStax isn't the only one to do this, but to take these open-source services and turn them into these hosted solutions is pretty fantastic. So can you tell me a little bit more, though? What this shift is about? This moving away from hosting your own databases to using databases as a service?

Patrick: Yeah. Well, you touched on something important. You want to take that power, I mean Cassandra was a database that was built in the scale world. It was built to solve a problem, but it was also built by engineers who really loved distributed computing, like myself, and it's funny you say like, "Oh, once I got it running, it was great," well, that's kind of the experience with most distributed databases, is it's hard to reason around having, "Oh, I have 100 mouths to feed now. And if one of them goes nuts, then I have to figure it out."

But it's the power, that power, it's like stealing fire from the gods, right? It's like, "Oh, we could take the technology that Netflix and Apple and Facebook use and use it in our own stuff." But you got to pay the price, the gods demand their payment. And that's something that we've been really trying to tackle at DataStax for a couple of years now, actually three, which is how ... Because the era of running your own database is coming to an end. You should not run your own database. And my philosophy as a technologist is that proper, really important technology like your data layer should just fade into the background and it's just something you use, it's not something you have to reason through very much.

There's lots of technology that's like that today. How many times have you ... When was the last time you managed your own memory in your code?

Jeremy: Right. Right. Good point. I know.

Patrick: Thank god, huh?

Jeremy: Exactly.

Patrick: Whew.

Jeremy: But I think that you make a really good point, because you do have these larger companies like Facebook or whatever that are using these technologies and you mentioned data layers, which I don't think I've worked for a single company, I don't think I actually ... I founded a start-up one time and we built a data layer as well, because it's like, the complexity of understanding the transaction models and the routing, especially if you're doing things like sharding and all kinds of crazy stuff like that, hiding that complexity from your developers so that they can just say, "I need to get this piece of information," or, "I need to set this piece of information," is really powerful.

But then you get stuck with these data layers that are bespoke and they're generally fragile and things like that, so how is that you can take data as a service and maybe get rid of some of that, I don't know, some of that liability I guess?

Patrick: Yeah. It's funny because you were talking about sharding and things like that. These are things that we force on developers to reason through, and it's just cognitive load. I have an app to get out, and I have some business desire to get this application online, the last thing I need to worry about is my sharding algorithm. Jeremy, friends don't let friends shard.

Jeremy: Right. That's right. That's a good point.

Patrick: But yeah, I mean I think we actually have all the parts that we need and it's just about, this is closer than you think. Look at where we've already started going, and that is with APIs, using REST. Now GraphQL, which I think is deserving its hotness, is starting to bring together some things that are really important for this kind of world we want to live in. GraphQL is uni-fettering data and collecting and actual queries, it's a QL, and why they call it Graph, I have no idea. But it gives you this ability to have this more abstract layer.

I think GraphQL will, here's a prediction is that it's going to be like the SQL of working with data services on the internet and for cloud-native applications. And so what does that mean? Well, that means I just have to know, well, I need some data and I don't really care what's underneath it. I don't care if I have this field indexed or anything like that. And that's pretty exciting to me because then we're writing apps at that point.

Jeremy: Right. Yeah. And actually, that's one of the things I really like about GraphQL too is just this idea that it's almost like a universal data access layer in a sense because it does, you still have to know it, you have to know what you're requesting if you're an end developer, but it makes it easier to request the things that you need and have those mutations set and have some of those other things standardized across the company, but in a common format because isn't that another problem? Where it's like, I'm working with company A and I move to company B maybe and now company B is using a different technology and a different bespoke data layer and some of these other things.

So, I think data as a service for one, maybe with GraphQL in front of it is a great way to have this alignment across companies, or I guess, just makes it easier for developers to switch and start developing right away when they move into a new company.

Patrick: Yeah, and this is a concept I've been trying to push pretty hard and it's driven by some conversations I've had with some friends that they're engineering leaders and they have this common desire. We want to have a zero day dev, which is the first day that someone starts, they should be producing production code. And I don't think that's crazy talk, we can do this, but there's a lot of things that are in front of it. And the database is one of them. I think that's one of the first things you do when you show up at company X is like, "Okay, what database are you using? What flavor of SQL or GRPC or CQL, Cassandra query language? What's the data model? Quick, where's that big diagram on the wall with my ERD? I got to go look at that for a while."

Jeremy: How poorly did you structure your Git repositories? Yeah.

Patrick: Yeah, exactly. It's like all these things. And no, I would love to see a world where the most troublesome part of your first day is figuring out where the coffee and the bathroom are, and then the rest of it is just total, "Hey, I can do this. This is what I get paid to do."

Jeremy: Right. Yeah. So that idea of zero day developer, I love that idea and I know other companies are trying to do that, but what enables that? Is it getting the idea of having to understand something bespoke? Is it getting that off of the table? Or not having to deal with the low-level database aspect of things? I mean because APIs, I had this conversation with Rob Sutter, actually, a couple weeks ago. And we were talking about the API economy and how everything is moving towards APIs. And even data, it was around data as well.

So, is that the interface, you think, of the future that just says, "Look, trying to interface directly with a database or trying to work with some other layer of abstraction just doesn't make sense, let's just go straight from code right to the data, with a very simple API interface?"

Patrick: Yeah, I think so. And it's this idea of data services because if you think of if you're doing React, or something like a front-end code, I don't want to have a driver. Drivers are a total impediment. It's like, driver hell can be difficult at large organizations, getting the matching right. Oh, we're using this database so you have to use this driver. And if you don't, you are now rejected at the gate. So it's using HTTP protocols, but it's also things like when you're using React or Angular, View, whatever you're using on the front-end, you have direct access.

But most times what you're needing is just a collection or an object. And so just do a get, "I need this thing right now. I'm doing a pick list. I need your collection." I don't need a complicated setup and spend the first three days figuring out which driver I'm using and make sure my Gradle file is just perfect. Yeah. So, I think that's it.

Jeremy: Yeah. No, I'd be curious how you feel about ORMs, or O-R-Ms, certainly for relational databases, I know a lot of people love them. I can't stand them. I think it adds a layer of abstraction and just more complexity where I just want access to the database. I want to write the query myself, and as soon as you start adding in all this extra stuff on top of it to try to make it easier, I don't know, it just seems to mess it up for me.

Patrick: All right. So yeah, I think we have an accord. I am really not a fan of ORMs at all. And I mean this goes back to Hibernate. Everyone's like, "Oh, Hibernate's going to be the end of databases." No, it's not. Oh yeah, it was the end of the database at the other side because it would create these ridiculous queries. It's like, why is every query a full table scan?

Jeremy: Exactly.

Patrick: Because that's the way Hibernate wanted it. Yeah. I actually banned Hibernate at one company I was working at. I was Chief Architect there and I just said, "Don't ever put Hibernate in our production." Because I had more meetings about what it was doing wrong than what it was doing right.

Jeremy: Right. Right. Yeah. No, that's sounds, yeah.

Patrick: Is that a long answer? Like, no.

Jeremy: No, I've had the same experience where certain ORMs you're just like, no. Certain things, you can't do this because it's going to one, I think it locks you in in a sense, I mean there's all kind of lock-in in the cloud, and if you're using a data service or an API or you're using something native in AWS, or IBM Cloud, you're still going to be locked in in some way, but I do feel like whenever you start going down that path of building custom things, or forcing developers to get really low level, that just builds up all kinds of tech debt, right? That you eventually are going to have to work down.

Patrick: Well, it's organizational inertia. When you start getting into this, when you start using annotations in Hibernate where you're just cutting through all the layers and now you're way down in the weeds, try to move that. There's a couple of companies that I've worked with now that are looking at the true reality of portability in their data stores. Like, "Oh, we want to move from one to a different, from a key value to a document without developers knowing." Well, how do you get to that point?

Jeremy: Right. Yeah.

Patrick: And it's just, that's not giving access to those things, first of all, but this is that tech debt that's going to get in your way. We're really good, technologists, we're really good at just wracking up the charges on our tech debt credit card, especially whenever we're trying to get things out the door quickly. And I think that's actually one of the problems that we all face. I mean, I don't think I've ever talked to a developer who was ahead of schedule and didn't have somebody breathing down their neck.

Jeremy: Very true.

Patrick: You take shortcuts. You're like, "We've got to shift this code this week. Skip the annotations and go straight into the database and get the data you need." Or something. You start making trade-offs real fast.

Jeremy: What can we hard code that will just get us past.

Patrick: Yeah. Is it green? Shift it. Yeah.

Jeremy: Yeah, no, I totally, totally agree. All right. So let's talk a little bit more about, I guess, skillsets and things like that. Because there are so many different databases out there. Cassandra is just one and if you're a developer working just at the driver level, I guess, with something like Cassandra, it's not horrible to work with. It's relatively easy once a lot of these things are set up for you.

Same is true of MongoBD, or I mean, DynamoDB, or any of these other ones where the interface to it isn't overly difficult, but there's always some sort of something you want to build on top of it to make it a little bit easier. But I'm just curious, in terms of learning these different things and switching between organizations and so forth, there is a cognitive load going from saying, "I'm working on Cassandra," to going to saying, "I'm working on DynamoDB," or something like that. There's going to be a shift in understanding of how the data can be brought back, what the limitations are, just a whole bunch of things that you kind of have to think about. And that's not even including managing the actual thing. That's a whole other thing.

So, hiring people, I guess, or hiring developers, how much do we want developers to know? Are you on board with me where it's like, I mean I like understanding how Cassandra works and I like understanding how DynamoDB works, and I like knowing the limits, but I also don't want to think about them when I'm writing code.

Patrick: Yeah. Well, it's interesting because Cassandra, one of the things I really loved about Cassandra initially was just how it works. As a computer scientist, I was like, "This is really neat." I mean, my degree field is in distributed computing, so of course, I'm going to nerd out.

Jeremy: There you go.

Patrick: But that doesn't mean that it doesn't have mass appeal because it's doing the thing that people want. And I think that's going to be the challenge of any properly built service layer. I think I've mentioned to you before we started this, I work on a project called Stargate. And Stargate is a project that is meant to build a data layer on top of databases. And right now it's with Cassandra. And it's abstracting away some of the harder to understand or reason things.

For instance, with distributed computing, we're trying to reduce the reliance on coordination. There is a great article about this by Pat Helland about how coordination is the last really expensive thing that we have in development. Memory, CPU, super cheap. I can rent that all day long. Coordination is really, really hard, and I don't expect a new programmer to understand, to reason through coordination problems. "Oh, yeah, the just in time race conditions," and things like that.

And I think that's where distributed computing, it's super powerful, but then whenever people see what eventual consistency are, they freak out and they're like, "I just want my SQL Lite on my laptop. It's very safe." But that's not going to get you there. That's not a global database, it's not going to be able to take you to a billion users. Come on, don't cut ...

Jeremy: Maybe you don't need to be.

Patrick: ... your apps short Jeremy. You're going to have a billion users.

Jeremy: You should strive for it, at least, is how I feel about it. So that's, I guess, the point I was trying to get to is that if the developers are the ones that you don't want learning some of this stuff, and there's ways to abstract it away again, going like we talked about data as a service and APIs and so forth. And I think that's where I would love to see things shifting. And as you said earlier, that's probably where things are going.

But if you did want to run your own database cluster, and you wanted to do this on your own, I mean you have to hire people that know how to do this stuff. And the more I see the market heating up for this type of person, there is very, very few specialists out there that are probably available. So how would you even hire somebody to run your Cassandra ring? They probably all work at DataStax.

Patrick: No, not all of them. There's a few that work at Target and FedEx, Apple, the biggest Cassandra users in the world. Huawei. We just found out lately that Huawei now has the biggest cluster on the planet. Yeah. They just showed up at ApacheCon and said, "Oh yeah, hold my beer." But I mean, you're right, it's a specialized skillset and one of the things we're doing at DataStax, we feel, yeah, you should just rent that. And so we have Astra, which is our database as a service.

It's fully compatible with open-source Cassandra. If you don't like it, you can just take it over and use open-source. But we agree and we actually can run Cassandra cheaper than you can, and it's just because we can do it at scale. And right now Astra, the way we run it is truly serverless, you only pay for what you need, and that's something that we're bringing to the open-source side of Cassandra as well, but we're getting Cassandra closer to Kubernetes internally.

So if you don't want to think about Kubernetes, if you don't want to think about all that stuff, you can just rent it from us, or you could just go use it in open-source, either way. But you're right. I mean, it should not be a 2020s skillset is, "Get better at running Cassandra." I think those days should be, leave it to, if you want to go work at DataStax and run Cassandra, great, we're hiring right now, you will love it. You don't have to. Yeah.

Jeremy: So the idea of it being open-source, so again, I'm not a huge fan of this idea of vendor lock-in. I think if you want to run on AWS Lambda, yeah, most of what you can do can only run on AWS Lambda, but changing the compute, switching that over to Azure or switching that over to GCP or something like that, the compute itself is probably not that hard to move, right? I think especially depending on what you're doing, setting up an entire Kubernetes cluster just to run a few functions is probably not worth it. I mean, obviously, if you've got a much bigger implementation, that's a little different.

But with data, data is just locked in. No matter where you go, it is very hard to move a lot of data. So even with the open-source flair that you have there, do you still see a worry about lock in from a data side?

Patrick: Yeah. And it's becoming more of a concern with larger companies too, because options, #options. There was a pretty famous story a few years ago where the CEO of Target said, "I am not paying Amazon any more money," and they just picked up shop and moved from AWS to Google Cloud. And the CEO made a technical decision. It was like everybody downstream had to deal with that. And I think that luckily Target's a huge Cassandra shop and they were just like, "Okay, we'll just move it over there."

But the thing is that you're right, I mean, and I love talking about this because back when cloud was first starting and I was talking about it and thinking about it, just what do the clouds promise you? Oh, you get commodity scale of CPU and network and storage. And that's what they want to sell you because that what they're building. Those big buildings in north Virginia, they are full of compute network and storage, but the thing they know they need to hook you in and the way that they're hooking you in, there's some services that are really handy, they're great, but really the hook is the data.

Once you get into the database, the bespoke database for the cloud, one of the features of that database is it will not connect to any other database outside of that cloud, and they know that. I mean, and this is why I really strongly am starting to advocate this idea of this move towards data on Kubernetes is a way where open-source gets to take back the cloud. Because now we're deploying these virtual data centers and using open-source technology to create this portability. So we can use the compute network and storage, a Google, Amazon, Azure, OnPrem wherever, doesn't matter.

But you need to think of like, "All right. How is that going to work?" And that's why we're like, "If you rent your Cassandra from DataStax with Astra, you can also use the open-source Cassandra as well." And if we aren't keeping you happy, you should feel totally fine with moving it to an open-source workload. And we're good with that. One way or the other, we would love for you to use a database that works for you.

Jeremy: Right. And so this Stargate project that you're working on, is that the one that allows you to basically route to multiple databases?

Patrick: That's the dream. Right now it just does Cassandra, but there's been some really interesting ... There's some folks coming out of the woodwork that really want to bring their database technology to Stargate. And that's what I'm encouraged by. It's an open-source project, Stargate.io, and you can contribute any of the connectors for underlying data store, but if we're using GraphQL, if you're using GRPC, if you're using REST, the underlying data store is really somewhat irrelevant in that case. You're just doing gets and puts, or gets and sets. Gets and puts, yeah, that's right. Gets, sets, puts, it's a lot of words.

Jeremy: Whatever words. Yeah. Exactly.

Patrick: That's what I love about standard, Jeremy, there's so many to pick from.

Jeremy: Right, because there are ... Exactly, which standard do you choose? Yeah. So, because that's an interesting thing for me too, is just this idea of, I mean, it would be great to live in a perfect little cloud where you could say like, "Oh, well AWS has all the services I need. And I can just keep all my stuff there, whatever." But best of breed services, or again, the cost of hosting something in AWS maybe if you're hosting a Cassandra cluster there, versus maybe hosting it in GCP or maybe hosting it with you, you said you could host it cheaper than those could, or that we could host it ourselves.

And so I do think that there is ... and again, we've had this conversation about multi-cloud and things like that where it's not about agnostic, it's not about being cloud agnostic, it's about using the best of breed for any service that you want to use. And APIs seem to be the way to get you there. So I love this idea of the Stargate project because it just seems like that's the way where it could be that standard across all these different clouds and onto all these different databases, well I mean, right now Cassandra, but eventually these other ones. I don't know, that seems like a pretty powerful project to me.

Patrick: Well, the time has come. It's cloud native ... I work a lot with CNCF and cloud-native data is a kind of emerging topic. It's so emerging that I'm actually in the middle of writing a book, an O'Reilly book on it. So, yeah. Surprise. I just dropped it. This just in.

Yeah, because I can see that this is going to be the future, but when we build cloud-native, cloud applications, cloud-native applications, we want scale, we want elasticity, and we want self-healing. Those are the three cloud-native things that we want. And that doesn't give us a whole lot ... So if I want to crank out a quick REACT app, that's what I'm going to use. And Netlify's a great example, or Vercel, they're creating this abstraction layer. But Netlify and Vercel are both working, they've been partnering with us on the Stargate project, because they're seeing like, "Okay, we want to have that very light touch, developers just come in and use it," in building cloud-native applications.

And whenever you're building your application, you're just paying for what you use. And I think that's really key, not spinning up a bunch of infrastructure that you get a monthly bill for. And that bill can be expensive.

Jeremy: It seems crazy. Doesn't it seem crazy nowadays? Actually provisioning an EC2 instance and paying for it to run even if it does nothing. That seems crazy to me.

Patrick: There are start-ups around the idea of finding the instance that's running that's causing you money that you're not using.

Jeremy: Which is crazy, isn't it? It's crazy. All right. So let's go a little bit more into standards, because you mentioned standards. So there are standards now for a lot of things, and again, GraphQL being a great example, I think. But also from a database perspective, looking at things like TSQL and developers come into an organization and they're familiar with MySQL, or they're familiar with PostgreSQL, whatever it is. Or maybe they're familiar with Cassandra or something like that, but I think most people, at least from what I've seen, have been very, very comfortable with the TSQL approach to getting data. So, how do you bring developers in and start teaching them or getting them to understand more of that NoSQL feel?

Patrick: I think it's already happened, it's just the translation hasn't happened in a lot of minds. When you go to build an application, you're designing your application around the workflows your application's going to have. You're always thinking about like, "I click on this. I go there." I mean, this is where we wireframe out the application. At that point, your database is now involved and I don't think a lot of folks know that.

It's like, at every point you need to put data or get data. And I think this is where we've taught could be anybody building applications, which makes it really difficult to be like, "No, no, no, start with your data domain first and build out all those models. And then you write your application to go against those models." And I'll tell you, I've been involved in a few of these application boot camps, like JavaScript boot camps and things, they don't go into data modeling. It's just not a part of it.

Jeremy: Really?

Patrick: And I think this is that thing where we have to acknowledge like, "Yeah, we don't really need that anymore as much, because we're just building applications." If I build a React app, and I have a form and I'm managing the authentication and I click a button and then I get a profile information, I just described every database interaction that I need and the objects that I need. And I'm going to put my user profile at some point, I'm going to click my ID and get that profile back as an object. Those are the interactions that I need. At no point did I say, "And then I'm going to write select from where." No, I just need to get that data.

Jeremy: And I love thinking about data as objects anyways. It makes more sense, rather than rows of spreadsheets essentially that you join together, describing an object even if it's got nested data, like a document form or things like that, I think makes a ton of sense. But is SQL, is it still relevant do you think? I mean, in the world we're moving into? Should I be teaching my daughters how to write TSQL? Or would I be wasting my time?

Patrick: Yeah. Well, yes and no. Depends on what your kid's doing. I think that SQL will go to where it originally started and where it will eventually end, which is in data engineering and data science. And I mean, I still use SQL every once in a while, Bigtable, that sort of thing, for exploring my data. I mean for an analytics career or reporting data and things like that, SQL is very expressive. I don't see any reason to change that. But this is a guy who's been writing SQL for a million years.

But I mean, that world is still really moving. I mean, like a Presto and Snowflake and all these, Redshift, they all use Bigtable, they all use SQL to express the reporting capabilities. But ... And I think this is how you and I got sucked into this is like, well that was the database that we had, so we started using reporting languages to build applications. And how'd that work out?

Jeremy: Yeah. Well, it certainly didn't scale very well, I can tell you that, going back to sharding, because that is always something that was very hard to do. So I guess, I get the point that essentially if you're going to be in the data sciences and you actually need to analyze that data and maybe you do need to do joins, or maybe you need to work with big data in a way, that's a specialized aspect of it and I think people could dabble in that if they were just regular developers and they didn't want to go too deep.

But it sounds like the bigger, or the end goal here, maybe altruistic, is to just give people access to data. So even if they don't know SQL or they don't know something complex, just make it so that whatever data is there that anybody, with whatever level is, they can consume it.

Patrick: Yeah. And move fast with the thing that you're building. Actually, I use a Facebook term, but Facebook does do this. Internally there's a system called Occhio that provides gets and puts for your data, but it abstracts things like geographics and things like that. But the companies that are trying to move quickly, they understood this a long time ago. If you have to reason through, "Am I doing a full table scan? Is that an efficient interjoin?" If you have to reason through that, you're not moving fast anymore.

Jeremy: Right. Right. All right. Cool. All right, so let's talk about Astra a little bit more and this whole idea of, because Astra is the serverless version, the hosted version, the serverless version of Cassandra, right? Through DataStax?

Patrick: Right. And ...

Jeremy: Did I get that right?

Patrick: You got it right. And so it gives you full access. You could do Port 9042 if you still want to use a driver, but it gives you access via GraphQL, REST, and there's also a document API. So if you just want to persist your JavaScript API or JavaScript and then pull it back out your JSON, it does full documents. So it emulates what a MongoDB or DocumenDB does. But the important thing, and this is the somewhat revolutionary side of this, and again, this is something that we're looking to put into open-source, is the serverless nature of it.

You only pay for what you use. And when you want to create a Cassandra database, we don't even call it a Cassandra database on the Astra panel anymore. We just create a database. You give it a name. You click. And it's ready. And it will scale infinitely. As long as we can find some compute and network for you to use somewhere, it'll just keep scaling and that's kind of that true portion of serverless that we're really trying to make happen. And for me, that's exciting because finally, all that power that I feel like I've been hoarding for a long time is now available for so many more people.

And then if you do a million writes per second for 10 minutes and then you turn it off, you only pay for that little short amount of time. And it scales back. You're not paying a persistent charge forever.

Jeremy: I'm just curious from a technical implementation, because I'm thinking about PTSD or nightmares back of my days running Cassandra, and so I'm just trying to think how this works. Is it a shared tenancy model? Or is there a way to do single tenancy if you wanted that as a service?

Patrick: Under the covers, yes, it is multi-tenant, but the way that we are created ... so we had to do some really interesting engineering inside. So my RCO's going to kill me if I talk about this, but hey, you know what, Jeremy? We're friends, we can do this. He's like, "Don't talk about the underlying architecture." I'm talking about the underlying architecture. The thing that we did was we took Cassandra and we decomposed it into microservices mostly. That's probably, it's still Cassandra, it's just how we run it makes it way more amenable to doing multi-tenant and scale in that fashion where the queries are separated from the storage and things that are running in the background, like if you're familiar with Cassandra because it's a log structure storage, you ask to do compactions and things like that, all that's just kind of on the side. It doesn't impact your query.

But it gives us the ability to, if you create a database and all of a sudden you just hammer it with a million writes per second, there's enough infrastructure in total to cover it. And then we'll spin up more in the back to cover everything else. And then whenever you're done, we retract it back. That's how we keep our costs down. But then the storage side is separated and away from the compute side, and the storage side can scale its own way as well.

And so whenever you need to store a petabyte of Cassandra data, you're just storing, you're just charged for the petabyte of storage on disk, not the thousandth of a cluster that you just created. Yeah.

Jeremy: No. I love that. Thank you for explaining that though, because that is, every time I talk to somebody who's building a database or running some complex thing for a database, there's always magic. Somebody has to build some magic to make it actually work the way everyone hopes it would work. And so if anybody is listening to this and is like, "Ah, I'm just getting ready to spin up our own Cassandra ring," just think about these things because these are the really hard problems that are great to have a team of people working on that can solve this specific problem for you and abstract all of that crap away.

Patrick: Yeah. Well, I mean it goes back to the Dynamo paper, and how distributed databases work, but it requires that they have a certain baseline. And they're all working together in some way. And Cassandra is a share-nothing architecture. I mean you don't have a leader note or anything like that. But like I said, because that data is spread out, you could have these little intermittent problems that you don't want to have to think about. Just leave that to somebody else. Somebody else has got a Grafana dashboard that's freaking out. Let them deal with it. But you can route around those problems really easily.

Jeremy: Yeah. No, that's amazing. All right. So a couple more technical questions, because I'm always curious how some of these things work. So if somebody signs up and they set up this database and they want to connect to it, you mentioned you could use the driver, you mentioned you can use GraphQL or the REST API, or the Document API. What's the authentication method look like for that?

Patrick: Yeah. So, it's a pretty standard thing with tokens. You create your access tokens, so when you create the database, you define the way that you access it with the token, and then whenever you connect to it, if you're using JavaScript, there's a couple of collection libraries that just have that as one of the environment variables.

And so it's pretty standard for connecting the cloud databases now where you have your authentication token. And you can revoke that token at any time. So for instance, if you mistakenly commit that into your Git ...

Jeremy: Say GitHub. We've never done that before.

Patrick: No judging. You can revoke it immediately. But it also gives you our back, the controls over it's a read or write or admin, if you need to create new tables and that sort of thing. You can give that level of access to whatever that token is. So, very simple model, but then at that point, you're just interacting through a REST call or using any of the HTTP protocols or SQL protocol.

Jeremy: And now, can you create multiple tokens with different levels of permission or is it all just token gives you full access?

Patrick: No, it's multiple levels of protection and actually that's probably the best way to do it, for instance, if your CI/CD system, has the ability to, it should be able to create databases and tear them down, right? That would be a good use for that, but if you have, for instance, a very basic application, you just want it to be able to read and write. You don't want to change any of the underlying data structures.

Jeremy: Right. Right.

Patrick: That's a good layer of control, and so you can have all these layers going on one single database. But you can even have read-only access too, for ... I think that's something that's becoming more and more common now that there's reporting systems that are on the side.

Jeremy: Right. Right. Good.

Patrick: No, you can only read from the database.

Jeremy: And what about data backups or exporting data or anything like that?

Patrick: Yeah, we have a pretty rudimentary backup now, and we will probably, we're working on some more sophisticated versions of it. Data backup in Cassandra is pretty simple because it's all based on snapshots because if you know Cassandra the database, the data you write is immutable and that's a great way to start when you come to backup data. But yeah, we have a rudimentary backup system now where you have to, if you need to restore your data, you need to put in a ticket to have it restored at a certain point.

I don't personally like that as much. I like the self-service model, and that's what we're working towards. And with more granularity, because with snapshots you can do things like snapshot, this is one of the things that we're working on, is doing like a snapshot of your production database and restoring it into a QA cluster. So, works for my house, oh, try it again. Yeah.

Jeremy: That's awesome. No, so this is amazing. And I love this idea of just taking that pain of managing a database away from you. I love the idea of just make it simple to access the data. Don't create these complex things where people have to build more, and if people want to build a data access layer, the data access layer should maybe just be enforcing a model or something like that, and not having to figure out if you're on this shard, we route you to this particular port, or whatever. All that stuff is just insane, so yeah, I mean maybe go back to kind of the idea of this whole episode here, which is just, stop using databases. Start using these data services because they're so much easier to use. I mean, I'm sure there's concerns for some people, especially when you get to larger companies and you have all the compliance and things like that. I'm sure Astra and DataStax has all the compliance things and things like that. But yeah, just any final words, advice to people who might still be thinking databases are a good idea?

Patrick: Well, I have an old 6502 on a breadboard, which I love to play with. It doesn't make it relevant. I'm sorry. That was a little catty, wasn't it?

Jeremy: A little bit, but point well taken. I totally get what you're saying.

Patrick: I mean, I think that it's, what do we do with the next generation? And this is one of the things, this will be the thought that I leave us with is, it's incumbent on a generation of engineers and programmers to make the next generation's job easier, right? We should always make it easier. So this is our chance. If you're currently working with database technology, this is your chance to not put that pain on the next generation, the people that will go past where you are. And so, this is how we move forward as a group.

Jeremy: Yeah. Love it. Okay. Well Patrick, thank you so much for sharing all this and telling us about DataStax and Astra. So if people want to find out more about you or they want to find out more about Astra and DataStax, how do they do that?

Patrick: All right. Well, plenty of ways at www.datastax.com and astra.datastax.com if you just want the good stuff. Cut the marketing, go to the good stuff, astra.datastax.com. You can find me on LinkedIn, Patrick McFadin. And I'm everywhere. If you want to connect with me on LinkedIn or on Twitter, I love connecting with folks and finding out what you're working on, so please feel free. I get more messages now on LinkedIn than anything, and it's great.

Jeremy: Yeah. It's been picking up a lot. I know. It's kind of crazy. Linked in has really picked up. It's ...

Patrick: I'm good with it. Yeah.

Jeremy: Yeah. It's ...

Patrick: I'm really good with it.

Jeremy: It's a little bit better format maybe. So you also have, we mentioned the Stargate project, so that's just Stargate.io. We didn't talk about the K8ssandra project. Is that how you say that?

Patrick: Yeah, the K8ssandra project.

Jeremy: K8ssandra? Is that how you say it?

Patrick: K8ssandra. Isn't that a cute name?

Jeremy: It's K-8-S-S-A-N-D-R-A.io.

Patrick: Right.

Jeremy: What's that again? That's the idea of moving Cassandra onto Kubernetes, right?

Patrick: Yeah. It's not Cassandra on Kubernetes, it's Cassandra in Kubernetes.

Jeremy: In Kubernetes. Oh.

Patrick: So it's like in concert and working with how Kubernetes works. Yes. So it's using Cassandra as your default data store for Kubernetes. It's a very, actually it's another one of the projects that's just taking off. KubeCon was last week from where we're recording now, or two weeks ago, and it was just a huge hit because again, it's like, "Kubernetes makes my infrastructure to run easier, and Cassandra is hard, put those together. Hey, I like this idea."

Jeremy: Awesome.

Patrick: So, yeah.

Jeremy: Cool. All right. Well, if anybody wants to find out about that stuff, I will put all of these links in the show notes. Thanks again, Patrick. Really appreciate it.

Patrick: Great. Thanks, Jeremy.

Episode source