Forem: Momento

Momento added as an Amazon EventBridge API destination!

Rishti Gupta — Tue, 27 Aug 2024 21:58:21 +0000

Building a real-time weather update system with Momento, Amazon EventBridge, and DynamoDB

If you’re like many of us who use DynamoDB as a durable data store, you might have noticed that getting real-time updates when data changes can be challenging. DynamoDB Streams gives you a way to monitor events in your database, but it can take a lot of code and infrastructure to handle them efficiently. Now, Momento’s new partnership with Amazon EventBridge makes real-time updates easier than ever.

Yes, you heard it right—Momento is now officially an API Destination Partner with Amazon EventBridge! To celebrate this milestone, we’ve built a real-time weather update system using Amazon EventBridge, Momento, and DynamoDB. Let’s dive in and explore how it works!

Project overview

Our goal is to build a robust system that tracks weather information for various locations and delivers real-time updates. We are going to use a powerful trio: DynamoDB for storage, Amazon EventBridge for event handling, and Momento for caching and notifications.

Our sample application is designed to store weather information for different geographic locations in a DynamoDB table. Whenever there’s an update in the weather for a specific location, this change is immediately captured and propagated using a series of interconnected AWS services.

High-level architecture

To visualize how everything fits together, consider the architecture below.

When a weather record is updated in DynamoDB, a DynamoDB stream triggers an event that travels through EventBridge. This event is then routed to two endpoints in the Momento HTTP API, which are configured as API destinations in EventBridge: Momento Cache and Momento Topic. The cache can enhance performance and reduce costs by reducing database load, while the topic facilitates real-time notifications that you can deliver to other components in your application, or directly to browsers and mobile devices, without managing any of your own infrastructure.

Getting started with the demo

Check out our GitHub repository to quickly set up and run the web application.

If you’re interested in understanding the CDK code used to deploy the necessary resources, be sure to check out the dedicated page for an in-depth explanation.

Application Workflow

Open the Web Application: Launch the web application at http://localhost:5173. You’ll see a form designed to input weather data.
Enter Weather Data: Fill in the form with weather details for a specific location and submit it. This data is then stored in the DynamoDB table.
Data Written to DynamoDB: The weather data is securely stored in DynamoDB. Immediately, DynamoDB Streams triggers an event that is sent to EventBridge.
EventBridge in Action: EventBridge processes the event triggered by the DynamoDB stream and forwards it to both Momento Cache and Momento Topics. This is done using EventBridge Pipes, which are connected to the stream as the source and the Momento HTTP API destinations as the targets. The API destinations for Momento Cache and Momento Topics are configured within EventBridge, enabling direct routing of the event to these endpoints.
Caching with Momento: The weather data is cached in Momento, reducing the load on DynamoDB for future reads and improving performance.
Real-Time Notifications: Meanwhile, Momento Topics broadcasts the event, simulating real-time notifications for any interested parties.
Delete Weather Record: You can also delete a weather record, which will be reflected in the cache and notifications.
Updated Momento Cache: The deleted record will also be removed from Momento Cache.
Remove Event in Momento Topics: Finally, a REMOVE event will appear in Momento Topics.
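
The routing decision in this workflow can be sketched in a few lines of Kotlin. This is only an illustration of the logic, not the demo's implementation: the demo wires everything up declaratively with EventBridge Pipes and API destinations. The `WeatherEvent` and `CacheAction` types and the use of the location as the cache key are assumptions made for this sketch; `INSERT`, `MODIFY`, and `REMOVE` are the event names that DynamoDB Streams emits.

```kotlin
// Hypothetical model of a DynamoDB stream record for the weather table.
data class WeatherEvent(
    val eventName: String,   // "INSERT", "MODIFY", or "REMOVE", as emitted by DynamoDB Streams
    val location: String,
    val payload: String?     // new item as JSON; null for REMOVE events
)

// Hypothetical description of what should happen in the cache for one event.
sealed interface CacheAction {
    data class Put(val key: String, val value: String) : CacheAction
    data class Delete(val key: String) : CacheAction
}

// Decide the cache action for a stream event: writes update the cache, removals evict.
fun toCacheAction(event: WeatherEvent): CacheAction =
    when (event.eventName) {
        "INSERT", "MODIFY" -> CacheAction.Put(event.location, requireNotNull(event.payload))
        "REMOVE" -> CacheAction.Delete(event.location)
        else -> throw IllegalArgumentException("Unexpected event name: ${event.eventName}")
    }

fun main() {
    val events = listOf(
        WeatherEvent("INSERT", "Seattle", """{"tempF": 61}"""),
        WeatherEvent("REMOVE", "Seattle", null)
    )
    events.map(::toCacheAction).forEach(::println)
}
```

In the same spirit, every event, whatever its name, would also be published to the topic so that subscribers see inserts, updates, and removals alike.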

Wrapping Up

And there you have it—a real-time weather update system that seamlessly integrates Amazon EventBridge, Momento, and DynamoDB. By following this example, you’ve learned how to set up a write-through cache, reduce database load, and keep your users informed with real-time updates.

Ready to build your own weather update system? For more insights and resources, get started with Momento today or check out our docs to learn how to start building on Momento.

Additional Resources

Integrate Amazon DynamoDB Streams with Momento, via AWS EventBridge.

A spooky tale of overprovisioning with Amazon DynamoDB and Redis

Chris Price — Wed, 08 May 2024 21:50:59 +0000

(NOTE: This was originally published in October 2022 🙂👻)

Halloween is around the corner. Buckle up for a spooky engineering ghost story.

A few years ago, I worked as a software engineer at a large company building a video streaming service. Our first customer was a major professional sports league that would be using our service to broadcast a livestream of their games once a week to millions of viewers, an opportunity that was both exciting and terrifying!

‍When we signed on to the project, our service didn’t actually exist yet. But the league’s broadcast schedule certainly did. 🙂 The launch date was rock solid, and the service had to be able to handle all traffic being sent to us.

‍Is this where the scary part of the story begins? Nope! We had a fantastic engineering team and an architecture design we believed in. The schedule was tight, but we were confident we’d be able to hit our launch date. We put our heads down and got to work.

‍A few weeks before the first broadcast, we were feeling pretty good. The service was built, sans some finishing touches. The team was in the home stretch of load testing to make sure the service would hold up to the traffic at game time, and everything was business as usual.

‍But then…

‍We got our first realistic sample data set from our customer, and we integrated it into our load tests. It did not go smoothly. Based on our budget and our estimates for how much data we would need to store, we had configured a maximum read and write capacity for DynamoDB. But during the load test, we found that we were dramatically exceeding that capacity and running into DynamoDB throttles. Our service failed. Hard.

Be afraid. Be very afraid.

Uh oh. It’s only a few weeks until our first broadcast, and we have a major problem. In our architecture design, there were data we needed to store for each individual viewer watching the broadcast to keep track of where they were in the stream. We had decided to store this data in DynamoDB. After investigating the traffic that the broadcaster was sending us, we discovered the size of the payload for each viewer might be up to 10x larger than our estimates. This required 10x the IOPS on DynamoDB, and 10x the cost!

‍Our workload was very write-heavy. Some napkin math based on the observed 10x increase in data made it clear that storing it in Dynamo would put us far over budget. These data were ephemeral, so we decided that we could move them out of DynamoDB and into a cache server. We did some quick research on our options and decided to move forward with a managed Redis solution.
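
As a rough illustration of that napkin math, here is a small Kotlin sketch. The write rate and item sizes are made-up numbers (the real figures are not given in this story); the one fixed rule it relies on is that DynamoDB meters a standard write in 1 KB units, rounded up.

```kotlin
import kotlin.math.ceil

// Write capacity units needed per second: each standard write consumes
// one unit per 1 KB of item size, rounded up.
fun writeCapacityUnits(writesPerSecond: Int, itemSizeKb: Double): Long =
    writesPerSecond.toLong() * ceil(itemSizeKb).toLong()

fun main() {
    val writesPerSecond = 100_000   // hypothetical: one position update per viewer per second
    val estimated = writeCapacityUnits(writesPerSecond, itemSizeKb = 1.0)
    val observed = writeCapacityUnits(writesPerSecond, itemSizeKb = 10.0)
    println("estimated WCU: $estimated, observed WCU: $observed, ratio: ${observed / estimated}x")
}
```

Because the cost of provisioned write capacity scales with the number of units, a 10x larger item translates fairly directly into a 10x larger bill for a write-heavy workload.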

‍Managed Redis services have some nice benefits in that you aren’t explicitly responsible for provisioning and operating the individual nodes in your cache cluster. But, you *are* explicitly responsible for determining how many nodes you need in your cache cluster, and how big they need to be.

‍The next step was to write code to simulate the load that we would put on the Redis cluster, and run it... over and over again. We tested different sizes of nodes. We tested different cluster sizes. We tested different replication configurations. We tested. A lot.

‍All this writing of synthetic load tests to size a caching cluster was not work that we had accounted for in our engineering plans. Experimenting with different sizes (and types) of cache nodes, monitoring them to ensure they weren’t overloaded during the test runs… These tasks were expensive and time consuming—and largely ancillary to the actual business logic of the service we were trying to build. None of them were especially unique to us. But we still had to allocate precious engineering resources to them.

‍After a week, we had nailed down the sizing and configuration for our cluster, still racing against the clock. After another week, we had completed the work to migrate that part of our code off of Dynamo onto the Redis cluster.

‍And the service was up and running again.

It’s alive! It’s aliiive!

We did it! The first broadcast went smoothly. As with any major software project, after observing it in action in the real world, we learned some lessons and found some things to improve, but the viewers had a good viewing experience. We rolled out some of those improvements during the subsequent weeks, and before we knew it, the season was well underway. Victory!

Until…

‍About a month into the season, we got our AWS bill. To say that it caused us a fright would be an understatement. The bill was… HUGE! What the heck happened?!

It’s coming from inside the house!

Because of our architecture, we knew that the biggest chunk of our bill was going to come from DynamoDB. But we had done a reasonable job of estimating that cost based on our DDB capacity limits. So why was the AWS bill so high?

‍It turns out that the culprit was our Redis clusters. In retrospect, it was predictable, but we had been so busy just trying to make sure that things were operational in time to meet our deadlines, we hadn’t had time to do the math.

‍To meet the demands of our peak traffic during the games, we had been forced to create clusters with 90 nodes in them—in every region that we were broadcasting from. Plus, we needed each node to have enough RAM to store all the data we were pumping into them, which required very large instance types.

Is this place haunted?

Very large instance types that provided the amount of RAM we needed happened to also come with high numbers of vCPUs. Redis executes commands on a single thread, meaning it can take advantage of roughly one vCPU on each node in the cluster, leaving the remaining vCPUs almost 100% idle.

‍So there we were, paying for boatloads of big 16-vCPU instances, and we were guaranteed each one of them would never be using more than about 6% of the CPU it had available. Believe it or not, this wasn’t even the worst of it.

‍The peak traffic we would experience during the sports broadcasts dwarfed the traffic we were handling during any other window of time. So not only were we forced to pay for horsepower that we weren’t even fully utilizing during the games, but we were paying for these Redis clusters 24 hours a day, seven days a week, even though they were effectively at 0% utilization outside of the 3-hour window each week when we were broadcasting the sporting events.
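
The two utilization figures above are easy to reproduce. The sketch below uses only the numbers quoted in the story (16 vCPUs per node, a 3-hour broadcast window per week); the function names are hypothetical, and the combined figure is an upper bound that assumes the one usable core is fully busy for the whole window.

```kotlin
// Fraction of a node's CPU that a single-threaded process can use.
fun maxCpuFraction(vcpus: Int): Double = 1.0 / vcpus

// Fraction of the week during which the cluster is actually busy.
fun busyFractionOfWeek(busyHoursPerWeek: Double): Double = busyHoursPerWeek / (24 * 7)

fun main() {
    val cpu = maxCpuFraction(16)          // 1/16, i.e. about 6%
    val time = busyFractionOfWeek(3.0)    // 3/168 of the week
    println("peak CPU usable: %.2f%%".format(cpu * 100))
    println("busy share of week: %.2f%%".format(time * 100))
    println("effective use of paid CPU-hours: %.2f%%".format(cpu * time * 100))
}
```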

‍And then the season ended and we had no more sports broadcasts for 6 months. So now those clusters were sitting at approximately 0% utilization 24-7.

‍Okay, fine. Problem identified. All we had to do was fix it and get our cloud bill under control!

A horde of zombie… engineers!

Well, it turns out fixing our spend on our Redis clusters was much easier said than done. The managed Redis service didn’t have any easy, safe way to scale the clusters up and down. And because Redis clients handle key sharding on the client side, they have to be aware of the list of available servers at any given time, meaning that scaling the cluster in or out carries a high risk of impacting cache hit rate during the transition, and thus would need to be managed very carefully.

These were solvable problems. Throw enough engineers at something, and anything is possible, right? They could update all of the code so that it writes to two different clusters during a scaling event and have reads fail over from the new cluster to the old one for cache misses during the transition. Then, they could scale down by adding a second, smaller Redis cluster alongside the giant one needed for peak traffic. They could definitely handle the work of meticulously monitoring the behavior of the new code while the new cluster was brought online, and they could decide when it was safe to begin the teardown of the old cluster. Oh, and they could kick that off and meticulously monitor it to make sure that went smoothly.
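
For a sense of what that code change involves, here is a minimal Kotlin sketch of the dual-write, read-with-fallback pattern. It is not the code from this story: `FakeCluster` is a hypothetical in-memory stand-in for a Redis client, and `MigratingCache` is an illustrative name.

```kotlin
// Minimal stand-in for a cache cluster; a real client would talk to Redis over the network.
class FakeCluster {
    private val data = HashMap<String, String>()
    fun get(key: String): String? = data[key]
    fun set(key: String, value: String) { data[key] = value }
}

// During a migration, write to both clusters and read from the new one first,
// falling back to the old one on a miss.
class MigratingCache(private val oldCluster: FakeCluster, private val newCluster: FakeCluster) {
    fun set(key: String, value: String) {
        newCluster.set(key, value)
        oldCluster.set(key, value)
    }

    fun get(key: String): String? = newCluster.get(key) ?: oldCluster.get(key)
}

fun main() {
    val oldCluster = FakeCluster().apply { set("viewer:1", "00:12:30") }
    val cache = MigratingCache(oldCluster, FakeCluster())
    println(cache.get("viewer:1"))   // found via fallback to the old cluster
    cache.set("viewer:2", "00:03:10")
    println(cache.get("viewer:2"))   // served by the new cluster
}
```

The wrapper itself is small. The expensive part is everything around it: deciding when the new cluster is warm enough, monitoring hit rates during the switch, and tearing the old cluster down safely.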

‍So sure, our team was capable of doing that twice a week: once when we needed to scale up in preparation for the sports broadcast, and again when we needed to scale down to save costs after the event.

‍But that would be a ton of work. Now we were forced to do some math on how much we were paying those engineers vs. how much we were paying for the overprovisioned Redis clusters.

‍And then there’s the opportunity cost: none of this cluster scaling nonsense had any unique business value for us, and we had a limited number of engineers available to work on delivering features actually unique to our business and provide actual customer-facing value to our users.

‍I bet you can guess where we landed. Yep. We never reached a point where we felt like we could justify the engineering cost it would take to try to solve this problem when there were so many more valuable customer projects our engineers could be doing—projects which would actually move the business forward and win us new customers.

‍So we just kept paying. For something we weren’t using.

‍At a certain point, if our business was struggling, we might have been forced to allocate the engineering resources to solving this problem in order to reduce our spending and balance the budget. But this would have been a sign that we were in trouble.

‍And I don’t know how you feel about the cloud services your team spends money on, but I consider it pretty scary that a cloud service can make it so complicated for you to get a fair bill—a bill where you are paying a fair amount for what you are actually using, and not paying a ton of money for resources that are sitting idle—that you will only be able to make time for it if you’ve gotten into a desperate situation.

‍It’s a great business model for the cloud service provider. Not a great business model for the customer.

‍It doesn’t have to be this way.

Momento Cache: All treat, no tricks!

The horrific tale you’ve just read was a large part of the inspiration for us to build Momento’s serverless caching product. One of the best things about serverless cloud services is the fair pricing model: pay for what you use and nothing more. Why should we settle for less with caching?

‍With Momento, you get a dead-simple pricing policy based strictly on how many bytes you send to and receive from your cache. We don’t think you should have to pay more if those bytes are all transferred within a 3-hour window or are evenly distributed over the course of a week or a month. As far as we’re concerned, you should be able to read and write your cache when you need it. That’s it. Plain and simple.

‍Of course, serverless doesn’t stop there. We manage all of the tricky stuff on the backend for you. If your traffic increases and your cache needs more capacity, that’s on us. If your traffic decreases, you shouldn’t have to pay the same amount of money for your low-traffic window as you did for your high-traffic window. And you most certainly shouldn’t have to pay for 15 idle CPU cores on a bunch of nodes in a caching cluster just because you needed more RAM.

‍So: stop letting cloud services trick you into paying for caching capacity that you aren’t using, and see what a treat it is to work with Momento today! You can create a cache—for free—in less than five minutes. If it takes more than five minutes, let us know, and we’ll send you some Halloween candy.

‍Visit our getting started guide to try it out, and check out our pricing page to see how we make sure you get what you pay for.

‍Happy Halloween! 👻

Moving your bugs forward in time

Chris Price — Wed, 08 May 2024 21:14:50 +0000

Over the course of my years as a software engineer, I’ve slowly become more curmudgeonly (ahem, deliberate) about how to structure a codebase, and how to gauge its relative success.

In my early days I was myopically focused on what the code can do today. It was all about speed, and cranking out code as fast as possible. Tests were a nice-to-have. “Works on my machine” was a reasonable acceptance criterion. And I’m not sure I even knew the definition of the word “maintainable”.

Those times were great fun. And I considered myself to be a pretty 1337 coder.

Then I watched several codebases I’d worked on grind to an eventual halt because no one could understand them, or they were too hard to extend or debug, or they were so fragile that people were afraid to change anything about them lest they introduce a crazy bug that would explode after being deployed to production.

Old man yells at cloud

Now, I’m older and I’ve worked with a lot of different engineers on a lot of different codebases. Now, I have a very different opinion. Now, “maintainability” is one of the most important words in my vocabulary. Now, I care so much more than I did before about what the code will be able to do tomorrow. And perhaps most importantly, I care so much more about what $nextEngineer will be able to make this code do tomorrow than about what I myself might be able to do.

These are the things that allow your software to survive and thrive beyond the early days. The things that ensure that your business will be able to continue to grow and evolve at the same pace a year from now that it can today, that it won’t get bogged down by an un-maintainable, un-extensible code foundation that drags your engineering team’s velocity down towards zero.

The skills, experience and foresight that are required to ensure a maintainable codebase, to be a force multiplier that ensures that a breadth of current and future engineers will be able to achieve sustainable, high velocity working on the code—these are the traits that are at the top of my list when evaluating engineering candidates nowadays. It’s not how many lines of code you can produce nor how quickly—it’s how well those lines of code will hold up as the foundation for your business that grows and evolves over time.

Boiling it down

Occasionally I reflect back on this mindset transition and try to distill my thoughts on maintainability down into something concrete that I can try to communicate to other engineers. And if I had to choose one overarching theme—that I could boil down to a single sentence—that best captures the spirit of what I believe about maintainability these days, it would be this:

Structure your code so that you will catch your bugs at compile time, rather than at run time.

Move your bugs forward in time. There is no single thing that you can do that will have a more sustained impact on the medium-to-long-term velocity of your team than this.

‍For the rest of this post I’ll list off some more tactical examples of things that you can do towards this goal. Savvy readers will note that these are not novel ideas of my own, and in fact a lot of the things on this list are popular core features in modern languages such as Kotlin, Rust, and Clojure. Kotlin, in particular, has done an amazing job of emphasizing these best practices while still being an extremely practical and approachable language.

‍So, credit where it is due: the brilliant language designers of these and other languages deserve all of the kudos for bringing these ideas to the foreground of the software engineering zeitgeist. Today, I’m just here to sing their praises. 🙂

(Side note: spending some time writing code in a variety of new languages is a really amazing way to broaden your horizons and challenge your beliefs about software engineering best practices. You’ll find that it’s often much easier than you think to apply lessons learned from a foundational feature of one language to another language that doesn’t explicitly provide or emphasize that feature. I haven’t written Clojure in several years now, but I firmly believe that the time I spent writing in that language did more to improve my skills as a software engineer than anything else I’ve ever done.)

‍And now, without further ado, let’s get into it.

5 patterns and language features to help catch bugs earlier

1. Static types

This one can be a tough pill to swallow for folks who love Python, Ruby, Clojure, and other dynamically typed languages. And I might never convince some of you of this point. But this is a thing that has burned me enough times over the years that I don’t think I’ll ever change my mind on it.

‍Part of the allure of dynamically typed languages is that if you don’t have to spend time on all of the ceremony of defining types and declaring them on all of your method signatures, you can code more quickly and spend your time thinking about the business logic instead of the object model. And you can write more flexible, re-usable functions that operate on data rather than operating on types.

‍There’s some truth to those arguments, especially in the early prototyping phase of a project. But what I’ve repeatedly seen is that once a codebase in a dynamically typed language grows beyond a certain size, it becomes harder and harder to reason about it and maintain it. Over and over again I’ve seen cases where a well-meaning developer working on one part of the code passes the wrong object type to a function in another part of the code that they weren’t the original author of, and then when that function call occurs, the app crashes.

The worst part of this is that that error happens at run time. If you’ve already deployed the code to production without catching this bug, you may have a customer-facing outage on your hands, and now you have to go through a fire drill to roll back the change or push a hotfix. Depending on how bad it was, this may cost you customers. Even in the best case scenario it’s stressful, and it has a high opportunity cost, as you have to pull some of your engineering team off to fight the fire.

‍Some argue that if such a bug makes it through to production, that is a sign that you didn’t add enough test coverage to ensure that the function would only be called with the correct argument types. My response to that is: yes, if you are diligent enough about test coverage, and you don’t make any mistakes in the test code itself, you might be able to avoid shipping most bugs of this classification. But testing is an art form in and of itself, and every one of your engineers must achieve a certain level of proficiency at it in order to clear this bar. And even if they do, we all still make mistakes from time to time.

A compiled language with a static type system guarantees that you will avoid shipping this type of bug to production.

No ifs, ands, or buts about it. And it does not rely on the varying experience levels of your engineers: if they write some code that has a bug like this in it, it won’t compile, and it will never even make it into a PR. No matter how good or bad the test coverage is. Let’s offload this category of work to compilers instead of putting it on our engineers!
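
Here is a small Kotlin sketch of the kind of mistake in question. The `UserId`, `OrderId`, and `describeOrder` names are hypothetical and exist only for this illustration.

```kotlin
// Two distinct types wrapping the same underlying String, so they cannot be mixed up.
data class UserId(val value: String)
data class OrderId(val value: String)

fun describeOrder(userId: UserId, orderId: OrderId): String =
    "order ${orderId.value} for user ${userId.value}"

fun main() {
    val user = UserId("u-42")
    val order = OrderId("o-1001")
    println(describeOrder(user, order))

    // The following line is rejected by the compiler (type mismatch),
    // so the mistake never reaches a test run, let alone production:
    // println(describeOrder(order, user))
}
```

Had both parameters been plain strings, or had the language been dynamically typed, the swapped call would have run and produced wrong output or a crash later on.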

‍I’ve become more and more convinced of this over time, to the point where I won’t even advocate for dynamically typed languages for prototypes anymore. Prototypes very often end up being promoted to products, if for no other reason than the code is already written. But if you’re going to promote a prototype to a product then you really ought to have sufficient test coverage to make sure your product is reliable, and at that point you’ll probably have invested the same amount of engineering effort that you would have put into building the prototype in a statically typed language like Kotlin or Go in the first place.

‍I’m not here to tell people which languages they should love. But if you do find yourself writing production code in a dynamically typed language like Python, Ruby, or JavaScript, I would give serious consideration to opting into the type-checking tools that have become available in those ecosystems. In Python, consider requiring type hints and adding mypy checks to your CI to move your type safety bugs forward in time. For JavaScript, consider incrementally shifting to TypeScript. For Ruby, try out the RBS type annotation system that was added in Ruby 3.0.

2. Null safety

Now we’ll get into some (hopefully) less controversial territory. You’ve probably heard the line about null references being a billion-dollar mistake. And if you’ve worked in a language that doesn’t provide compile-time null safety, you’ve surely encountered your fair share of silly bugs resulting in null pointer exception crashes at run time, or code that is littered with boilerplate null checks at the beginning of every single function call, or both.

Thankfully the trend in modern languages is to help us move these bugs forward in time by giving us a way to declare variables that are not allowed to be null. Kotlin builds this into its type system, and C# and TypeScript offer it through nullable reference types and the strictNullChecks compiler option, respectively, among other languages. In Java, you can use Optional instead of allowing null. So we can let the compilers do this work for us.

‍In general if you find yourself using nullable variables these days, it might be a code smell. See if there is a different way you can structure the code to avoid it, and if not, see if your language of choice has any mechanism for compile-time/build-time null safety tooling.
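
A minimal Kotlin sketch of compile-time null safety follows. The `findNickname` and `greet` functions are hypothetical names used only to show the mechanics.

```kotlin
// Hypothetical lookup that may not find anything, so its return type is nullable.
fun findNickname(userId: Int): String? =
    if (userId == 1) "pepper" else null

// Non-null parameter type: callers must resolve the null case before calling.
fun greet(name: String): String = "Hello, $name!"

fun greetUser(userId: Int): String {
    val nickname: String? = findNickname(userId)
    // greet(nickname) would not compile here, because String? is not String.
    return greet(nickname ?: "stranger")
}

fun main() {
    println(greetUser(1))
    println(greetUser(2))
}
```

The null case is handled exactly once, at the boundary where the value is produced, and everything downstream works with a type that cannot be null.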

3. Immutable variables and data structures

This one takes some getting used to and may be hard to believe at first, but consider this: there are precious few places in your code where you actually need any of your variables or types to be mutable.

‍The first time someone told me this was when I was learning Clojure, where it was a matter of necessity because it’s very difficult to even express a mutable object. I found the idea quite implausible. But once I opened my mind to it and got a little practice, I realized that it was true.

‍Immutable variables are an incredibly powerful way to improve code maintainability. Here’s why: when you are an engineer working on a code base that you aren’t entirely familiar with, and you encounter a line of code that defines an immutable variable whose type is an immutable data structure, you know all that you will ever need to know about that variable after reading that one line of code. Because it is guaranteed not to be changed anywhere else in the program.

‍Contrast this with mutable variables and mutable data structures: after reading a line of code where one of these is instantiated, if I want to reason about the state of that variable 100 lines farther down in that same code, there are a ton of things I have to consider:

Were there any statements in between that might have modified it?
Was that object passed by reference to any functions that might have mutated it?
If so, do I need to go examine the source code of all of those functions to make sure I know what the state will be?
Does my program have more than one thread, and if so, do any other threads have a reference to this object that might have allowed them to mutate it while I was working with it?

There is so much hidden complexity that comes along with mutable state. If I have to do the amount of reasoning described above for every line of code that deals with a mutable object, and I can instead write the same program in a way that only uses immutable variables and data structures, the reduction in complexity is astounding. And the corresponding increase in engineering velocity and maintainability is as well.

Many languages have a way to define immutable local variables these days (e.g. Kotlin val, TypeScript const). Many also have a way to define immutable data structures (e.g. a Kotlin data class with only val properties, a C# readonly record struct). Lean into these where you can.
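
As a short sketch of what that looks like in Kotlin (the `Reading` type and `warmer` function are hypothetical examples):

```kotlin
// All properties are declared with val, so an instance can never change after construction.
data class Reading(val location: String, val tempF: Int)

// "Modifying" a reading means producing a new instance with copy().
fun warmer(reading: Reading, delta: Int): Reading =
    reading.copy(tempF = reading.tempF + delta)

fun main() {
    val morning = Reading("Seattle", 55)
    val afternoon = warmer(morning, 10)
    println(morning)     // still holds the original values
    println(afternoon)
}
```

Anyone holding a reference to `morning` can rely on it never changing, no matter which functions it is passed to.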

‍Most engineers I work with are sold on this idea fairly easily, except for when dealing with collections. We are so used to writing loops that build up arrays or maps, it’s really hard to get used to the idea that this can be done without a mutable data structure and without a loop. But it can! Almost all languages these days have some flavor of functional programming tools for operating on collections (map, filter, reduce/fold, etc.). These can take some getting used to but they are well worth the price of admission.

‍The reduce / fold operation in particular can be a bit of a learning curve but it is the key to eliminating the need for mutable collections in your code. It will allow you to re-write code that looks like this:

val pepperNames = listOf("jalapeno", "habanero", "serrano", "poblano")
val pepperNameLengths = mutableMapOf<String, Int>()
for (pepperName in pepperNames) {
    pepperNameLengths[pepperName] = pepperName.length
}
// from here forward we need to be cognizant about the pepperNameLengths map being mutated!

without the mutable map:

val pepperNameLengths: Map<String, Int> = pepperNames.fold(mapOf<String, Int>()) { accumulator, pepperName ->
    accumulator + (pepperName to pepperName.length)
}
// no mutable map to worry about here!
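
For this particular shape of problem, Kotlin's standard library also has more specialized helpers than fold. The sketch below is one possible alternative rather than a replacement for the pattern above; the function names are made up for the example, while `associateWith`, `filter`, `map`, and `uppercase` are standard library functions.

```kotlin
// Map each name to its length; associateWith returns a read-only Map.
fun nameLengths(names: List<String>): Map<String, Int> =
    names.associateWith { it.length }

// Keep names shorter than maxLength and upper-case them; no loop, no mutable list.
fun shortNamesUpper(names: List<String>, maxLength: Int): List<String> =
    names.filter { it.length < maxLength }.map { it.uppercase() }

fun main() {
    val pepperNames = listOf("jalapeno", "habanero", "serrano", "poblano")
    println(nameLengths(pepperNames))
    println(shortNamesUpper(pepperNames, 8))
}
```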

4. Persistent collections (aka immutable collections)

When a coworker originally told me that I should be using immutable collections, my instinct was that this was impractical due to performance concerns and memory usage. If I represent a map as an immutable collection, and then somewhere in my code I need to add or modify a key in it, doesn’t that mean copying the entire data structure in order to obtain the version that contains my modification? Isn’t that crazy expensive?

‍Well, it turns out: no. As long as you are using persistent collections.

‍I first encountered this concept in Clojure, and I highly recommend Rich Hickey’s fantastic talk on the topic. The tl;dr is that:

A persistent data structure is guaranteed to be immutable, but provides modifier functions (put, add, remove etc.) that will produce another persistent data structure with the same immutability guarantees.
Under the hood, these data structures are implemented as trees, and when you want to modify a single item, you can do so by creating a new tree that shares almost all of the nodes of the original tree. You only need to copy and replace the small set of nodes in the tree on the path to the item you are modifying. In efficient implementations, this means you almost never need to clone more than about 4 nodes in the tree even if it has millions of nodes. The rest can be shared, which is efficient in terms of both memory usage and performance.
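
The path-copying idea can be shown with a deliberately tiny example. The sketch below is an unbalanced binary search tree written for illustration only; production persistent collections use wide, balanced trees instead, and the `Node`, `insert`, and `contains` names are hypothetical.

```kotlin
// A tiny persistent (immutable) binary search tree of Ints.
class Node(val value: Int, val left: Node?, val right: Node?)

// Returns a new tree containing `value`. Only the nodes on the path from the
// root to the insertion point are copied; every other subtree is reused as-is.
fun insert(root: Node?, value: Int): Node =
    when {
        root == null -> Node(value, null, null)
        value < root.value -> Node(root.value, insert(root.left, value), root.right)
        value > root.value -> Node(root.value, root.left, insert(root.right, value))
        else -> root
    }

fun contains(root: Node?, value: Int): Boolean =
    when {
        root == null -> false
        value < root.value -> contains(root.left, value)
        value > root.value -> contains(root.right, value)
        else -> true
    }

fun main() {
    val v1 = listOf(50, 30, 70, 20, 40).fold(null as Node?) { tree, x -> insert(tree, x) }
    val v2 = insert(v1, 80)

    println(contains(v1, 80))        // the old version is unchanged
    println(contains(v2, 80))        // the new version has the new element
    println(v1?.left === v2.left)    // the untouched left subtree is shared, not copied
}
```

Inserting 80 copies only the nodes holding 50 and 70; the whole subtree under 30 is shared between both versions.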

‍Many languages now have “persistent collections” or “immutable collections” libraries (e.g. Java PCollections, C# Immutable Collections, etc.) that do all of the heavy lifting for you. You interact with them just like normal collections, but you get all of the benefits of immutability while still maintaining great performance.

‍This concept is amazingly powerful, especially in concurrent programs. It means that you can pass a reference to a collection around anywhere you like in your program, and many threads can consume it at the same time with no locking or any other concerns about the collection being modified by another thread. You’ll be amazed at how much simpler this can make some of your application code! And at how nice it is to stop needing to worry about lock contention.

5. Algebraic data types and exhaustive pattern matching

These go by many names in different languages. In Kotlin they are called sealed classes. In many languages this may end up being just a special flavor of polymorphism. I think of them as an enumeration of types, where you know at compile time all of the types in the enumeration, but each type in the enumeration can have its own discrete properties, methods, etc.

It’s easiest to explain via a specific example. I’ll use an example from the Momento Cache API, since that’s something I’ve been working on a lot lately.

When you make a call to the get method on a Momento client object to retrieve a value from your cache, the response may be one of three very different types:

A cache hit, in which case you will also get back the value that was retrieved from the cache.
A cache miss, in which case there will be no cache value.
An error, if something went wrong with the request, in which case you might get an error code and an error message.

Without algebraic data types, a common way to try to represent this situation in code might be to provide a GetResponse object, with a status enum property that could be used to identify whether the response was a HIT, MISS, or ERROR. The object would also need fields to hold the various data that is relevant for each of those cases: e.g. value, errorCode, errorMessage. Those fields would have to be nullable or optional, because they would only be available conditionally, depending on which type of response we got. Something like this:

enum class GetResponseStatus {
    HIT, MISS, ERROR
}
data class GetResponse(
    val status: GetResponseStatus,
    val value: String?,
    val errorCode: Int?,
    val errorMessage: String?
)

This is not an awful way to define this API, but it has one big drawback: it is the developer’s responsibility to write code that checks all of the conditions correctly, and if there is a bug in the code it will only surface at run time. For example, if you write code that assumes the response was a HIT without checking, and you try to access the value property, you will get a null pointer exception at run time if the response was not actually a HIT. (In the Kotlin code snippet above, because of Kotlin’s null-safety rules, you’d be forced to write some code to deal with the possibility of those values being null, but in other languages that wouldn’t necessarily be the case. The point remains that it is the developer’s responsibility to reason about which of these fields might be null and when.)
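To see the run-time failure in a language without null-safety rules, here is a small Python sketch of the same status-plus-optional-fields design. The class mirrors the Kotlin snippet above; the shout function is a hypothetical caller that forgets to check the status.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class GetResponseStatus(Enum):
    HIT = "HIT"
    MISS = "MISS"
    ERROR = "ERROR"


@dataclass
class GetResponse:
    status: GetResponseStatus
    value: Optional[str] = None
    error_code: Optional[int] = None
    error_message: Optional[str] = None


def shout(response: GetResponse) -> str:
    # Bug: assumes the response is a HIT without checking status first.
    return response.value.upper()


hit = GetResponse(GetResponseStatus.HIT, value="cached")
miss = GetResponse(GetResponseStatus.MISS)

print(shout(hit))  # works, because value is set on a HIT
try:
    shout(miss)
except AttributeError as err:
    # value is None on a MISS, so the bug only shows up when this path runs.
    print("run-time failure:", err)
```

Nothing warns the developer until the MISS path is actually executed.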

Algebraic data types provide a much nicer way to specify this API, without exposing any nullable fields at all. Here’s how this might look using Kotlin’s sealed classes:

sealed interface GetResponse {
    data class Hit(val value: String) : GetResponse
    object Miss : GetResponse
    data class Error(val errorCode: Int, val errorMessage: String) : GetResponse
}

Now we have a discrete class for each of the three cases, and each of those three classes has only the properties that are relevant to it. And they are no longer nullable.

A developer would access the appropriate class via pattern matching. In Kotlin, this is done via the when expression:

val getResponse: GetResponse = cacheClient.get("myCacheKey")
when (getResponse) {
    is GetResponse.Hit -> {
        println("Cache hit! ${getResponse.value}")
    }
    GetResponse.Miss -> { 
        println("Cache miss!")
    }
    is GetResponse.Error -> {
        println("Error! ${getResponse.errorMessage}")
    }
}

This approach is really nice because it removes the burden of knowledge from the developer for questions like “in which cases will value be available?” The value property only exists on the Hit class, so we get compile-time enforcement that it can’t be accessed unless the result was a Hit. We have once again moved our bugs forward in time!

The other great thing about this approach is that, in languages like Kotlin, the pattern matching expression is exhaustive. This means that the compiler is smart enough to know whether you have handled all of the possible cases in your when expression, and fail to compile if you have not. Imagine a scenario where you have several of these when expressions scattered around a large code base, and an engineer is working on a new feature that involves adding an additional type of GetResponse to the sealed class. Without the exhaustive pattern matching, the engineer would be responsible for identifying every place in your code that is interacting with a GetResponse, and making sure that it appropriately handles the new type of response. Otherwise what do we end up with? A bug that isn’t exposed until run time.

But with exhaustive pattern matching, once the new type is added, the code won’t compile until we’ve updated all of the places in the code that need to be updated to account for it. Win!
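The same shape can be approximated in other languages. As a rough sketch, here is how it might look in Python, with one class per case and a union type. Python itself does not enforce exhaustiveness; the assumption here is that a static type checker such as mypy is run over the code, in which case the assert_never helper (defined locally for this example) is what makes a missing branch show up as a type error.

```python
from dataclasses import dataclass
from typing import NoReturn, Union


@dataclass(frozen=True)
class Hit:
    value: str


@dataclass(frozen=True)
class Miss:
    pass


@dataclass(frozen=True)
class Error:
    error_code: int
    error_message: str


GetResponse = Union[Hit, Miss, Error]


def assert_never(value: NoReturn) -> NoReturn:
    # A static type checker reports an error at the call site if any
    # variant of GetResponse can still reach this point.
    raise AssertionError(f"Unhandled response type: {value!r}")


def describe(response: GetResponse) -> str:
    if isinstance(response, Hit):
        return f"Cache hit! {response.value}"
    if isinstance(response, Miss):
        return "Cache miss!"
    if isinstance(response, Error):
        return f"Error! {response.error_message}"
    assert_never(response)


print(describe(Hit("some value")))
print(describe(Miss()))
print(describe(Error(500, "something went wrong")))
```

If a fourth variant were added to the union, the type checker would flag the assert_never call in describe until a branch for the new variant is written.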

Closing thoughts

The key to building a solid foundation for your software and sustaining high velocity for your engineering team for the life of your product is to make sure your codebase is maintainable. It’s crucially important that future engineers are able to ramp up on the code quickly and safely. Thankfully, trends in modern programming languages are giving us more and more tools to achieve that, and to move entire classes of bugs forward in time from run time to compile time. This also saves us a ton of engineering time that we don’t need to spend writing tests to prove that we haven’t introduced these classes of bugs. (Don’t get me wrong: tests are still very important! But it’s so nice not to have to write tests around the behavior of nullable properties or other such mundane things that aren’t actually related to your business.)

The strategies above have proved especially valuable for me in the projects that I’ve worked on in recent years. I hope you’ll find them valuable too!

If you have any other favorite approaches for moving bugs forward in time, I would love to hear about them. Send them my way on Twitter (@cprice404) or LinkedIn, or join the Momento Discord and start a discussion!

How to build a question answering system in Node.js with a vector index and OpenAI

Pratik Agarwal — Tue, 13 Feb 2024 15:15:03 +0000

In this step-by-step guide, we delve into building a question answering system from scratch, focusing on a specific topic: carrots. Central to our exploration is the concept of treating question answering as a retrieval process. This approach involves identifying source documents or specific sections within them that contain the answers to users' queries. By revealing the underlying process without the complexities introduced by external libraries, we aim to provide valuable insights into the fundamental workings of such systems. We'll be presenting this tutorial with code examples in TypeScript (Node.js).

Here's a quick overview of how we will get this done:

Initialize OpenAI and Momento clients.
Fetch and process (create chunks) carrot data from Wikipedia.
Generate embeddings for the text using OpenAI.
Store the embeddings in Momento Vector Index.
Search and respond to queries using the stored data.
Utilize OpenAI's chat completions for refined responses.

Environment Setup

Before we start coding, we need to create our index in Momento for storing data, and generate an API key to access Momento programmatically. You can do both on our console, and follow this guide for details! The code below uses mvi-openai-demo as the index name, 1536 for the number of dimensions (more on this soon!), and cosine similarity as the similarity metric. Cosine similarity cares more about the orientation of the vectors than about their magnitude (the word count in this case), which makes it suitable for a question answering system.
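To illustrate what "orientation rather than magnitude" means, here is a small standalone Python sketch of cosine similarity on toy 3-dimensional vectors. The vectors and the function name are made up for illustration; real embeddings have 1536 dimensions and the index computes the metric for you.

```python
import math


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


short_doc = [1.0, 2.0, 0.0]
long_doc = [10.0, 20.0, 0.0]   # same direction, ten times the magnitude
other_doc = [0.0, 0.0, 3.0]    # points in an orthogonal direction

print(cosine_similarity(short_doc, long_doc))   # approximately 1.0
print(cosine_similarity(short_doc, other_doc))  # 0.0
```

Scaling a vector does not change its score, while pointing it in a different direction does.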

We also need an OpenAI API key to generate embeddings of our data and search queries.

Next, we have to install the necessary packages. For TypeScript, we use @gomomento/sdk and openai.

NodeJS:

npm install @gomomento/sdk openai

Step 1: Initializing Clients

We begin by initializing our OpenAI and Momento clients. Here, we set up our development environment with the necessary packages and API keys. This step is crucial for establishing communication with OpenAI and Momento services. It lays the foundation for our Q&A engine.

Make sure you have the environment variables 'OPENAI_API_KEY' and 'MOMENTO_API_KEY' set before you run the code!

import OpenAI from 'openai';
import * as https from 'https';
import { ALL_VECTOR_METADATA, CredentialProvider, PreviewVectorIndexClient, VectorIndexConfigurations, VectorSearch, VectorUpsertItemBatch } from '@gomomento/sdk';
import { CreateEmbeddingResponse } from "openai/resources";


const openai = new OpenAI({ apiKey: process.env['OPENAI_API_KEY'] });
const mviClient = new PreviewVectorIndexClient({ credentialProvider:  CredentialProvider.fromEnvironmentVariable({ environmentVariableName: 'MOMENTO_API_KEY' }), configuration: VectorIndexConfigurations.Laptop.latest() });
const indexName = 'mvi-openai-demo'

Step 2: Loading data from Wikipedia

We start by extracting data about carrots from Wikipedia. This step demonstrates how to handle external API calls and parse JSON responses. Go ahead and try this out locally for any Wikipedia page!

interface WikipediaResponse {
  query: {
    pages: {
      [key: string]: {
        extract: string;
      };
    };
  };
}

function getWikipediaExtract(url: string): Promise<string> {
  return new Promise((resolve, reject) => {
    https
      .get(url, response => {
        let data = '';

        response.on('data', chunk => {
          data += chunk;
        });

        response.on('end', () => {
          try {
            const jsonData: WikipediaResponse = JSON.parse(data);
            const pages = jsonData.query.pages;
            const extract = Object.values(pages)[0].extract;
            resolve(extract);
          } catch (error) {
            reject(error);
          }
        });
      })
      .on('error', error => {
        reject(error);
      });
  });
}

const url = "https://en.wikipedia.org/w/api.php?action=query&format=json&titles=Carrot&prop=extracts&explaintext";

Now let’s run these snippets and print the length of the Carrot Wikipedia page along with a sample of its text.

const extractText = await getWikipediaExtract(url);
console.log('Total characters in carrot Wikipedia page: ' + String(extractText.length));
console.log('Sample text in carrot Wikipedia page:\n\n ' + extractText.substring(0, 500));

Output:

Total characters in carrot Wikipedia page: 21534

Sample text in carrot Wikipedia page:

The carrot (Daucus carota subsp. sativus) is a root vegetable, typically orange in color, though heirloom variants including purple, black, red, white, and yellow cultivars exist, all of which are domesticated forms of the wild carrot, Daucus carota, native to Europe and Southwestern Asia. The plant probably originated in Persia and was originally cultivated for its leaves and seeds. The most commonly eaten part of the plant is the taproot, although the stems and leaves are also eaten.

Step 3: Preprocessing data to create chunks

In building our Q&A engine, we approach question answering as a kind of retrieval: identifying which source documents (or parts of them) contain the answers to a user's query. This concept is fundamental to our process and influences how we handle our data.

To make our system effective, we preprocess the data into chunks. This is because, in a question-answering context, answers often reside in specific sections of a document rather than across the entire text. By splitting the data into manageable chunks, we're effectively creating smaller, searchable units that our system can scan to find relevant answers. This chunking process is a crucial step in transforming extensive text into a format conducive to semantic search and retrieval.

We've opted for a straightforward approach to split our text by character count. However, it's crucial to understand that the size and method of chunking can significantly impact the system's effectiveness. Too large chunks might dilute the relevance of search results, while too small ones may miss critical context.

Alternative chunking methods may use tokenizers such as tiktoken to split the text along boundaries that align with the text embedding model. These methods may produce better results, but require external libraries. For demonstration we opt for a simpler method.

function splitTextIntoChunks(text: string, chunkSize = 600): string[] {
  const chunks = [];
  for (let i = 0; i < text.length; i += chunkSize) {
    chunks.push(text.substring(i, i + chunkSize));
  }
  return chunks;
}

const chunks = splitTextIntoChunks(extractText);

Now we can view the total number of chunks that got created:

console.log('Total number of chunks created: ' + String(chunks.length));
console.log('Total characters in each chunk: ' + String(chunks[0].length));

Output:

Total number of chunks created: 36
Total characters in each chunk: 600

Step 4: Generating Embeddings with OpenAI

In our approach to building a Q&A engine, we've chosen to leverage the power of vector search, a state-of-the-art technique in semantic search. This method differs significantly from traditional keyword search approaches, like those used in Elasticsearch or Lucene. Vector search delves deeper into the intricacies of language, capturing concepts and meanings in a way that keyword search can't.

To facilitate vector search, our first task is to transform our textual data into a format that embodies this richer semantic understanding. We achieve this by generating embeddings using OpenAI's text-embedding-ada-002 model. This model is known for striking a balance between accuracy, cost, and speed, making it an ideal choice for generating text embeddings.

async function generateEmbeddings(chunks: string[]): Promise<CreateEmbeddingResponse> {
  return await openai.embeddings.create({input: chunks, model: 'text-embedding-ada-002'});
}

Recall that we selected 1536 as the dimensionality for our vector index. This decision was based on the fact that OpenAI, when generating embeddings for each chunk, produces these embeddings as floating point vectors with a length of 1536.

const embeddingsResponse = await generateEmbeddings(chunks);


console.log('Length of each embedding: ' + String(embeddingsResponse.data[0].embedding.length));
console.log('Sample embedding: ' + String(embeddingsResponse.data[0].embedding.slice(0, 10)));

Output:

Length of each embedding: 1536

Sample embedding: 0.008307404,-0.03437371,0.00043777542,-0.01768263,-0.010926112,-0.0056728064,-0.0025742147,-0.023453956,-0.021114917,-0.020148791

Step 5: Storing Data in Momento Vector Index

After generating embeddings, we store them in Momento's Vector Index. This involves creating items with IDs, vectors, and metadata, then upserting them to MVI. When storing data in the Momento Vector Index, it's important to use deterministic chunk IDs. This ensures that re-running the indexing step overwrites existing items instead of adding duplicates of the same data, which would waste storage and skew search results. Managing data storage effectively is key to maintaining a scalable and responsive Q&A system.

async function upsertToMomentoVectorIndex(embeddingsResponse: CreateEmbeddingResponse, chunks: string[]) {
  const embeddings = embeddingsResponse.data.map(embedding => embedding.embedding);

  // Generate IDs for each chunk
  const ids = chunks.map((_, index) => `chunk${index + 1}`);

  // Generate metadata for each chunk. This will be needed when we search.
  const metadatas = chunks.map(chunk => ({text: chunk}));

  // Create VectorIndexItem objects
  const items = ids.map((id, index) => {
    return {
      id: id,
      vector: embeddings[index],
      metadata: metadatas[index],
    };
  });

  // Upsert to Momento Vector Index
  try {
    const upsertResponse = await mviClient.upsertItemBatch(indexName, items);
    if (upsertResponse instanceof VectorUpsertItemBatch.Success) {
      console.log('\n\nUpsert successful. Items have been stored.');
    } else if (upsertResponse instanceof VectorUpsertItemBatch.Error) {
      console.error('Upsert error:', upsertResponse.message());
    }
  } catch (error) {
    console.error('Unexpected error during upsert:', error);
  }
}

await upsertToMomentoVectorIndex(embeddingsResponse, chunks);

Output:

Upsert successful. Items have been stored.

Step 6: Searching and Responding to Queries

This step highlights the core functionality of the Q&A engine: retrieving answers using Momento Vector Index. This process involves searching through the indexed data using text embeddings, a technique that helps us find the most relevant and contextually appropriate results.

When we indexed snippets of text in the previous steps, we first transformed these text snippets into vector representations using OpenAI's model. This transformation was key to preparing our data for efficient storage and retrieval in the Momento Vector Index.

Now, as we turn to the task of querying, it's crucial to apply a similar preprocessing step. The user's question, "What is a carrot?" in this instance, must also be converted into a vector. This enables us to perform a vector-to-vector search within our index.

The effectiveness of our search hinges on the consistency of preprocessing. The same embedding model and process used during indexing must be applied to the query. This ensures that the vector representation of the query aligns with the vectors stored in our index. Vectors produced by different models are not comparable, so otherwise the search would not return meaningful results.

async function searchQuery(queryText: string): Promise<string[]> {
  const queryResponse = await openai.embeddings.create({ input: queryText, model: 'text-embedding-ada-002' });
  const queryVector = queryResponse.data[0].embedding;

  try {
    const searchResponse = await mviClient.search(indexName, queryVector, {
      topK: 2,
      metadataFields: ALL_VECTOR_METADATA
    });

    if (searchResponse instanceof VectorSearch.Success) {
      const texts: string[] = searchResponse.hits().map(hit => hit.metadata.text as string);
      return texts;
    } else if (searchResponse instanceof VectorSearch.Error) {
      console.error('Search error:', searchResponse.message());
    }
  } catch (error) {
    console.error('Unexpected error during search:', error);
  }

  return [];
}

Let’s start with a simple search for “What is a carrot?”:

const query = 'What is a carrot?';
const texts = await searchQuery(query);
if (texts.length > 0) {
  console.log('\n=========================================\n');
  console.log('Embedding search results:\n\n', texts.join('\n'));
  console.log('\n=========================================\n');
}

The output for this query looks like:

The carrot (Daucus carota subsp. sativus) is a root vegetable, typically orange in color, though heirloom variants including purple, black, red, white, and yellow cultivars exist, all of which are domesticated forms of the wild carrot, Daucus carota, native to Europe and Southwestern Asia. The plant probably originated in Persia and was originally cultivated for its leaves and seeds. The most commonly eaten part of the plant is the taproot, although the stems and leaves are also eaten. The domestic carrot has been selectively bred for its enlarged, more palatable, less woody-textured taproot. The carrot is a biennial plant in the umbellifer family, Apiaceae. At birth, it grows a rosette of leaves while building up the enlarged taproot. Fast-growing cultivars mature within about three months (90 days) of sowing the seed, while slower-maturing cultivars need a month longer (120 days). The roots contain high quantities of alpha- and beta-carotene, lycopene, anthocyanins, lutein, and are a good source of vitamin A, vitamin K, and vitamin B6. Black carrots are one of the richest sources of anthocyanins (250-300 mg/100 g fresh root weight), and hence possesses high antioxidant ability

As you see, we indexed vectors in Momento Vector Index and stored the original text as metadata in the items. When asked the question “What is a carrot?”, we transformed the text into a vector, performed a vector search in MVI, and returned the original text stored in the metadata. Under the hood we did a vector-to-vector matching, yet from a user perspective it looks like a text-to-text search.

Step 7: Too verbose? Let’s use chat completions to enhance query responses

Until now, our approach has treated question answering primarily as a retrieval task. We've taken the user's query, performed a search, and presented snippets of information that could potentially contain the answer. This method, while effective in fetching relevant data, still places the onus on the user to sift through the results and extract the answer. It's akin to providing pages from a reference book without pinpointing the exact information sought.

To elevate the user experience from mere retrieval to direct answer generation, we introduce Large Language Models (LLMs) like OpenAI's GPT-3.5. LLMs have the ability to not just find but also synthesize information, offering concise and contextually relevant answers. This is a significant leap from delivering a page of search results to providing a clear, succinct response to the user's query.

async function searchWithChatCompletion(texts: string[], queryText: string) {
  const text = texts.join('\n');
  const prompt: string = 'Given the following extracted parts about carrot, answer questions pertaining to' +
    " carrot only from the provided text. If you don't know the answer, just say that " +
    "you don't know. Don't try to make up an answer. Do not answer anything outside of the context given. " +
    "Your job is to only answer about carrots, and only from the text below. If you don't know the answer, just " +
    "say that you don't know. Here's the text:\n\n----------------\n\n";

  const resp = await openai.chat.completions.create({
    messages: [
      { role: 'system', content: prompt + text },
      { role: 'user', content: queryText },
    ],
    model: 'gpt-3.5-turbo',
  });

  return resp;

}

And let’s use the same query “What is a carrot?” to compare the response.

const chatCompletionResp = await searchWithChatCompletion(texts, query);

console.log('\n=========================================\n');
console.log('Chat completion search results:\n\n', chatCompletionResp.choices[0].message.content);
console.log('\n=========================================\n');

Output:

A carrot is a root vegetable that is typically orange in color, although there are also other colored variants such as purple, black, red, white, and yellow.

Now let’s quickly compare the outputs of a more specific question such as "How fast do fast-growing cultivators mature in carrots?"

// Use new names so we don't redeclare the earlier `query` and `texts` constants.
const specificQuery = 'how fast do fast-growing cultivators mature in carrots?';
const specificTexts = await searchQuery(specificQuery);
if (specificTexts.length > 0) {
  console.log('\n=========================================\n');
  console.log('Embedding search results:\n\n', specificTexts[0]);
  console.log('\n=========================================\n');

  const specificResp = await searchWithChatCompletion(specificTexts, specificQuery);

  console.log('\n=========================================\n');
  console.log('Chat completion search results:\n\n', specificResp.choices[0].message.content);
  console.log('\n=========================================\n');
}

Output:

Notice how brief and precise the chat completion response is compared to the raw semantic search results.

Embedding search results: The carrot is a biennial plant in the umbellifer family, Apiaceae. At birth, it grows a rosette of leaves while building up the enlarged taproot. Fast-growing cultivars mature within about three months (90 days) of sowing the seed, while slower-maturing cultivars need a month longer (120 days). The roots contain high quantities of alpha- and beta-carotene, lycopene, anthocyanins, lutein, and are a good source of vitamin A, vitamin K, and vitamin B6. Black carrots are one of the richest sources of anthocyanins (250-300 mg/100 g fresh root weight), and hence possesses high antioxidant ability.

Chat completion search results: Fast-growing cultivars of carrots mature within about three months (90 days) of sowing the seed.

Conclusion

In this guide, we embarked on a journey to build a question answering system from the ground up. The key idea behind our approach was to treat question answering as a retrieval problem. By using text embeddings and vector search, we built a semantic search that captures meaning in a way that traditional keyword-based approaches can't. Let's briefly recap the steps we took to get here:

Initializing Clients: Set up OpenAI and Momento clients, laying the groundwork for our system.
Fetching and Processing Data: Extracted and processed data from Wikipedia, preparing it for embedding generation. We learnt about the significance of creating chunks of data for efficient retrieval.
Generating Embeddings: Utilized OpenAI's text-embedding-ada-002 model to generate text embeddings, converting our corpus into a format suitable for semantic search. We learnt how the length of these embeddings determines the number of dimensions of a vector index.
Storing in MVI: Stored these embeddings in Momento's Vector Index, ensuring efficient retrieval. We learnt why deterministic item IDs matter: random IDs such as UUIDs cause the same data to be re-indexed as new items on every run.
Searching and Responding to Queries: Implemented a search functionality that leverages vector indexing for semantic search to find the most relevant responses. We performed a vector-to-vector search, and used the text stored in the metadata of our items to display to the user.
Enhancing Responses with Chat Completions: Added a layer of refinement using OpenAI's chat completions to generate concise answers. Here we saw that a Large Language Model can turn the retrieved passages into a short, direct answer instead of leaving the user to read through them.

Finally, while our hands-on approach offers a deep dive into the mechanics of building a Q&A engine, we recognize the complexity involved in such an endeavor. Frameworks like LangChain hide much of this complexity behind a higher-level abstraction, which simplifies chaining OpenAI embeddings with a vector store or swapping one vector store for another. LangChain is a popular choice for many developers, making it easier to build, modify, and maintain complex AI-driven applications.

How to build a question answering system in Python with a vector index and OpenAI

Pratik Agarwal — Tue, 13 Feb 2024 15:02:18 +0000

Here's a quick overview of how we will get this done:

Initialize OpenAI and Momento clients.
Fetch and process (create chunks) carrot data from Wikipedia.
Generate embeddings for the text using OpenAI.
Store the embeddings in Momento Vector Index.
Search and respond to queries using the stored data.
Utilize OpenAI's chat completions for refined responses.

We also have a Google Colab set up for this blog, where you can execute queries while you're reading the blog!

Environment Setup

We also need an OpenAI API key to generate embeddings of our data and search queries.

Next, we have to install the necessary packages. For Python, you'll need openai, requests, and momento.

pip install momento openai requests

Step 1: Initializing Clients

Make sure you have the environment variables 'OPENAI_API_KEY' and 'MOMENTO_API_KEY' set before you run the code!


import openai
import requests
from momento import CredentialProvider, PreviewVectorIndexClient, VectorIndexConfigurations
from momento.config import VectorIndexConfiguration
from momento.requests.vector_index import Item
from momento.responses.vector_index import Search, UpsertItemBatch
import os

# Setting up the API keys and clients
openai.api_key = os.environ['OPENAI_API_KEY']
VECTOR_INDEX_CONFIGURATION: VectorIndexConfiguration = VectorIndexConfigurations.Default.latest()
VECTOR_AUTH_PROVIDER = CredentialProvider.from_environment_variable('MOMENTO_API_KEY')
mvi_client = PreviewVectorIndexClient(VECTOR_INDEX_CONFIGURATION, VECTOR_AUTH_PROVIDER)
index_name = 'mvi-openai-demo'

Step 2: Loading data from Wikipedia

We start by extracting data about carrots from Wikipedia. This step demonstrates how to handle external API calls and parse JSON responses. Go ahead and try this out locally for any Wikipedia page!

def get_wikipedia_extract(url: str) -> str:
    response = requests.get(url)
    data = response.json()
    pages = data['query']['pages']
    extract = next(iter(pages.values()))['extract']
    return extract

url = "https://en.wikipedia.org/w/api.php?action=query&format=json&titles=Carrot&prop=extracts&explaintext"

Now let’s run these snippets and print the length of the Carrot Wikipedia page along with a sample of its text.

extract_text = get_wikipedia_extract(url)
print('Total characters in carrot Wikipedia page: ' + str(len(extract_text)))
print('Sample text in carrot Wikipedia page:\n\n ' + extract_text[0:500])

Output:

Total characters in carrot Wikipedia page: 21534

Sample text in carrot Wikipedia page:

The carrot (Daucus carota subsp. sativus) is a root vegetable, typically orange in color, though heirloom variants including purple, black, red, white, and yellow cultivars exist, all of which are domesticated forms of the wild carrot, Daucus carota, native to Europe and Southwestern Asia. The plant probably originated in Persia and was originally cultivated for its leaves and seeds. The most commonly eaten part of the plant is the taproot, although the stems and leaves are also eaten.

Step 3: Preprocessing data to create chunks

Alternative chunking methods may use tokenizers, such as tiktoken, to split the text along boundaries that align with the text embedding model. These methods may produce better results, but require external libraries. For demonstration we opt for a simpler method.

def split_text_into_chunks(text: str, chunk_size: int = 600) -> list[str]:
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

chunks = split_text_into_chunks(extract_text)

Now we can view the total number of chunks that got created:

print('Total number of chunks created: ' + str(len(chunks)))
print('Total characters in each chunk: ' + str(len(chunks[0])))

Output:

Total number of chunks created: 36
Total characters in each chunk: 600
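Splitting at a fixed character count can cut a sentence in half at a chunk boundary. One common way to soften this, shown here only as a sketch and not as part of the tutorial's pipeline, is to let consecutive chunks overlap. The function name split_with_overlap and the overlap parameter are assumptions made for this example.

```python
def split_with_overlap(text: str, chunk_size: int = 600, overlap: int = 100) -> list:
    """Split text into chunks of chunk_size characters that share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # how far the window moves each time
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


sample = "abcdefghij" * 3  # 30 characters
pieces = split_with_overlap(sample, chunk_size=10, overlap=4)

print(pieces[0])  # abcdefghij
print(pieces[1])  # ghijabcdef
```

The last four characters of each chunk reappear at the start of the next one, at the cost of storing and embedding more text overall.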

Step 4: Generating Embeddings with OpenAI

def generate_embeddings(chunks: list[str]):
    response = openai.embeddings.create(input=chunks, model="text-embedding-ada-002")
    return response.data

embeddings_response = generate_embeddings(chunks)
print('Length of each embedding: ' + str(len(embeddings_response[0].embedding)))
print('Sample embedding: ' + str(embeddings_response[0].embedding[0:10]))

Output:

Length of each embedding: 1536
Sample embedding: [0.008307404, -0.03437371, 0.00043777542, -0.01768263, -0.010926112, -0.0056728064, -0.0025742147, -0.023453956, -0.021114917, -0.020148791]
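Because the index was created with 1536 dimensions, every vector sent to it needs exactly that length. A small guard like the following sketch can catch a mismatch (for example after switching embedding models) before anything is upserted. The helper name check_dimensions is hypothetical and is not part of the Momento or OpenAI SDKs.

```python
EXPECTED_DIMENSIONS = 1536  # must match the dimension count chosen for the index


def check_dimensions(vectors, expected=EXPECTED_DIMENSIONS):
    """Raise ValueError if any vector does not have the expected length."""
    for position, vector in enumerate(vectors):
        if len(vector) != expected:
            raise ValueError(
                f"vector {position} has {len(vector)} dimensions, expected {expected}"
            )


# Placeholder vectors of the right length pass silently.
check_dimensions([[0.0] * 1536, [0.1] * 1536])
```

In the tutorial's flow this could be called with the list of embedding vectors just before the upsert step.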

Step 5: Storing Data in Momento Vector Index

def upsert_to_mvi(embeddings: list, chunks: list[str]):

    metadatas = [{"text": chunk} for chunk in chunks]

    ids = [f"chunk{i + 1}" for i, _ in enumerate(embeddings)]

    items = [Item(id=id, vector=embedding.embedding, metadata=metadata) for id, embedding, metadata in zip(ids, embeddings, metadatas)]

    response = mvi_client.upsert_item_batch(index_name, items)

    if isinstance(response, UpsertItemBatch.Success):
        print("\n\nUpsert successful. Items have been stored.")
    elif isinstance(response, UpsertItemBatch.Error):
        print(response.message)
        raise response.inner_exception

upsert_to_mvi(embeddings_response, chunks)

Output:

Upsert successful. Items have been stored.
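The upsert above uses IDs such as chunk1 and chunk2, which stay the same every time the script runs. The sketch below shows why that matters, using a plain Python dict as a stand-in for an index with upsert semantics. It does not call Momento, and all function names are made up for this illustration.

```python
import uuid


def deterministic_ids(chunks):
    return [f"chunk{i + 1}" for i, _ in enumerate(chunks)]


def random_ids(chunks):
    return [str(uuid.uuid4()) for _ in chunks]


def upsert(index: dict, ids, chunks):
    # Stand-in for an upsert: insert new IDs, overwrite existing ones.
    for item_id, chunk in zip(ids, chunks):
        index[item_id] = chunk


demo_chunks = ["first chunk", "second chunk", "third chunk"]

stable_index, random_index = {}, {}
for _ in range(2):  # simulate running the indexing script twice
    upsert(stable_index, deterministic_ids(demo_chunks), demo_chunks)
    upsert(random_index, random_ids(demo_chunks), demo_chunks)

print(len(stable_index))  # 3: the second run overwrote the same items
print(len(random_index))  # 6: the second run added duplicates
```

With random IDs, every run leaves another copy of each chunk behind, and those duplicates can crowd the top results of a search.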

Step 6: Searching and Responding to Queries

def search_query(query_text: str) -> list[str]:
    query_vector = openai.embeddings.create(input=query_text, model="text-embedding-ada-002").data[0].embedding
    search_response = mvi_client.search(index_name, query_vector=query_vector, top_k=2, metadata_fields=["text"])
    if isinstance(search_response, Search.Success):
        return [hit.metadata['text'] for hit in search_response.hits]
    elif isinstance(search_response, Search.Error):
        print(f"Error while searching on index {index_name}: {search_response.message}")
    # Return an empty list for errors and for any unexpected response type.
    return []

Let’s start with a simple search for “What is a carrot?”:

query = "What is a carrot?"
texts = search_query(query)
if texts:
    print("\n=========================================\n")
    print("Embedding search results:\n\n" + "\n".join(texts))
    print("\n=========================================\n")

The output for this query looks like:

The carrot (Daucus carota subsp. sativus) is a root vegetable, typically orange in color, though heirloom variants including purple, black, red, white, and yellow cultivars exist, all of which are domesticated forms of the wild carrot, Daucus carota, native to Europe and Southwestern Asia. The plant probably originated in Persia and was originally cultivated for its leaves and seeds. The most commonly eaten part of the plant is the taproot, although the stems and leaves are also eaten. The domestic carrot has been selectively bred for its enlarged, more palatable, less woody-textured taproot. The carrot is a biennial plant in the umbellifer family, Apiaceae. At birth, it grows a rosette of leaves while building up the enlarged taproot. Fast-growing cultivars mature within about three months (90 days) of sowing the seed, while slower-maturing cultivars need a month longer (120 days). The roots contain high quantities of alpha- and beta-carotene, lycopene, anthocyanins, lutein, and are a good source of vitamin A, vitamin K, and vitamin B6. Black carrots are one of the richest sources of anthocyanins (250-300 mg/100 g fresh root weight), and hence possesses high antioxidant ability.

Step 7: Too verbose? Let’s use chat completions to enhance query responses

def search_with_chat_completion(texts: list[str], query_text: str):
    text = "\n".join(texts)
    prompt = ("Given the following extracted parts about carrot, answer questions pertaining to"
              " carrot only from the provided text. If you don't know the answer, just say that "
              "you don't know. Don't try to make up an answer. Do not answer anything outside of the context given. "
              "Your job is to only answer about carrots, and only from the text below. If you don't know the answer, just "
              "say that you don't know. Here's the text:\n\n----------------\n\n")
    chat_response = openai.chat.completions.create(model="gpt-3.5-turbo", messages=[
        {"role": "system", "content": prompt + text},
        {"role": "user", "content": query_text}
    ])
    return chat_response.choices[0].message

And let’s use the same query “What is a carrot?” to compare the response.

chat_completion_resp = search_with_chat_completion(texts, query)
print("\n=========================================\n")
print("Chat completion search results:\n\n" + chat_completion_resp.content)
print("\n=========================================\n")

Output:

A carrot is a root vegetable that is typically orange in color, although there are also other colored variants such as purple, black, red, white, and yellow.

Now let’s quickly compare the outputs of a more specific question such as "How fast do fast-growing cultivators mature in carrots?"

query = "how fast do fast-growing cultivators mature in carrots?"
texts = search_query(query)
if texts:
    print("\n=========================================\n")
    print("Embedding search results:\n\n" + texts[0])
    print("\n=========================================\n")

    chat_completion_resp = search_with_chat_completion(texts, query)
    print("\n=========================================\n")
    print("Chat completion search results:\n\n" + chat_completion_resp.content)
    print("\n=========================================\n")

Output:

Notice how brief and precise the chat completion response is compared to the raw semantic search results.

Embedding search results:

The carrot is a biennial plant in the umbellifer family, Apiaceae. At birth, it grows a rosette of leaves while building up the enlarged taproot. Fast-growing cultivars mature within about three months (90 days) of sowing the seed, while slower-maturing cultivars need a month longer (120 days). The roots contain high quantities of alpha- and beta-carotene, lycopene, anthocyanins, lutein, and are a good source of vitamin A, vitamin K, and vitamin B6. Black carrots are one of the richest sources of anthocyanins (250-300 mg/100 g fresh root weight), and hence possesses high antioxidant ability.

Chat completion search results:

Fast-growing cultivars of carrots mature within about three months (90 days) of sowing the seed.

Conclusion

Initializing Clients: Set up OpenAI and Momento clients, laying the groundwork for our system.
Fetching and Processing Data: Extracted and processed data from Wikipedia, preparing it for embedding generation. We learnt about the significance of creating chunks of data for efficient retrieval.
Generating Embeddings: Utilized OpenAI's text-embedding-ada-002 model to generate text embeddings, converting our corpus into a format suitable for semantic search. We learnt how the length of these embeddings determines the number of dimensions of a vector index.
Storing in MVI: Stored these embeddings in Momento's Vector Index, ensuring efficient retrieval. We learnt about a common pitfall of using UUID as an index’s item ID, which results in repeated re-indexing of the same data.
Searching and Responding to Queries: Implemented a search functionality that leverages vector indexing for semantic search to find the most relevant responses. We perform a vector-to-vector search, and use the text stored in the metadata of our items to display to the user.
Enhancing Responses with Chat Completions: Added a layer of refinement using OpenAI's chat completions to generate concise and accurate answers. Here we witnessed that Large Language Models not only improve the accuracy of the responses but also ensure they are contextually relevant, coherent, and presented in a user-friendly format.

How to create a Slack workflow with webhooks in Momento Topics

Allen Helton — Thu, 01 Feb 2024 21:43:29 +0000

Have you ever heard of Slack? It’s this niche, up-and-coming communication platform that’s really starting to gain traction in the workplace. 😉

Okay, but really, Slack is everywhere. We all know it. Many organizations are even dropping email in favor of Slack—for internal and external comms. But the real power of Slack isn’t in channels, threads, or DMs—it’s their Workflow Builder. This tool turns Slack into an integration machine, letting you tackle almost any task with just some minor configuration and API connections.‍

In this blog post, we'll show how simple it is to use webhooks in Momento Topics to create a seamless Slack workflow that sends a message whenever an event is published to a topic. With that in place, you can easily extend the workflow in any number of ways—from task automation to alerts to chat monitoring. Let’s dive in.‍

How to build a Slack workflow

First, select the Slack workspace in which you want the workflow to live. Then, navigate to the Workflow Builder.

Click on Create Workflow.

Select From a webhook.

Click on the Set Up Variables button and add variables for token_id and text.

At the bottom of this window, copy the URL from the Web request URL field and save it for later.

Type “send” in the search bar and click on Send a message to a channel.

Click the pencil icon to configure the message that gets sent to Slack. We’ll use the variables we set up earlier to show who sent the message and what they said. When you’re done, hit the Save button.

On the main workflow screen, click the Publish button and give your workflow a name and meaningful description. Hit Next to publish your workflow.

Create a webhook in Momento Console

Navigate to Momento Console and create a cache if you don’t already have one.

Navigate to your new cache and click on the Webhooks menu option on the left.

Click the Create Webhook button and fill out the form by providing a name for your webhook, which topic it should trigger on, and the value you copied from the Slack workflow. Then hit Create.

Now that we have our webhook created and our workflow published, let’s test it out! Go to the Topics page in Momento Console and select the cache and topic name we just configured, then hit Subscribe.

In the “Publish messages” section, type in your message and hit Publish.

Zip on over to Slack and check out your new message!

Add username to the messages

Now you might notice there is no username displaying to the left of the colon. That’s because we published a message without using an auth token with an embedded identifier. In a real-world scenario, all our messages would uniquely identify a user. So let’s try that out.
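To see why the name is blank, it helps to picture what reaches Slack. The sketch below assumes the webhook delivers a payload carrying the token_id and text values configured as workflow variables earlier, and that the Slack message template is "token_id: text". The renderSlackMessage helper and both sample payloads are made up for illustration; they are not part of Momento or Slack.

```javascript
// Hypothetical helper mirroring a Slack message template of "<token_id>: <text>".
// Assumes the webhook payload exposes `token_id` and `text`, matching the workflow variables.
function renderSlackMessage(payload) {
  const sender = payload.token_id ?? '';
  return `${sender}: ${payload.text}`;
}

// Published without an identifying token: nothing appears left of the colon.
const anonymous = renderSlackMessage({ token_id: '', text: 'Hello from Momento!' });

// Published with a token whose embedded id is "allen".
const identified = renderSlackMessage({ token_id: 'allen', text: 'It worked - yeehaw!' });

console.log(anonymous);  // ": Hello from Momento!"
console.log(identified); // "allen: It worked - yeehaw!"
```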

I wrote this small script using the Momento JavaScript SDK to create a short-lived token scoped to publish only to our slack-bot topic. This token has my username embedded in it so it should show up in our Slack channel when we publish a message.


const { AuthClient, Configurations, CredentialProvider, TopicClient, ExpiresIn } = require('@gomomento/sdk');
const dotenv = require('dotenv');
dotenv.config();

const USERNAME = 'allen';
const MESSAGE = 'It worked - yeehaw!';

const publishMessage = async () => {
  const authClient = new AuthClient({
    credentialProvider: CredentialProvider.fromEnvironmentVariable({ environmentVariableName: 'MOMENTO_API_KEY' })
  });

  const tokenScope = {
    permissions: [
      {
        role: 'publishonly',
        cache: 'slack',
        topic: 'slack-bot'
      }
    ]
  };

  const token = await authClient.generateDisposableToken(tokenScope, ExpiresIn.minutes(1), { tokenId: USERNAME });

  const topicClient = new TopicClient({
    configuration: Configurations.Laptop.v1(),
    credentialProvider: CredentialProvider.fromString({ authToken: token.authToken })
  });

  // Wait for the publish call to complete before the function returns
  await topicClient.publish('slack', 'slack-bot', MESSAGE);
};

publishMessage();

After I run this script, I get a message in Slack with my username and message!

Super cool and super easy! After you do it once, this integration only takes a few minutes to set up. Plus, Slack workflows are incredibly easy to extend, so you could even set it up to send a message back over a topic! More on that in another post. 🙂‍

You can use this integration for alerts, status updates, monitoring chat sessions, and much, much more. We’re super excited about webhook functionality in Momento Topics and further enabling you to build the best developer experience for your users and development teams!‍

So what are you waiting for? Go out there and build!‍

Happy coding!

Momento - A Front-end Developer’s Best Kept Secret 🤫

Allen Helton — Tue, 19 Sep 2023 15:59:04 +0000

I’ve never been much of a front-end developer. Throughout my career my focus has been on the backend. I like designing systems, building data models, and choreographing event-driven workflows. It’s a satisfying set of problems that have challenged me since I graduated college.

But when I started working at Momento early in 2023, things changed. I started building more user facing demos that required web pages that looked better than the unstyled divs I was used to working with. So naturally, as any developer would, I asked ChatGPT to teach me how to build apps in Next.js. I picked up a bit of React and got the hang of front-end development over the course of a few months. Then I realized something - front-end development is just as hard as the backend!‍

The challenges are different, of course, and often lead to “slower than I would like” progress as I struggle through building something. But when Momento released the Web SDK, it was like someone gave me an easy button. And no, I’m not just saying that because it’s my job. It’s seriously way easier. Let me explain.‍

Backend capabilities without building a backend

Ok, that section header sounds like a lie. But it’s not! When you use the Momento Web SDK in your React, Angular, Vue, Astro, whatever UI framework you like, you get access to managed, serverless services without having to build them. You get access to a cache and managed WebSockets with nothing more than simple API calls. ‍

This means you get direct access to database-like features that let you cache API calls, build user-session stores, and save files temporarily in a remote location when you use Momento cache. Many front-end frameworks allow you to do these things with session and local storage, but they’re tied to a browser. With Momento, the constraints are cut loose. You get all these abilities across browser sessions and even across machines.

Momento acts as a personal high-speed web server for you to store temporary data and manage your WebSocket connections.‍

You’re charged only for data transfer into and out of Momento, with a 5GB monthly free tier, so this is a no-brainer for any front-end developer.

Session data

Instead of checking and validating local storage TTLs for your user data, just save it into Momento cache and handle defaults when you get a cache miss.

const response = await cacheClient.dictionaryGetField('user-sessions', userId, 'fullName');
if (response instanceof CacheDictionaryGetField.Hit) {
  return response.value();
} else { // cache miss: load from the API, then populate the cache
  const data = await fetch(`/api/users/${userId}`);
  const user = await data.json();
  await cacheClient.dictionarySetFields('user-sessions', userId, user);
  return user.fullName;
}

You can store an entire JSON object in a dictionary and fetch as much or as little of the information as you want, like a GraphQL API (almost). This can be insanely powerful as you load information about a user. You can continue to add fields to this dictionary cache item and fetch details out of it from any machine, allowing other users in your system to gain access to the data in milliseconds.
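One wrinkle with the pattern above is that a user object often has numbers or nested objects in it. The following is a minimal sketch, assuming dictionary field values need to be strings, so anything else is serialized first. The toDictionaryFields helper and the sample user are hypothetical and not part of the SDK.

```javascript
// Hypothetical helper: turn an arbitrary object into string-only fields
// so each property can be stored (and fetched) as its own dictionary field.
function toDictionaryFields(obj) {
  const fields = {};
  for (const [key, value] of Object.entries(obj)) {
    if (value === undefined || value === null) continue; // nothing worth caching
    fields[key] = typeof value === 'string' ? value : JSON.stringify(value);
  }
  return fields;
}

const sampleUser = { fullName: 'Ada Lovelace', visits: 3, prefs: { theme: 'dark' }, nickname: null };
const fields = toDictionaryFields(sampleUser);

console.log(fields.fullName); // Ada Lovelace
console.log(fields.visits);   // 3 (now the string '3')
console.log(fields.prefs);    // {"theme":"dark"}
```

The result could be handed to dictionarySetFields in place of the raw object. Fields that were serialized need a JSON.parse when they are read back.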

If you’ve ever visited Momento’s booth at a conference, the games we run use a similar pattern. In fact, they use the cache as the only data store; there’s no database at all. Check out the source code.

Sharing files

Have you ever needed to securely share a file with someone but didn’t know how to build a web server, set up HTTPS, and configure a storage mechanism? Me too (assuming you said yes). Well, good news: even that has become only a few lines of code.

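const buffer = fs.readFileSync(filePath);
// Respect the Buffer's offset and length so only the file's bytes are stored
await client.set('files', fileName, new Uint8Array(buffer.buffer, buffer.byteOffset, buffer.byteLength), { ttl: 600 });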

With the file set in the cache, you can create a token scoped explicitly to that one cache key, granting temporary access to whoever holds the token.

const scope = {
  permissions: [        
    {
      role: 'readonly',
      cache: 'files',
      item: {
        key: fileName
      }
    }
  ]
};

const token = await authClient.generateDisposableToken(scope, ExpiresIn.minutes(10));
return token.authToken;

In this example, the file is only available for 10 minutes, so you don’t have to worry about your secure data being stale or left unknowingly on a server somewhere. With automatic time to live (TTL), your files are removed with no additional work.
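The item TTL and the token lifetime are set in two separate calls, so they can drift apart if one is changed without the other. Here is a small sketch of deriving both from a single value. The buildFileShare helper is hypothetical, and the scope shape simply mirrors the object shown above.

```javascript
// Hypothetical helper: derive the cache TTL and the token scope from one lifetime
// so the file and the token that unlocks it expire together.
function buildFileShare(fileName, lifetimeMinutes) {
  return {
    ttlSeconds: lifetimeMinutes * 60, // value for the `ttl` option when setting the item
    tokenMinutes: lifetimeMinutes,    // value to pass to ExpiresIn.minutes(...)
    scope: {
      permissions: [
        { role: 'readonly', cache: 'files', item: { key: fileName } }
      ]
    }
  };
}

const share = buildFileShare('report.pdf', 10);
console.log(share.ttlSeconds);   // 600
console.log(share.tokenMinutes); // 10
```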

For a full example file sharing app, you can check out our reference application.‍

Chat

You can build an entire chat app using only a front-end framework and Momento! Store the users and messages in cache items and notify people when a message is sent by publishing to Momento Topics. No lengthy auth mechanisms, connection management, or databases necessary. If you want to create short-lived, session-based chats, this is the perfect solution.‍

When messages are sent to a chat room, you can store them in a list cache item. This type of cache item will store things chronologically, allowing you to keep a thread-safe history of all messages in the order they were sent.

const msg = JSON.stringify({ username: name, message: message });    
cacheClient.listPushFront('chat', router.query.room, msg);

When people enter or leave a room, add them to a set cache item. This item type saves an unordered array of distinct items. This means if one user is having issues with network connectivity and keeps leaving and rejoining, it’s ok - the set will deduplicate them so you don’t have to.

cacheClient.setAddElement('chat', `${router.query.room}-users`, username);

After somebody has typed in their chat message and hits the “Enter” button, you can publish a message to everyone in the chat room in a single call, notifying them instantly of the new message.

const msg = { username, message, timestamp: new Date().toISOString() }; 
topicClient.publish('chat', router.query.room, JSON.stringify(msg));
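
On the receiving end, each subscriber gets the published string back and has to turn it into something renderable. This is a minimal sketch, assuming messages have the { username, message, timestamp } shape published above. The parseChatMessage and appendMessage helpers are hypothetical.

```javascript
// Hypothetical helper: parse a raw topic payload, returning null for anything malformed.
function parseChatMessage(raw) {
  try {
    const parsed = JSON.parse(raw);
    const valid = typeof parsed?.username === 'string'
      && typeof parsed?.message === 'string'
      && typeof parsed?.timestamp === 'string';
    return valid ? parsed : null;
  } catch {
    return null;
  }
}

// Keep history ordered by timestamp. ISO 8601 strings from toISOString()
// compare chronologically as plain strings.
function appendMessage(history, raw) {
  const msg = parseChatMessage(raw);
  if (!msg) return history;
  return [...history, msg].sort((a, b) =>
    a.timestamp < b.timestamp ? -1 : a.timestamp > b.timestamp ? 1 : 0);
}

let history = [];
history = appendMessage(history, JSON.stringify({ username: 'b', message: 'second', timestamp: '2024-01-01T00:00:02.000Z' }));
history = appendMessage(history, JSON.stringify({ username: 'a', message: 'first', timestamp: '2024-01-01T00:00:01.000Z' }));
history = appendMessage(history, 'not json');
console.log(history.map((m) => m.message)); // [ 'first', 'second' ]
```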

Chat rooms come in many flavors. Check out our blog post on a full build we did in Vercel with accompanying source code.‍

Interactive sites

Nothing is more whimsical than loading up a webpage and seeing a mouse zoom across the screen that doesn’t belong to you. Collaborative platforms are all the rage and they can’t be done without the support of WebSockets. But WebSockets are hard to build and they require a server-side component. Not with Momento.

Using Momento Topics, you can connect front-end to front-end, backend to front-end, and even backend to backend. There are no restrictions on what can produce and consume events. This means you can send reactions, build collaboration sessions, and gamify your apps without server-side components!

Simply publish to a topic to send a message and subscribe to register an event handler when an event rolls in. We’ve already seen how easy it is to publish a message in the chat example. Subscribing is as easy as setting up an event handler when a message is received and when an error occurs:

useEffect(() => {
  let sub; // kept in the closure so cleanup sees the live subscription, not stale state

  async function subscribeForReactions() {
    sub = await topicClient.subscribe('reactions', name, {
      onItem: (message) => { sendReaction(message.valueString()) },
      onError: (err) => { console.error(err) }
    });
    setSubscription(sub);
  }

  subscribeForReactions();
  return () => {
    sub?.unsubscribe();
  };
}, []);

The event handler is completely up to you, meaning you can do anything you like in response to a message! For a practical example, you can check out my live reaction app where this code was pulled from.‍

Conclusion

There are tons of cool things that you can do in the front-end. Why spend time re-inventing the wheel and building your own caching mechanism or yet another WebSocket implementation? Instead, use a service that does it all for you and gives you access to an autoscaling, ultra-low latency web server directly from your browser (that’s secure, to boot)!‍

I’ve found this to not only be a significant time saver but also a huge total cost of ownership reduction as well. There’s no back-and-forth with the backend teams waiting for an API to be built. There’s no waiting for infrastructure to be stood up to try out something new. There’s no scheduled downtime. Our teams maintain significantly fewer moving parts. It’s serverless services in the front-end.‍

Quick mention on security. Remember, don’t pass long-lived API keys to the browser. Opt instead for tokens - short-lived, limited-scope values perfect for the browser.‍

Getting started with Momento is free! Sign in with your Gmail or GitHub account and get going in seconds! We’d love to see what you build! Hop on over to our Discord and say hello, the team is there ready to answer any questions and admire your projects.

Happy coding!

API keys vs tokens - what’s the difference?

Allen Helton — Thu, 07 Sep 2023 21:16:52 +0000

They say the two hardest problems in computer science are cache invalidation and naming things. Honestly, that’s not wrong. Those are super hard. ‍

What makes naming things difficult is being clear yet concise. There should be no doubt about the meaning of a variable, term, function, or class. If you think a term could mean one of two things, it’s not named correctly. ‍

Such is the case with API keys and tokens. I was having a discussion the other day where the two words were being thrown around interchangeably. About two minutes in, I had to stop the conversation and say “you know those are different, right?”‍

Apparently they did not know. As it turns out, many people can’t tell me the difference between an API key and a token. So let’s set the record straight.

Definition

We can differentiate between an API key and a token with the following definitions:

API key - A value provided by code when calling an API to identify and authorize the caller. It is intended to be used programmatically and is often a long string of letters and numbers.
Token - A piece of data that represents a user session or specific privileges. Used by individual users for a limited period of time.

Generation

The method of creation is typically different between the two as well.

API key - Created one time, often through a user interface, and remains static until rotated. These can optionally be configured to expire after a certain amount of time.
Token - Generated dynamically on successful authentication or login event. Often has a short expiration time but is able to be refreshed for longer periods.

Scope

It wouldn’t be a discussion about auth without talking about permission scope. By permission scope, I mean the authorization portion or what functionality can be performed when using the provided auth method.

API key - Fixed, unchanging set of permissions to app capabilities. Whoever has the key can access the allowed resources.
Token - Limited to specific data or capabilities an individual has access to. This can be affected by roles or other business-level requirements. Tends to be more focused on data restriction.

Security

How secure is each method? If the key or token is compromised or acquired by a malicious user, how bad is the potential damage?

API key - Since these are generally long-lived and do not limit access to data, these can be devastating if compromised. Revoking the key is the only means of resolution. Applications often need good observability to identify compromised keys and find the malicious user.
Token - Designed with security in mind. Generally short-lived and easily revoked. A compromised token is scoped only to the data the user has access to and will expire automatically.

Use cases

So, when would you use one over the other? It looks like they have a good balance of pros and cons.

API key - Use for server-to-server communications, accessing public data like a weather API, integrating with 3rd party systems.
Token - Use for user authentication, fine-grained access control (FGAC), granting temporary access to resources, browser access, and managing user sessions.

Examples

Now that we understand the difference between the two, let’s look at two practical examples using the Momento JavaScript SDK.

API Keys

I did say that API keys are generally issued via a user interface. With that in mind, I don’t have a code sample to share. However, below is how you’d get an API key via the Momento Console as a user.

You’d select the permissions you want, set the optional expiration date, and generate. You can then immediately use the API key in your workflows.

Tokens

Contrast that with a user-based disposable token that is issued on successful login. We can take a role-based example for a user who gets read-only access to the calendar-events cache, but publish and subscribe access to a topic for collaboration.

// called on successful login
exports.handler = async (event) => {
  const user = await loadUserMetadata(event.userId);
  let token;
  switch (user.role) {
    case 'data-entry':
      token = await getDataEntryToken(user.tenantId);
      break;
    case 'admin':
      token = await getAdminToken(user.tenantId);
      break;
    default:
      throw new Error('Role not supported');
  }

  return token;
};

const getDataEntryToken = async (tenantId) => {
  const scope = {
    permissions: [
      {
        role: 'readonly',
        cache: 'calendar-events',
        item: {
          keyPrefix: tenantId
        }
      },
      {
        role: 'publishsubscribe',
        cache: 'collaboration',
        topic: `${tenantId}-events`
      }
    ]
  };

  const response = await authClient.generateDisposableToken(scope, ExpiresIn.minutes(15));
  return {
    token: response.authToken,
    expiresAt: response.expiresAt.epoch()
  };
};

Here we create a token that is valid for 15 minutes. It grants readonly access to the calendar-events cache, limited to cache items whose keys start with the tenantId the user belongs to, plus publish and subscribe access to that tenant’s topic in the collaboration cache. So we’ve restricted both the functionality and the data based on attributes of the user.
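
Because the token is short-lived, the client needs to know when to ask for a new one. Below is a minimal sketch, assuming the returned expiresAt value is an epoch timestamp in seconds (worth confirming against what epoch() returns in your SDK version). The shouldRefresh helper and the 60-second buffer are illustrative choices.

```javascript
// Hypothetical helper: decide whether a token should be refreshed.
// Assumes `expiresAtSeconds` is an epoch timestamp expressed in seconds.
function shouldRefresh(expiresAtSeconds, nowMs = Date.now(), bufferSeconds = 60) {
  const remainingSeconds = expiresAtSeconds - Math.floor(nowMs / 1000);
  return remainingSeconds <= bufferSeconds;
}

// Simulate a token issued at a fixed instant with a 15 minute lifetime.
const issuedAtMs = Date.UTC(2023, 8, 7, 12, 0, 0);
const expiresAt = issuedAtMs / 1000 + 15 * 60;

console.log(shouldRefresh(expiresAt, issuedAtMs));                    // false
console.log(shouldRefresh(expiresAt, issuedAtMs + 14.5 * 60 * 1000)); // true
```

Refreshing slightly before expiry avoids a window where a request goes out with a token that lapses in flight.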

Key Takeaways

API keys and tokens have their pros and cons. One is not better than the other. As with all things in computer science, it depends on your use case. When deciding which auth mechanism you’re going to implement, consider how your users will be interacting with your application. ‍

Is it user-based sessions on the web? Go with tokens. Maybe you’re expecting programmatic access only with no need to scope what data is available. Go with an API key. Feel free to save our reference table up top for quick reference.

Regardless of the path you take, please remember to keep your data secure. Nobody wants a data breach to take them out of business. Be safe.‍

If you’re interested in how you can get started with Momento and need help determining your level of access control, you’re always welcome to hop onto our Discord and ask the team directly. If you’re more of a reader, the developer docs are available 24/7.

Happy coding!

Building an interactive live reaction app with Next.js and Momento🎯

Allen Helton — Tue, 29 Aug 2023 19:07:11 +0000

Have you ever given an in-person presentation? As you talk, you look around the room and see heads nodding or people taking notes. Maybe someone raises their hand and asks a question about something you said. It’s easier to get engagement from someone when you can see them directly.

But online presentations are hard. You (generally) can’t see the people you’re presenting to. It feels exactly the same whether you’re rehearsing by yourself or presenting to a thousand people. There’s a disconnect between you and your audience that prevents you from knowing if you’re crushing it or flopping.

A few months ago, I saw a post from Michael Liendo describing how he uses Apple Keynote and WebSockets to maximize engagement in his presentations. He built a website that allows members of his audience to send emojis across the screen in real-time as he presents. How cool is that?!

I wanted that. I wanted to add the ability to send comments, too. So I read through his blog post and was immediately inspired, but there was a problem. I don’t have a Mac (I know, I know) so I can’t use Keynote. Plus his design involved using the AWS console to make a WebSocket API in AppSync. While this was incredibly cool for a single presentation, it didn’t seem like it would scale well.

I already put WAY more effort into building presentations than I probably should, so I can’t afford to rebuild a web app to help drive audience engagement every. single. time. I wish I had time for that, but the reality is…I don’t. With this in mind, I had a few requirements to run with for enhancing Michael’s phenomenal idea:

Compatible with Google Slides (who doesn’t like free?!)
Dynamically support presentations as I build them
Minimize the architecture and deployment requirements
Show some fun stats at the end of the presentation

Let’s take a look at how I built my live reaction app to satisfy these requirements.

Google Slides support

If you’ve ever built a presentation with PowerPoint or Keynote, you’ll know that Google Slides is… not that. It has a limited set of features and animations, but for the features it does have, it does them really well. Plus it’s free and has great online collaboration capabilities. If you already have Keynote or PowerPoint presentations, you can import them to Google Slides no problem.

When setting out on the build for this app I had two things in mind: avoid messing with presentation HTML and allow presentations from multiple authors. So I began poking around the Slides user interface until I found how to make presentations publicly accessible using the Publish to web feature.

When you publish a presentation to the web, you’re provided with an option to embed it. This will give you an iframe with a link to your slides. You can drop that into any web page and view it hosted in your app just like that! Upon closer inspection of the embedded code, I noticed something particularly useful:

<iframe 
  src="https://docs.google.com/presentation/d/e/2PACX-1vRLPL95FvyFg9DxT0iMfmOFLVwxTgEDFfl8Z/embed?start=false&loop=false&delayms=60000"
  frameborder="0"
  width="1440"
  height="839"
  allowfullscreen="true"
  mozallowfullscreen="true"
  webkitallowfullscreen="true"></iframe>

It looked like this could be parameterized! The presentation id comes right after the https://docs.google.com/presentation/d/e/ part of the URL. So all I had to do was drop the iframe in my Next.js app page and parameterize the src attribute to generically render presentations! It would look something like this: https://docs.google.com/presentation/d/e/${slidesId}/embed?start=false&loop=false&delayms=60000.
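
As a rough sketch of that parameterization, the two helpers below pull the id out of an embed URL and build the URL back from an id. The names extractSlidesId and buildEmbedUrl are hypothetical, and the query string is copied from the embed code above rather than from any documented Google Slides contract.

```javascript
const EMBED_BASE = 'https://docs.google.com/presentation/d/e/';

// Hypothetical helper: pull the presentation id out of a published embed URL.
function extractSlidesId(embedSrc) {
  if (!embedSrc.startsWith(EMBED_BASE)) return null;
  const id = embedSrc.slice(EMBED_BASE.length).split('/')[0];
  return id || null;
}

// Hypothetical helper: build the iframe src for any presentation id.
function buildEmbedUrl(slidesId) {
  return `${EMBED_BASE}${encodeURIComponent(slidesId)}/embed?start=false&loop=false&delayms=60000`;
}

const src = 'https://docs.google.com/presentation/d/e/2PACX-1vRLPL95FvyFg9DxT0iMfmOFLVwxTgEDFfl8Z/embed?start=false&loop=false&delayms=60000';
const slidesId = extractSlidesId(src);

console.log(slidesId);                        // 2PACX-1vRLPL95FvyFg9DxT0iMfmOFLVwxTgEDFfl8Z
console.log(buildEmbedUrl(slidesId) === src); // true
```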

So to recap here, for Google Slides I had to publish it to the web then grab the id from the embedded iframe that was returned to me. Now I had to figure out what to do with that.

Dynamic presentation support

As I said earlier, I don’t want to rebuild this app every time I make a new presentation. I just want to take my presentation id, give it to the app, and be done. If I can do that, I’d consider it a win.

I structured my web app to look up presentations dynamically. My page structure looks like this:

├─pages
│ ├─[name]
│ │ ├─index.js
│ │ ├─react.js
│ │ ├─results.js

There are a few pages here, each with their own flavor of “dynamic-icity”.

Presentation page

That presentation id is not user friendly at all. I wanted to alias it with a friendly name so people don’t have to type that monstrosity. So I created an API endpoint in my app to do the mapping for me. For now I did a hardcoded list, but as I give more and more presentations I will move this over to a presentation management page where I store and update the details in a database.

import slides from '../../../lib/slides';
export default async function handler(req, res) {
  try {
    const { name } = req.query;

    const presentation = slides.find(m => m.name === name.toLowerCase());
    if (!presentation) {
      return res.status(404).json({ message: `A presentation with the name "${name}" could not be found.` });
    }

    return res.status(200).json({ id: presentation.id, title: presentation.title });
  } catch (err) {
    console.error(err);
    res.status(500).json({ message: 'Something went wrong' });
  }
};

The slides import is just an array of objects holding the Google Slides id, title, and friendly name of each presentation:

const slides = [
  {
    name: 'caching-use-cases',
    id: '2PACX-1vQxQnmKrdy1FX3KzTWs7mC89UHDNH5kVeiUJpeZBnQiWNYXX6QjupaUln',
    title: 'You DO Have A Use Case For Caching'
  },
  {
    name: 'building-a-serverless-cache',
    id: '2PACX-1vSmwWzT1uMNfXpfwujfHFyOCrFjKbL8X43sd5xOpAmlK01lEICEm2kg',
    title: 'Behind the Scenes: Building a Serverless Caching Service'
  }
];

export default slides;

So when someone hits the /caching-use-cases endpoint in my app, the page will fetch the Google Slides id and title from the server side component and will use that to render the content in the iframe.

Reaction page

I wanted to be like Michael. The whole point of this was to drive audience engagement, so I had to provide a user interface for people to react to my presentation as I’m giving it. That’s where the /[name]/react path comes into play.

First, I had to get people to that page. But I didn’t want to hardcode anything; dynamic support was requirement #2. Luckily, I stumbled upon the react-qr-code library that will dynamically create and render QR codes in React apps. So I added a card underneath the presentation display that will always be visible so users can scan it with their phones and jump directly to the reactions.

<Card variation="elevated" borderRadius="medium" width="99%" padding="relative.small">
  <Flex direction="row" justifyContent="space-between">
    <Flex direction="row" alignItems="center">
      <Link href={`${router.asPath}/react`}>
        <QRCode
          value={`https://${process.env.NEXT_PUBLIC_DOMAIN_NAME}${router.asPath}/react`}
          size={256}
          style={{ height: "auto", maxWidth: "4em" }} />
      </Link>
      <Heading level={4}>Scan the QR code to react live!</Heading>
    </Flex>
  </Flex>
</Card>

In case you’re wondering, I’m using the Amplify UI components for this project. I’m not much of a UI developer, so having these styled components has been a life saver! Anyway, adding the card beneath the presentation will result in something like this:

This will be visible during the entire presentation, so it doesn’t matter if audience members come in early or late, they’ll always be able to get to the reaction page to send some emojis or ask questions. The reaction page is optimized for mobile viewing, giving audience members the choice of three emojis or to add their own question/comment.

As the audience presses an emoji or types in a question, a message will be sent over our managed WebSocket (more on this in a sec) to the presentation page and it will be displayed on screen. Don’t worry, I’ve built in comment throttling and profanity filters for the inevitable hecklers.

Small deployment

Another one of my objectives with this project was to make a small, self-contained web app that doesn’t rely on a large backing architecture. This is meant to be simple. I didn’t want to mess with WebSockets or bounce around in the AWS console creating a bunch of cloud resources. Instead, I opted to take advantage of Momento to do all that hard work for me. This leaves my architecture small and simple 👇

Everything is self-contained in the Next.js app. The mapping of friendly names to presentation ids is done on the server-side component of the app and the WebSockets are handled via Momento Topics. I don’t have to manage cloud resources like WebSocket channels/topics or subscriptions. I plug in the Momento Web SDK and it just works. Literally.

Really the only thing you have to do to get this in the cloud is set up your web hosting. Since there aren’t dependencies on any specific cloud vendors, you could host this in Vercel, Fastly, or something like AWS Amplify (my personal preference). But before you go and set it up, there are two things you need to do first:

Update the /lib/slides.js file with your presentations
Configure three environment variables

MOMENTO_AUTH - API key issued via the Momento Console. The server-side component uses this key to create the short-lived API tokens it sends down to browser sessions.
NEXT_PUBLIC_CACHE_NAME - Name of the cache to use for Momento. This must exist in the same region as your API key. The app does all the work for you; you just need to create a cache with any name you want.
NEXT_PUBLIC_DOMAIN_NAME - Base URL of the custom domain for your app. It doesn’t even need to be a custom domain; you could update this to the generated domain after you deploy.

Then get it deployed! Once deployed, it will just start working.

Fun stats

I mentioned one of my requirements was to show some fun stats at the end of the presentation. What’s more fun than seeing who reacted the most and what the most used reactions are?!

Every time someone pushes one of the reaction buttons, a score is incremented for both the person reacting and the reaction used.

await cacheClientRef.current.sortedSetIncrementScore(process.env.NEXT_PUBLIC_CACHE_NAME, `${name}-reacters`, data.username);
await cacheClientRef.current.sortedSetIncrementScore(process.env.NEXT_PUBLIC_CACHE_NAME, name, data.reaction);

By using a sorted set, I’m building a leaderboard without having to do any of the hard work. I’m incrementing the score for a specific username and for a specific reaction in a cache. When the presentation is over and it’s time to look at the results, I can fetch the scores in descending order, which gives me the leaderboard effect automatically.

const getLeaderboard = async (cacheClient, leaderboardName) => {
  let board;
  const leaderboardResponse = await cacheClient.sortedSetFetchByRank(process.env.NEXT_PUBLIC_CACHE_NAME, leaderboardName, { startRank: 0, order: 'DESC' });
  if (leaderboardResponse instanceof CacheSortedSetFetch.Hit) {
    const results = leaderboardResponse.valueArrayStringElements();
    if (results.length) {
      board = results.map((result, index) => {
        return {
          rank: index + 1,
          username: result.value,
          score: result.score
        };
      });
    }
  }
  return board;
};
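
To see why a sorted set gives the leaderboard effect, here is a minimal in-memory sketch of the same idea. It is only an illustration and says nothing about how Momento stores sorted sets. The `Leaderboard` class and the usernames are made up, and the increment of 1 is an assumption based on the calls above passing no amount.

```javascript
// In-memory stand-in for a sorted set: member -> score.
class Leaderboard {
  constructor() {
    this.scores = new Map();
  }

  // Same idea as incrementing a sorted set score: start at 0 if the member is new, then add.
  increment(member, amount = 1) {
    const next = (this.scores.get(member) ?? 0) + amount;
    this.scores.set(member, next);
    return next;
  }

  // Same idea as fetching by rank in descending order.
  top() {
    return [...this.scores.entries()]
      .sort((a, b) => b[1] - a[1])
      .map(([username, score], index) => ({ rank: index + 1, username, score }));
  }
}

const reacters = new Leaderboard();
['ana', 'ben', 'ana', 'cy', 'ana', 'ben'].forEach((user) => reacters.increment(user));
console.log(reacters.top());
// [ { rank: 1, username: 'ana', score: 3 },
//   { rank: 2, username: 'ben', score: 2 },
//   { rank: 3, username: 'cy', score: 1 } ]
```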

You can see once again, the leaderboard results are dynamic, using the friendly name of the presentation as the key that stores the data. This ends up with a page looking like this:

This brings a little competition and fun to the presentation, hopefully keeping the audience fully engaged.

Try it yourself

I would highly encourage you to try this out yourself. Get your audience involved and make your presentations stand out.

If you’re wondering how much this would cost you, you’ll probably like the answer - nothing!

Hosting platforms like Vercel cost nothing to host hobby projects. Momento is priced at $0.50 per GB of data transfer with 5GB free every month. Each reaction sends an 80 byte message, so we can calculate your free amount of reactions as:

5 GB / (80 bytes x 2 (data in from publisher and out to subscriber)) = 33.5 million messages

So if you keep your reaction count to fewer than 33.5 million a month, then you’re well within the free tier 🙂. But if you do surpass it, you can get ~13 million reactions for $1.
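
If you want to plug in your own numbers, the arithmetic above can be written out as a small script. It assumes 1 GB means 2^30 bytes (which is what reproduces the 33.5 million figure) and that every reaction is delivered to exactly one subscriber. More subscribers per reaction would raise the outbound data accordingly.

```javascript
// Assumptions: 1 GB = 2^30 bytes, and each reaction is counted twice
// (once in from the publisher, once out to a single subscriber).
const BYTES_PER_GB = 2 ** 30;
const MESSAGE_BYTES = 80;
const TRANSFERS_PER_REACTION = 2;
const FREE_GB = 5;
const PRICE_PER_GB = 0.5;

const bytesPerReaction = MESSAGE_BYTES * TRANSFERS_PER_REACTION; // 160 bytes
const freeReactions = Math.floor((FREE_GB * BYTES_PER_GB) / bytesPerReaction);
const reactionsPerDollar = Math.floor(((1 / PRICE_PER_GB) * BYTES_PER_GB) / bytesPerReaction);

console.log(freeReactions);      // 33554432
console.log(reactionsPerDollar); // 13421772
```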

At the end of the day, the goal is to help people understand and remember your message. Feel free to change the reactions, add more, take away the commenting, anything that helps keep the attention on your content.

Thank you to Michael Liendo, who gave me inspiration to build this. It’s a lot of fun and has lots of potential for enhancements in the future. I’m excited to deliver engaging presentations and get real-time audience feedback.

If you’re ready to try, I’m always available to help, be it with your deployment, getting a Momento token, or figuring out how to publish your presentations. You can contact me or someone from the Momento team on Discord or via the Momento website.

Happy coding!

What is a vector index?

Kirk Kirkconnell — Wed, 23 Aug 2023 17:30:37 +0000

A vector index is a specialized type of index designed to store and manage multidimensional data called vectors. We can produce vectors from AI models called "embedding models". Embedding models summarize an object (an article, an image, a video) as a vector. This numerical representation preserves the meaning (semantic content) of the original object. The resulting vector is called a vector embedding, and each number in it is one of its dimensions.

Vector indexes are the key to managing ultra-efficient AI-enabled systems. They store vector embeddings, vector data that captures an item's essence mathematically. To help you visualize this, picture the embeddings in a vector index as a 3D holographic “star chart” you’d see in a sci-fi movie, with each embedding as its own point of light on that chart. The more embeddings cluster around a point, the larger and brighter that point is. Now imagine that in 1,000 dimensions.

To put it in technical terms, related vector embeddings reveal relationships in the data. With that, vector indexes enable your apps to move beyond the simple matching of plain text search into the realm of AI-enhanced semantic search.

Why is semantic search so important?

Let’s look at an example. Here’s one image from a fashion photo shoot in Casablanca, along with a photo of a woman in a bazaar in Casablanca. Normally these photos are not related by much. Each has very different clothing, different lighting, different people, one photo is color and one is sepia-toned black and white, one is in a building while the other is on a street, and so on.

What does connect them, though, is they are both photographs, they are both images of humans, they are both wearing clothing, and both images were made in the city of Casablanca, Morocco. How might that look in vectors stored in a vector index?

Photo1 = [ 234.53, 45.31, 23.45, …]
Photo2 = [ 45.32, 98.6, 23.45, …]

What this means is that once these vectors are written to a vector index, similar vectors sit near, next to, or on top of one another in the multidimensional space of the index. They are automatically related. Therefore, I can do a semantic search to find all vectors whose values are at or near 23.45 in one dimension, at or near 45.31 in another, and so on. Since we are working with an array of numbers and not bulky data, satisfying such a search is dramatically more efficient than it would be with other index types.
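
To make "near" concrete, here is a small sketch that compares the two photo vectors with cosine similarity, one common way to measure how close two vectors are. Only the first three numbers of each vector are shown in this article, so the example uses just those. The brute-force `nearest` function is for illustration; real vector indexes typically rely on approximate nearest-neighbor structures so they do not have to score every vector.

```javascript
// Only the three values shown above are used; the elided values are left out.
const photo1 = [234.53, 45.31, 23.45];
const photo2 = [45.32, 98.6, 23.45];

// Cosine similarity: dot product divided by the product of the vector lengths.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same number of dimensions');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Brute-force search: score every stored vector against the query and keep the k best.
function nearest(query, items, k) {
  return items
    .map((item) => ({ id: item.id, score: cosineSimilarity(query, item.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}

const photoIndex = [
  { id: 'Photo1', vector: photo1 },
  { id: 'Photo2', vector: photo2 }
];

// Photo1 ranks first because it is identical to the query.
console.log(nearest(photo1, photoIndex, 2));
```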

What would you use a vector index for?

Now that you know what a vector index does for you with vector embeddings, let’s talk about some examples of what you can do with a vector index.

AI chatbot

Think of having an AI trained on a specific set of data, a product’s documentation website, for example. Combine that with other training data on writing applications in various languages, and now you can ask questions of that set of documentation and have the AI write code for your users. No more searching through the documentation, just ask normal questions of the AI, and it generates a response that is on topic or similar to what you asked about. This use case is what I like to call a knowledge bot.

Search

Similarity search is fundamental in various applications, including recommendation engines, image recognition, and Natural Language Processing (NLP). The ability to efficiently find related items in vast datasets makes it a powerful tool in modern data analysis and machine learning. It can be used in product catalogs, video/music streaming sites, e-commerce sites, etc.

Recommendation engine

While I mentioned this under search, it is common enough to stand alone. Let’s say you are researching online to purchase a table saw. Knowing that, your past purchases, and what other people searched for, clicked on, and rated well, an AI-enhanced recommendation could deliver to you sites, images, and videos more pertinent to your research than a simple search.

Anomaly Detection

Anomaly detection is a critical process in data analysis that identifies unusual patterns that do not conform to expected behavior. These outliers or anomalies can signify problems such as fraud, network intrusion, or system failure. The ability to detect anomalies in real-time allows for immediate response, mitigating potential risks and enhancing the overall integrity and reliability of a system.

Sentiment analysis

By mapping words into a multidimensional space of a vector index, patterns correlating to positive, negative, or neutral emotions can be discerned. This allows for efficient comparison across large datasets, helping gauge opinions on products, trends, or other things. With real-time processing and adaptability, vector indexes become valuable for understanding and responding to user sentiments across various platforms.

These are just a few examples. What other use cases can you think of?

How we built Momento Topics, a serverless WebSocket service

Allen Helton — Tue, 22 Aug 2023 19:25:24 +0000

A few months ago, Momento released Topics, a fully-managed serverless WebSocket service. This service aims to connect everything. You can connect backend service to backend service, backend service to user interface, even user interface to user interface with literally two API calls 👇

// sends a message containing “hello world!”
await topicClient.publish("tutorials", "greetings", "hello world!");

And to receive the message:

await topicClient.subscribe("tutorials", "greetings", {
  onItem: (message) => console.log(message.valueString())
});

No resource management, no configuration, just code and go!

Services like Amazon SQS, SNS, or Kinesis are also serverless event messaging services, but they focus on sending and receiving events between backend components.

On the flip side, services like Pusher, AWS AppSync, and AWS IoT Core are intended to be used like WebSockets, connecting backend services to user interfaces. These are fantastic services that abstract away many of the complexities of WebSockets, but we at Momento thought we could continue to improve the developer experience for building real-time communications.

So we built Topics, the dead-simple service that enables everything to communicate with everything… if you want it to.

This highly scalable service is ready for prime time, boasting incredibly low latencies at millions of transactions per second (TPS).

Don't believe us? Let's take a look at how it was built and what makes this service shine.

Events and Subscriptions

Let’s first talk about what's going on when you publish and subscribe through Momento Topics. Think of a topic as a focused channel of communication between publishers and subscribers, like a dedicated chat room.

When your user interface or backend service subscribes to a topic, it's telling Momento it wants to be notified when something happens. Specifically, it wants to know when someone publishes, or sends, an event to a topic. Use cases for subscribing could be alerting end users when a teammate signs on or when you receive an instant message.

With Momento Topics, a subscription is a long-lived gRPC connection between the subscriber and the Momento servers. You could think of this as setting up a phone call. Your user interface calls the Topics service and now has an open connection with it, allowing data to instantly transfer directly between the two.

This differs from something like Amazon SNS because SNS does not maintain active connections. Instead, subscribers to SNS would be added to a phone book and the service would know who to call when an event occurs. This results in higher latency but does offer stateless communication. Stateless communication works great when real-time isn’t a requirement, like when you need to send an email or add something to a queue. It’s totally acceptable to have the higher latency in these situations due to the async nature of the workflow.

Topics differs from other WebSocket services due to the transfer protocol. A standard WebSocket connection will transfer data over ws or wss. IoT devices typically transfer data via MQTT. But since Momento Topics uses gRPC, that means your data is being transferred over HTTP. Let’s look at some of the key differences between these protocols:

WSS

Pros: Offers full-duplex (simultaneous two-way) communication. Provides low bandwidth overhead by not requiring headers and metadata in requests. Widely supported in browsers and server environments.
Cons: Can be complex to set up and manage connections. Consumes a high amount of server resources due to connection management.

MQTT

Pros: Optimized for low-bandwidth, high-latency, or unreliable networks. Offers various levels of message delivery guarantees. Provides last will and testament (LWT).
Cons: Not natively supported in web browsers. Has higher latencies compared to wss because of additional overhead (keep-alive mechanism).

HTTP (gRPC)

Pros: Uses HTTP/2, bringing multiplexing and header compression for faster performance. Enables bidirectional streaming.
Cons: Uses protocol buffers for serialization instead of JSON or XML. Might not be as familiar for integrators as REST.

Events, on the other hand, need little explanation. The best way I've heard event-driven architectures described was from Eric Johnson at Momento's conference, MoCon. He said "something happened... And we react." An event is that “something”.

An event can be represented by anything. It can be a boolean, an entity identifier, an entire JSON object, heck, it could even be a photo. With Momento Topics, you provide a byte array as the payload of your event, meaning the possibilities are literally endless. All subscribers will receive the same payload you provide, allowing them to react to the event appropriately.
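
As a small illustration of "anything can be a byte array", here is one way to turn a JSON event into bytes and back using the standard TextEncoder and TextDecoder. The event shape and the `encodeEvent` and `decodeEvent` helper names are hypothetical, and nothing here calls the Topics SDK.

```javascript
const encoder = new TextEncoder();
const decoder = new TextDecoder();

// Serialize an event object to UTF-8 bytes.
function encodeEvent(event) {
  return encoder.encode(JSON.stringify(event));
}

// Turn received bytes back into the original object.
function decodeEvent(bytes) {
  return JSON.parse(decoder.decode(bytes));
}

const payload = encodeEvent({ type: 'teammate-online', userId: 'u-123' });
console.log(payload instanceof Uint8Array); // true
console.log(decodeEvent(payload).type);     // teammate-online
```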

Now that we understand what subscriptions are and what they are receiving, let's talk about the architecture.

Service architecture

For me, the biggest reason to use a serverless WebSocket service is the infrastructure management provided by the vendor. I'm not a sysadmin, I'm an app builder. I don't know (nor do I want to know) how to load balance connections, set up auto scaling groups, or how to calculate utilization rates and manage spiky workloads.

What I do know is how to provide business value to an application. That's what I want to spend my time and energy on, not worrying about infrastructure. Luckily for us all, the engineers at Momento know and care deeply about infrastructure. Here's a peek at how they built Topics to provide serverless capabilities like on-demand scaling, pay for what you use, no scheduled downtime, and no resource provisioning.

Topics is built on a 2-tier architecture, allowing it to fan out 1000x faster and better than managing connections yourself. The two tiers are the routing layer and storage layer.

Routing Layer

Tier 1 is the routing layer. This layer has a full topographical map of the Momento ecosystem in memory. It's responsible for fielding requests from the SDKs, managing connections to subscribers, and calculating which tier 2 node holds the topic messages (more on this in a minute).

This layer is an autoscaled fleet of high throughput EC2 instances. The Topics service uses a round robin load balancing pattern to manage incoming subscriptions. When capacity nears a specific threshold, the Control Plane (the brains behind Momento) will warm up another instance, pass the ecosystem topography to it, then add it into the mix.

Storage Layer

Tier 2 is the storage layer. When events are published to a topic, they will land here and be distributed out to connected routing layer nodes.

The neat part of the architecture is in this layer. The storage node only communicates with routing layer nodes, and the routing layer nodes communicate with subscribers. When a new subscription is added, the designated routing node will calculate which storage node owns the topic and establish a long-lived gRPC connection with it.

This is the exact same pattern the router is doing with end user subscriptions. So the Momento servers are subscribing with other Momento servers and reusing connections whenever possible. Let's look at an example.

In this example, we have 3 routing nodes and 2 storage nodes. We have 5 end users all subscribing to the "Dallas - TX - Hail storm" topic in the "weather-alerts" cache. Through a deterministic hash, it's determined that topic messages will live on Storage-A.

As the connections come in, the requests are being evenly distributed across Router-A, Router-B, and Router-C. On our fourth subscriber, Router-A determines it needs to connect to Storage-A but knows it already has a connection open with it, so the connection is reused!

The same thing occurs for subscriber #5 - it reuses the existing connection it has with the storage node. When a message is published to a topic, the storage node will broadcast it to its open connections, then the routing nodes will do the same and broadcast to their open connections to subscribers.

This allows for fewer connections and a highly scalable fan-out pattern, resulting in blazingly fast message distribution and an elastically scalable experience for developers.
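
The walkthrough with five subscribers can be simulated in a few lines. This is a simplified model with a single topic and in-memory objects, meant only to show the round robin assignment and connection reuse described here. It is not Momento's implementation, and the function names are made up.

```javascript
// Three routing nodes, each tracking its subscribers and its open storage connections.
const routers = ['Router-A', 'Router-B', 'Router-C'].map((name) => ({
  name,
  subscribers: [],
  storageConnections: new Set()
}));

let nextRouter = 0;
let openedStorageConnections = 0;

// Assign the subscriber to the next router (round robin) and open a connection
// to the storage node only if that router does not already have one.
function subscribe(subscriberId, storageNode) {
  const router = routers[nextRouter];
  nextRouter = (nextRouter + 1) % routers.length;
  router.subscribers.push(subscriberId);

  const reused = router.storageConnections.has(storageNode);
  if (!reused) {
    router.storageConnections.add(storageNode);
    openedStorageConnections++;
  }
  return { router: router.name, reused };
}

// The storage node broadcasts to connected routers; each router broadcasts to its subscribers.
function publish(storageNode, message) {
  const deliveries = [];
  for (const router of routers) {
    if (!router.storageConnections.has(storageNode)) continue;
    for (const subscriberId of router.subscribers) {
      deliveries.push({ subscriberId, via: router.name, message });
    }
  }
  return deliveries;
}

const results = ['s1', 's2', 's3', 's4', 's5'].map((id) => subscribe(id, 'Storage-A'));
console.log(results[3]);               // { router: 'Router-A', reused: true }
console.log(openedStorageConnections); // 3
console.log(publish('Storage-A', 'hail warning').length); // 5
```

Five subscribers end up sharing three storage connections, and that gap widens as more subscribers land on the same routers.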

Determining topic location

In the initial conversations before building the Topics service, it was agreed upon that developers would never have to create topic resources before using them. The team felt it took away from the developer experience and was an unnecessary part of event messaging. The goal of the service was to take on as much of the undifferentiated heavy lifting as possible, leaving application developers with one focus - solving business problems.

So instead of creating topic resources and assigning them a storage location on a server, the team opted to use a rendezvous hash using the subscriber's account id, cache name, and topic name to dynamically and deterministically point to a storage node.

This means that as routing nodes are shuffled in and out of the load balancer, they don't need to maintain state about which topics exist and where they live. They can calculate the location in microseconds and route appropriately, leaving the service more dynamic and resilient as a result.
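
For readers who have not met rendezvous hashing before, here is a minimal sketch of the technique. Every node gets a weight from hashing the key together with the node name, and the highest weight wins. The hash function (an FNV-1a style hash over the string's character codes), the key format, and the node names are all assumptions for illustration, since the article does not say which hash or key layout Momento uses.

```javascript
// FNV-1a style 32-bit hash over the string's UTF-16 code units (illustrative choice).
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Rendezvous (highest random weight) hashing: score every node against the key
// and pick the node with the highest score. No lookup table is needed.
function pickStorageNode(accountId, cacheName, topicName, nodes) {
  const key = `${accountId}/${cacheName}/${topicName}`;
  let best = null;
  let bestWeight = -1;
  for (const node of nodes) {
    const weight = fnv1a(`${key}|${node}`);
    if (weight > bestWeight) {
      bestWeight = weight;
      best = node;
    }
  }
  return best;
}

const storageNodes = ['Storage-A', 'Storage-B', 'Storage-C'];
const owner = pickStorageNode('account-1', 'weather-alerts', 'Dallas - TX - Hail storm', storageNodes);
console.log(owner); // the same node every time for the same inputs
```

A useful property of this approach is that removing a node only moves the topics that node owned. Every other topic keeps its highest-scoring node, so routers can come and go without coordinating state.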

Summary

Topics runs on the same hardware as our caching service. It's been intentionally designed to be low latency, highly scalable, and ready at a moment's notice.

Using the 2 tier architecture allows us to fan-out subscriptions and provide you with low latency message delivery, no matter how many subscribers you have. This type of architecture is what differentiates Momento Topics from other event messaging services. Many other services have the capability to run this way, but most of them don't manage it for you.

Momento does. This means you can sleep at night knowing you won't receive a call at 3am stating your messages aren't being delivered. That's on us.

Remember that subscriptions are gRPC connections, they are like phone calls. A phone call is stateful, meaning both you and the person you're calling have to be on the line for it to work. This means trying to subscribe in a serverless function like Lambda or a Cloudflare worker won't work. When function execution is over, your code is put on pause. This means your phone call will be put on hold and eventually disconnect. Use it in stateful environments like a browser, container, or server.

That said, you can publish from anywhere. Publishing data does not set up a connection, so feel free to use it in all your functions.

We want you to be successful with Topics. Let us know if you have any questions, comments, or concerns. We believe in open architecture, so if you want to dig a little deeper into something, reach out! We are happy to share how it works.

You can reach the team through the website or contact me directly. We look forward to hearing from you and seeing what you build!

Happy coding!

Why are WebSockets so hard?

Allen Helton — Tue, 15 Aug 2023 16:13:26 +0000

A couple of years ago I worked on a project to bring real-time notifications into my web application. I was excited at the idea of “real-time” and immediately knew I was going to get a chance to implement WebSockets.

I knew what WebSockets did, but I didn’t really know what they were - meaning I knew you could send messages from a server to a browser, but I had no idea how. I didn’t know much more than the fact there were “connections” that you could use to push data both to and from the back-end.

I set off to build what I thought was going to be a two-day task. What could possibly go wrong?

I then plunged into a downward spiral of complexity that made me rethink being a software engineer. Let’s talk about it.

WebSocket API structure

I come from a REST background. Endpoints have resource-based paths with intent shown by which HTTP method you’re using (GET = load data, POST = create data, PUT = update data, etc…).

The first thing I saw in the AWS API Gateway documentation was a pair of weird $connect and $disconnect routes. By naming convention I assumed what these routes did, but I didn’t know what to do with them.

It wasn’t intuitive to me how to uniquely identify a user who was trying to connect. I didn’t know if data would freely pass back and forth across this connection once it was established. I also had no idea how to keep track of the connection or if I even needed to keep track. It was just one rabbit hole after another.

Eventually I discovered that with AWS API Gateway, the connections are managed by the service itself, but you (the developer) are responsible for keeping track of who is connected and what information they receive. I also learned that data does not just freely flow back and forth.

For interactions going from the client to the server, you have to define your own routes and point them to backing compute resources. Each route required an API Gateway V2 Route, API Gateway V2 Integration, Lambda function, and Lambda function permission resource defined in my Infrastructure as Code, which was about 50 lines per route.

For data going from the server to the client, you can send anything you want. You need to develop a convention for identifying different types of messages so you can handle them appropriately.

The disparity between client-to-server and server-to-client threw me for a loop. One was very rigid and structured, while the other was loosey-goosey. It didn’t quite feel like a way to build scalable, maintainable software.
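
One possible convention for the server-to-client side is a typed envelope: every message carries a `type` field and a `payload`, and the client dispatches on the type. The message types and handler output below are hypothetical examples, not part of any API Gateway contract.

```javascript
// Hypothetical message types, each mapped to a handler for its payload.
const handlers = new Map([
  ['ticket-available', (payload) => `Tickets on sale for ${payload.artist}`],
  ['chat-message', (payload) => `${payload.from}: ${payload.text}`]
]);

// Parse the raw message, look up a handler by type, and report unknown or malformed input
// instead of throwing.
function handleMessage(raw) {
  let message;
  try {
    message = JSON.parse(raw);
  } catch (err) {
    return { ok: false, reason: 'invalid-json' };
  }
  const handler = message ? handlers.get(message.type) : undefined;
  if (!handler) return { ok: false, reason: 'unknown-type' };
  return { ok: true, result: handler(message.payload ?? {}) };
}

console.log(handleMessage('{"type":"ticket-available","payload":{"artist":"Adele"}}'));
// { ok: true, result: 'Tickets on sale for Adele' }
console.log(handleMessage('{"type":"mystery"}'));
// { ok: false, reason: 'unknown-type' }
```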

Connection management

As I said earlier, API Gateway maintains connections for you, but you’re responsible for figuring out what data to send to which connection. Let’s take an example:

Imagine our user, Mallory, wants to be notified when tickets for Taylor Swift, Adele, or Ed Sheeran become available. When she connects to our ticket vendor site we save 4 records into our database:

One record that identifies the connection and user metadata
One record for each artist she wants to be notified for

For the artist records, the pk is her connection id and the sk indicates it’s a subscription record. We also store the artist name as an attribute and index it with a global secondary index (GSI), so when we get an event indicating that Ed Sheeran tickets are on sale, we can immediately notify all the connections subscribed to him.

To notify the subscribers with an AWS serverless back-end, we’d trigger a Lambda function on an EventBridge event saying which artist had tickets available. The function would query the artist GSI in DynamoDB to find all the connections subscribed to the incoming artist. Then, we’d iterate over each record publishing the ticket information to the connected users. That’s a lot of work!

When the user disconnects, we can query the database for all records with the pk containing the connection id and delete them. In case we miss the disconnect event from API Gateway, we set a time to live (TTL) on the connection records for 24 hours (or whatever fits your use case) to delete them automatically.
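
To make the bookkeeping concrete, here is an in-memory sketch of those records. A plain array stands in for the DynamoDB table and a filter stands in for the GSI query. The attribute names pk, sk, and ttl follow the description here, while the value formats (such as `subscription#Adele`) and the function names are assumptions.

```javascript
const DAY_SECONDS = 24 * 60 * 60;

// On connect: one record for the connection, plus one per artist subscription.
function onConnect(table, connectionId, userId, artists, nowSeconds) {
  const ttl = nowSeconds + DAY_SECONDS; // safety net in case the disconnect event is missed
  table.push({ pk: connectionId, sk: 'connection', userId, ttl });
  for (const artist of artists) {
    table.push({ pk: connectionId, sk: `subscription#${artist}`, artist, ttl });
  }
}

// Stand-in for querying the artist GSI: every connection subscribed to this artist.
function connectionsFor(table, artist) {
  return table.filter((record) => record.artist === artist).map((record) => record.pk);
}

// On disconnect: delete every record that belongs to the connection.
function onDisconnect(table, connectionId) {
  for (let i = table.length - 1; i >= 0; i--) {
    if (table[i].pk === connectionId) table.splice(i, 1);
  }
}

const table = [];
onConnect(table, 'conn-1', 'mallory', ['Taylor Swift', 'Adele', 'Ed Sheeran'], 0);
console.log(table.length);                        // 4
console.log(connectionsFor(table, 'Ed Sheeran')); // [ 'conn-1' ]
onDisconnect(table, 'conn-1');
console.log(table.length);                        // 0
```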

This is a lot of infrastructure for something with “technically” no business value. This is simply a microservice that alerts users. This is code that you have to maintain over time that could get stale, slow, or deprecated. Code is a liability, after all.

Security

I come from a GovTech background. An app isn’t secure until it’s overly secure. So when I found out that the only route on a WebSocket API that supports auth is $connect, I was a little taken aback. Once a connection is established, it has free rein to call any route it wants without passing in an auth header or any other form of credentials.

I’ve had a while to stew on this, and it makes sense in theory. Since WebSocket connections are stateful, you shouldn’t need to reauthenticate every time you make a call. That would be like knocking on someone’s door, saying your name, then after you’re inside, restating who you are every time you do something. Doesn’t really make sense.

Passing in an auth header to a WebSocket isn’t as easy as you’d think either. Popular clients like SocketIO don’t really support auth headers well unless you use it for both the client and server. The best way I found to pass a bearer token through to a WebSocket hosted in AWS was to use a query string parameter. You could also repurpose the Sec-WebSocket-Protocol header to accept both a subprotocol and the auth token, but that is against the grain and one of those “just because you could doesn’t mean you should” moments.
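
The query string approach might look something like the sketch below. The endpoint URL and the `access_token` parameter name are placeholders, so use whatever your $connect authorizer expects. Keep in mind that tokens in URLs can end up in logs, so short-lived tokens are a safer fit here.

```javascript
// Build the connection URL with the token attached as a query string parameter.
// The parameter name "access_token" is an assumption for this example.
function buildSocketUrl(baseUrl, token) {
  const url = new URL(baseUrl);
  url.searchParams.set('access_token', token);
  return url.toString();
}

const socketUrl = buildSocketUrl('wss://example.com/production', 'my.jwt.token');
console.log(socketUrl); // wss://example.com/production?access_token=my.jwt.token

// In a browser you would then open the connection with:
// const socket = new WebSocket(socketUrl);
```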

Client-side SDKs

People seem to love SocketIO. It has over 4 million weekly downloads on npm and is arguably one of the better ways to connect to a WebSocket. But just because it’s popular doesn’t mean it’s easy.

For whatever reason I struggled big time to get it working with API Gateway. Something with the WebSocket protocol (wss instead of https) and the way AWS set up the API just didn’t get along well.

Through much trial and error, shifting auth around, and a few rage quits, I’ve been able to get WebSockets hooked up to my user interfaces once or twice. But every time I do it, I have to relearn the tricks of getting it just right. Sometimes when things do everything, like SocketIO, they lose a bit of their intuitiveness and developer experience.

An easier way

With Momento Topics, all the hard parts of WebSockets are abstracted away. There is no API structure to build. Subscribers can connect and register for updates to specific channels with a single API call:

await topicClient.subscribe('websocket', 'mychannel', {
  onItem: (data) => { handleItem(data.valueString()); },
  onError: (err) => { console.error(err); }
});

To publish to a channel, the call is even simpler:

await topicClient.publish('websocket', 'mychannel', JSON.stringify({ detail }));

You can connect service to service, service to browser, even browser to browser with Topics. Since the service uses Momento’s servers for connection management, you have options available that haven’t been possible before, like having two browser sessions communicate without getting a server involved. This leaves you with two responsibilities: publishing data when it’s ready and subscribing for updates.

As with other serverless services, Momento Topics comes with security at top of mind, but also leaves you with flexible options to restrict access. With fine-grained access control, you can configure your API tokens to be scoped as narrowly as possible. An example access policy might be:

const tokenScope = {
  permissions: [
    {
      role: 'subscribeonly',
      cache: 'websocket',
      topic: 'mychannel'
    }
  ]
};

An API token created with this set of permissions would only be allowed to subscribe to the mychannel topic in the websocket cache. If someone intercepted the token and attempted to publish data or subscribe to a different topic, they would receive an authorization error.

Momento has a plethora of SDKs for you to integrate with. For browsers, you can use the Web SDK. For server-side development, the Topics service is available for TypeScript/JavaScript, Python, and Go, with support for .NET, Java, Elixir, PHP, Ruby, and Rust coming soon.

What’s the catch?

Hopefully that sounds too good to be true. It did to me at first. Heck, it still does. But there is no catch. Momento’s mission is to provide best-in-class developer experience for their serverless services and take as much of the burden off of developers as possible.

You don’t need to spend weeks building notification services that handle complex connection management and event routing. Let SaaS providers like Momento take the operational overhead from you so you can focus on what really matters.

Pricing is simple: $0.50/GB of data transfer in and out, with a 5GB perpetual free tier. There’s no reason not to try it!

Looking for examples? Check out this fully functional chat application built with Topics in Next.js. You can also try our work-in-progress game Acorn Hunt, built on both Momento Cache and Topics.

If you have any questions, feedback, or feature requests, please feel free to let the team know via Discord or through the website. These services are for all of us and we want to build the best possible product to get you to production safely and quickly.

Happy coding!