Forem: Phil Nash

5 quick tips for giving better presentations

Phil Nash — Thu, 05 Mar 2026 00:00:00 +0000

I have been speaking publicly at developer conferences for over a decade and in that time I've seen plenty of other people giving talks. Everyone gives talks differently, and if you are a speaker or thinking about getting into it, I encourage you to work on developing your unique style as well as your content.

However, recently I have noticed a few little things that some speakers do and others do not that make for a better experience for both the audience and the presenter. They are mostly inconsequential for the content of a talk, but they can act as speed bumps that take the momentum out of a talk.

So, in no particular order, here are a few things you should remember when giving a talk.

Finish strong

Have you ever seen this situation play out?

A speaker wraps up their talk nicely with an insightful conclusion. Then they say something like, "Thank you very much. Any questions?"

The audience, prepared to bring the house down with cheering and clapping, is now caught like a deer in headlights. Instead of rapturous applause, an awkward silence descends. Someone puts their hand up and asks a question. Q&A ensues, but everything now feels a bit flat.

As an audience member, you want to show appreciation for a great job, so as a speaker you should give space for that. When concluding a talk, end with a "Thank you" or some other definitive way to indicate you are finished speaking, and let the crowd have their moment to thank you. Once the applause dies down, that's when you, or even better an MC or moderator, can indicate that it is time (or not) for questions.

Don't let a talk fizzle out into the Q&A portion; finish on a high. Deal with the questions afterwards.

Take your lanyard off

The conference lanyard is the sign that you are part of this temporary tribe that has formed. When wearing it, you can speak to anyone else also adorned in the conference colours knowing you have some common ground.

When you are on stage to deliver a talk, you no longer need the lanyard to signify you are part of the experience. But that's not why you should take it off. You should remove it for the duration of your talk simply because it gets in the way.

On stage there is more to consider. A lanyard hanging around your neck may get in the way of your laptop if you need to use it, it might bother the microphone cable, it might just not look that great. You probably didn't choose your outfit based on the conference colour scheme, so why throw a dangling colour clash into the mix?

Take your lanyard off for the talk, but don't forget to put it back on afterwards. You will want to help people remember who you are once you're no longer in the spotlight.

Keyboard shortcuts you wish you'd known for years

There are two macOS keyboard shortcuts that I think are vital to know if you want to make smooth transitions between parts of your talk.

Cmd + F1

Let's say you're presenting in Keynote and you're using the presenter view; you can see the notes on your laptop, the audience sees the slides on the screen. Your laptop is using the external screen as an extended display. Now you want to demo something. To do this you need to switch to mirroring, so that you can see the same demo on your laptop as the audience is seeing on the screen.

You could make this switch by opening the display settings and changing the setting. However, the keyboard shortcut is Cmd + F1. Flipping between mirroring and extending with a keypress is quicker, smoother and keeps the energy in your presentation. Cmd + F1, commit it to memory.

Cmd + Shift + F

If you're using Chrome in fullscreen (entered with the green button in the top left of the window, or Shift + Fn + F) and you find that the address bar is still showing, taking up valuable screen real estate, that can be fixed. Cmd + Shift + F toggles whether the address bar shows. You're probably reading this in Chrome right now, so try it out!

Resize the font, don't zoom the window

If you plan to live code or just show code within an IDE as part of a demo, you should make sure that the font is readable by the audience. I recommend doing this as part of a tech check before you are supposed to go on stage. This gives you the time to check things yourself and resize the text correctly.

You might think that zooming in with the shortcut Cmd and + will help the audience see the code. In VS Code and other similar editors, that zooms the text and the entire interface. By the time the text is large enough for the audience to read, it will be crowded out by the file explorer and the terminal.

Instead, take the time to open the settings (Cmd + ,) and change the font size (in VS Code, the editor.fontSize setting) to something readable. That way the rest of the interface stays out of the way. If you plan to show things in the built-in terminal, make sure to increase the terminal font size too; it's a different setting (terminal.integrated.fontSize in VS Code).

Walk to the back of the room, or get a friend to check and give you a thumbs up, and make sure things are readable before you start the talk.

Trust in the tech

If you are in the fortunate position to be speaking at an event with a crew looking after you on stage, they will likely give you a microphone before you go on and be in control of it. Most of the time this means that the microphone is on and you needn't touch it, but it is muted at the sound desk. Then, when it is time for you to start they will unmute you and everyone will hear you.

It can be unnerving to believe that once you start speaking everyone will be able to hear you, but you should. Much like my first tip to finish strong, you also want to start strong, and "Hello, can you hear me? Is this on?" is not the way to achieve that.

Instead start by introducing yourself, start with a joke, start by thanking the audience for showing up. However you want to start, assume you will be heard and confidently start speaking. If something does go wrong, it's not your fault (unless you turned your microphone off on purpose). When things go right, you will capture the audience's attention and kick your session off in style.

Small tips, big impact

I have, of course, done the opposite of all of these things myself. I've started by asking if I can be heard, zoomed my editor until only the side panel was visible, fiddled in settings and menus to make my screen show the right thing, flapped my lanyard around on stage, and finished with "Thank you, any questions?" and a silent room. I only hope by sharing some of these tips that you can avoid all of those things yourself, sail through your talk with all your energy being used to wow the audience, and feel great at the end of it all.

So remember your shortcuts, take that lanyard off, trust that you will be heard, get your font sizes right, start with confidence and finish strong. I can't wait to see what you are going to talk about.

Things you need to do for npm trusted publishing to work

Phil Nash — Sat, 31 Jan 2026 00:00:00 +0000

After the recent supply chain attacks on the npm ecosystem, notably the Shai-Hulud 2.0 worm, GitHub took a number of actions to shore up the security of publishing packages to hopefully avoid further attacks. One of the outcomes was that long-lived npm tokens were revoked in favour of short-lived tokens or using trusted publishing.

I have GitHub Actions set up to publish new versions of npm packages that I maintain when a new tag is pushed to the repository. This workflow used long-lived npm tokens to authenticate, so when it came to updating a package recently I needed to update the publishing method too. The npm documentation on trusted publishing for npm packages was useful to a point, but there were some things I needed to do that the docs either didn't cover explicitly or weren't obvious enough, to get my package published successfully. I also came across this thread on GitHub where other people had similar issues. I wanted to share those things here.

TL;DR

Briefly, the changes that worked for me were to add the following to my GitHub Action publishing workflow:

# Permission to generate an OIDC token
permissions:
  id-token: write

jobs:
  publish:
    steps:
      ...
      # Ensure the latest npm is installed
      - run: npm install -g npm@latest
      ...
      # Add the --provenance flag to the publish command
      - run: npm publish --provenance

And ensure that the package.json refers to the correct repository:

{
  ...
  "repository": {
    "type": "git",
    "url": "git+https://github.com/${username}/${packageName}.git"
  },
  ...
}

For a bit more detail and alternative ways to set some of these settings, read on.

Package settings

Ok, so this is embarrassing, but initially I couldn't find the settings I needed to enable trusted publishing. The npm docs say:

Navigate to your package settings on npmjs.com and find the "Trusted Publisher" section.

I spent far too long looking around the https://www.npmjs.com/settings/${username}/packages page for the "Trusted Publisher" section. What I needed was the specific package settings, available here: https://www.npmjs.com/package/${packageName}/access.

You need to set up trusted publishing for each of your packages individually. That might be fine if you only maintain a few, but it's going to be a huge hassle if you have a lot.

Once you have filled in the trusted publisher settings, it's on to updating your project so that it can be published successfully.

Permissions

This is in the npm docs, so I'm just including it for completeness. You need to give the workflow permission to generate an OIDC token that it can then use to publish the package. This requires setting one permission in your workflow file:

permissions:
  id-token: write

npm version

The docs clearly call out that:

Note: Trusted publishing requires npm CLI version 11.5.1 or later.

I needed to upgrade the version of npm used by my GitHub Actions workflow, so I added a simple step to install the latest version of npm as part of the run before publishing:

- run: npm install -g npm@latest

Automatic provenance

The docs also say:

When you publish using trusted publishing, npm automatically generates and publishes provenance attestations for your package. This happens by default—you don't need to add the --provenance flag to your publish command.

I did not find this to be the case. I needed to add the --provenance flag so that my package would publish successfully.

- run: npm publish --provenance

This was something that seemed to help others too. You may only need to pass --provenance the first time, with it continuing to work automatically beyond that, but it can't hurt to keep it in your publish script (for when you need to update another package and you copy things over).

You can also set your package to generate provenance attestations on publishing by setting the provenance option in publishConfig in your package.json file.

{
  ...
  "publishConfig": {
    "provenance": true
  },
  ...
}

Or you can set the NPM_CONFIG_PROVENANCE environment variable.

- run: npm publish
  env:
    NPM_CONFIG_PROVENANCE: true

Repository details

Finally, I don't know if this last part helped as I did already have it set, but others in this GitHub thread found that setting the repository field in the package's package.json to specifically point to the GitHub repository also helped.

{
  ...
  "repository": {
    "type": "git",
    "url": "git+https://github.com/${username}/${packageName}.git"
  },
  ...
}

When you set up your trusted publisher in npm you do have to provide the repository details, so it makes sense to me that the package should agree with those details too.
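
As a local sanity check before pushing a tag, something like the following sketch could confirm that package.json points at the repository you registered as the trusted publisher. It is written in Python purely for illustration. The helper names and the octocat/my-package values are hypothetical, and npm's own matching rules may well differ from this simple comparison.

```python
import json
import re
from typing import Optional


def github_slug(repository_url: str) -> Optional[str]:
    """Return "owner/name" from a GitHub URL, or None if it is not recognised."""
    match = re.search(r"github\.com[/:]([^/]+)/([^/]+?)(?:\.git)?/?$", repository_url)
    if match is None:
        return None
    return f"{match.group(1)}/{match.group(2)}".lower()


def repository_matches(package_json_text: str, expected_slug: str) -> bool:
    """Check that package.json's repository field points at expected_slug."""
    package = json.loads(package_json_text)
    repository = package.get("repository")
    # The field may be an object with a "url" key or a plain string.
    url = repository.get("url", "") if isinstance(repository, dict) else (repository or "")
    return github_slug(url) == expected_slug.lower()


# Hypothetical package.json contents, for illustration only.
example = """
{
  "name": "my-package",
  "repository": {
    "type": "git",
    "url": "git+https://github.com/octocat/my-package.git"
  }
}
"""
print(repository_matches(example, "octocat/my-package"))
```

The comparison is case-insensitive and ignores the git+ prefix and .git suffix, which is a guess at what "agreeing" should mean rather than documented npm behaviour.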

Keep the ecosystem safe

Short-lived tokens, trusted publishing, and provenance all help keep the entire ecosystem safe. If you've read this far, it is because you are also updating your packages to publish with this method.

I know there are people out there with many more packages, and packages that are much more popular than any of mine, but I hope this helps. It does amuse me that I went through this for a package that I'm pretty sure I'm the only user of, but at least I now know how to do it for the future.

I hope to see trusted publishing continue to expand to more providers (it is limited to GitHub and GitLab at the time of writing) and to be used by more packages. And I hope to see fewer worms charging through the package ecosystem and threatening all of our applications in the future.

How wrong can a JavaScript Date calculation go?

Phil Nash — Wed, 14 Jan 2026 22:46:00 +0000

The Date object in JavaScript frequently causes trouble, so much so that it is set to be replaced by Temporal soon. This is the story of an issue that I faced that will be much easier to handle once Temporal is more widespread.

The issue

In January 2025 I was in Santa Clara, California writing some JavaScript to perform some reporting. I wanted to be able to get a number of events that happened within a month, so I would create a date object for the first day of the month, add one month to it and then subtract a day to return the last day. Seems straightforward, right?

I got a really weird result though. I reduced the issue to the following code.

const date = new Date("2024-01-01T00:00:00.000Z");
date.toISOString();
// => "2024-01-01T00:00:00.000Z" as expected
date.setMonth(1);
date.toISOString();
// => "2023-03-04T00:00:00.000Z" WTF?

I added a month to the 1st of January 2024 and landed on the 4th March, 2023. What happened?

Times and zones

You might have thought it was odd for me to set this scene on the West coast of the US, but it turned out this mattered. This code would have run fine in UTC and everywhere East of it.

JavaScript dates are more than just dates, they are responsible for time as well. Even though I only wanted to deal with days and months in this example the time still mattered.

I did know this, so I set the time to UTC thinking that this would work for me wherever I was. That was my downfall. Let's break down what happened.

Midnight on the 1st January, 2024 in UTC is still 4pm on the 31st December, 2023 in Pacific Time (UTC-8). date.setMonth(1) sets the month to February (as months are 0-indexed, unlike days). But we started on 31st December, 2023, so JavaScript has to handle the non-existent date of 31st February, 2023. It does this by overflowing into the next month, so we get 4pm on 3rd March. Finally, to print it out, the date is translated back into UTC, giving the final result: midnight on 4th March, 2023.

All of these steps feel reasonable when you break it down; the confusion stems from how unexpected that result was.
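
To make those steps concrete, here is a small simulation of the same arithmetic. It is a sketch in Python rather than JavaScript, so it only mimics how setMonth behaves. The set_month_like_js helper is hypothetical, and Pacific Time is modelled as a fixed UTC-8 offset, which holds for the dates involved here.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC-8 offset; daylight saving is ignored because it does not apply to these dates.
PACIFIC = timezone(timedelta(hours=-8))


def set_month_like_js(local: datetime, month_index: int) -> datetime:
    """Mimic setMonth: keep the local day-of-month and let it overflow into the next month."""
    first_of_month = local.replace(month=month_index + 1, day=1)
    return first_of_month + timedelta(days=local.day - 1)


utc_start = datetime(2024, 1, 1, tzinfo=timezone.utc)
local_start = utc_start.astimezone(PACIFIC)        # 4pm on 31st December, 2023
local_result = set_month_like_js(local_start, 1)   # "31st February" overflows to 3rd March
utc_result = local_result.astimezone(timezone.utc)

print(local_start.isoformat())   # 2023-12-31T16:00:00-08:00
print(local_result.isoformat())  # 2023-03-03T16:00:00-08:00
print(utc_result.isoformat())    # 2023-03-04T00:00:00+00:00
```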

So, how do you fix this?

Always use UTC

Since I didn't actually care for the time and I knew I wanted to work with UTC, I fixed this code using the Date object's setUTCMonth method. My original code subtracted a day to get the last day in a month, so I used the setUTCDate method too. All set${timePeriod} methods have a setUTC${timePeriod} equivalent to help you work with this.

const date = new Date("2024-01-01T00:00:00.000Z");
date.toISOString();
// => "2024-01-01T00:00:00.000Z"
date.setUTCMonth(1);
date.toISOString();
// => "2024-02-01T00:00:00.000Z"

So this fixed my issue. Can it be better though?

Bring on Temporal

One of the reasons this went wrong was because I was trying to manipulate dates, but I was actually manipulating dates and times without thinking about it. I mentioned Temporal at the top of the post because it has objects specifically for this.

If I were to write this code using Temporal, I would be able to use Temporal.PlainDate to represent a calendar date: a date without a time or time zone.

This simplifies things already, but Temporal also makes it more obvious how to manipulate dates. Rather than setting months and dates or adding milliseconds to update a date, you add a duration. You can either construct a duration with the Temporal.Duration object or use an object that defines a duration.

Temporal also makes objects immutable, so every time you change a date it returns a new object.

In this case I wanted to add a month, so with Temporal it would look like this:

const startDate = Temporal.PlainDate.from("2024-01-01");
// => Temporal.PlainDate 2024-01-01
const nextMonth = startDate.add({ months: 1 });
// => Temporal.PlainDate 2024-02-01
const endDate = nextMonth.subtract({ days: 1 });
// => Temporal.PlainDate 2024-01-31

Date manipulation without worrying about times, wonderful!

Of course, there are many more benefits to the very well thought out Temporal API and I cannot wait for it to be a part of every JavaScript runtime.

Mind the time zone

Temporal is just rolling out to JavaScript engines. At the time of writing, it is available in Firefox and just landed in Chrome 144. It's also listed as behind a flag in Safari Technology Preview. If you want to test this out, open up Firefox or Chrome, or check out one of the polyfills: @js-temporal/polyfill or temporal-polyfill.

If you still have to use Date, make sure you keep your time zone in mind. I'd try to move to, or at least learn how to use, Temporal now.

And watch out for time zones; even when you try to avoid them, they can end up giving you a headache.

Improve Your Python Search Relevancy with Astra DB Hybrid Search

Phil Nash — Wed, 30 Apr 2025 00:44:04 +0000

Astra DB now supports hybrid search, which can increase the accuracy of your search by up to 45%. It does this by performing both vector search and BM25 keyword search and then reranking the results from both to return the most relevant results.

In this post, we'll take a look at how to use Astra DB Hybrid Search in Python.

What is hybrid search?

Before we get to the code, let's go over what hybrid search actually is and why it helps. You would typically build a retrieval-augmented generation (RAG) app by creating vector embeddings for your unstructured content and storing them in a database. Then, when a user makes a query, you turn the query into a vector embedding and use it to perform a similarity search to return relevant context that you can provide to a large language model (LLM) to generate an answer.

The more accurate and relevant your search results from your database are, the better your RAG application will be. With better context, there’s less opportunity for the LLM to return inaccurate or hallucinated responses.

To improve on the relevancy of this system, we need to focus on the search element. Vector search is great at understanding context and meaning, but it can miss results that would be returned from a keyword match. Meanwhile, keyword search can be restrictive as it doesn't understand context. Performing both searches gives us the best chance of returning the top results, but you then need to combine those results so you can pass them to an LLM. This is where reranking comes in.

Reranking is performed by another machine learning model—a cross-encoder—that more accurately scores relevance because the model uses both the original query and the document to create the score. You can't use reranking models for search because it would require scoring every document in your database against the query every time; for small subsets of your data, however, this is achievable.

You can actually use a reranker to help improve vector search results, by returning more results than required, reranking to adjust the order, then returning the top results.

In hybrid search, we use reranking to rescore the combination of results from the vector and keyword searches and pick the top, most relevant results from the output.
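
The flow described above can be sketched in a few lines. This is only an illustration of the idea and not how Astra DB implements it: the hybrid_rerank function is hypothetical, and toy_score is a stand-in for a real cross-encoder reranking model.

```python
from typing import Callable


def hybrid_rerank(
    query: str,
    vector_hits: list[str],
    keyword_hits: list[str],
    score: Callable[[str, str], float],
    limit: int,
) -> list[str]:
    # Combine both candidate lists, keeping first-seen order and dropping duplicates.
    candidates = list(dict.fromkeys(vector_hits + keyword_hits))
    # Score every (query, document) pair, as a reranking model would.
    ranked = sorted(candidates, key=lambda doc: score(query, doc), reverse=True)
    return ranked[:limit]


def toy_score(query: str, doc: str) -> float:
    # Stand-in scorer: the fraction of query words that appear in the document.
    words = query.lower().split()
    return sum(word in doc.lower() for word in words) / len(words)


vector_hits = ["fresh garden salads and bowls", "fusion tasting menu"]
keyword_hits = ["greek salads and kebabs", "fresh garden salads and bowls"]
print(hybrid_rerank("fresh salads", vector_hits, keyword_hits, toy_score, limit=2))
# ['fresh garden salads and bowls', 'greek salads and kebabs']
```

Because the reranker only scores the combined candidates and not the whole collection, the expensive model runs on a handful of documents per query.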

Astra DB can now perform hybrid search by combining vector search and BM25 keyword search, then reranking using the NVIDIA NeMo Retriever reranking microservices (including the nvidia/llama-3.2-nv-rerankqa-1b-v2 reranking model). Let's take a look at how to use Astra DB hybrid search to improve search relevancy in your Python application.

Hybrid Search in Python with Astra DB

Let's start by creating a database in your DataStax account. While it’s provisioning, let's get our coding environment set up.

To use Hybrid Search in Python, you’ll need to install version 2 of astrapy as well as python-dotenv so that you can load environment variables from a .env file. Install the dependencies:

pip install "astrapy>=2.0,<3.0" python-dotenv

Create a file called .env, add your database API endpoint and access token, and choose a name for your collection.

ASTRA_DB_API_ENDPOINT=
ASTRA_DB_APPLICATION_TOKEN=
ASTRA_DB_COLLECTION_NAME=

Creating a collection for hybrid search

Once the database is created, we'll need to create a collection to store our data in. We'll do this in code, because we want to create some settings that aren't yet available in the dashboard.

Create a file called create_collection.py and add this code:

import os
from astrapy import DataAPIClient
from astrapy.info import CollectionDefinition
from astrapy.constants import VectorMetric
from dotenv import load_dotenv

load_dotenv()

client = DataAPIClient()
db = client.get_database(
    os.environ["ASTRA_DB_API_ENDPOINT"],
    token=os.environ["ASTRA_DB_APPLICATION_TOKEN"],
)

collection_definition = (
    CollectionDefinition.builder()
    .set_vector_dimension(1024)
    .set_vector_metric(VectorMetric.DOT_PRODUCT)
    .set_vector_service(
        provider="nvidia",
        model_name="NV-Embed-QA",
    )
    .set_lexical(
        {
            "tokenizer": {"name": "standard", "args": {}},
            "filters": [
                {"name": "lowercase"},
                {"name": "stop"},
                {"name": "porterstem"},
                {"name": "asciifolding"},
            ],
        }
    )
)

collection = db.create_collection(
    os.environ["ASTRA_DB_COLLECTION_NAME"],
    definition=collection_definition,
)

In this code we create a definition for our collection and then create the collection. The definition includes details on how we want the collection to create vectors for our data as well as how it should treat the keyword search.

For vector search, we are using Astra Vectorize with the built-in NVIDIA NeMo Retriever nv-embed-qa model to create vector embeddings on insert and search. The model creates vectors with 1024 dimensions, and we configure the collection to use the dot product to calculate similarity between vectors.

For the keyword search, the default performs exact keyword matching, but we can tweak this a bit with settings like this. First, we define the tokenizer, which is how the collection breaks up the text into words. We'll use the standard tokenizer, which divides based on word boundaries and strips out punctuation. We then add filters, which transform the text to make it easier to match searches. In this case, we add four filters:

lowercase - converts all the text to lowercase
stop - removes English stop words
porterstem - applies the Porter Stemming algorithm for English, which translates different forms of words to a common stem, e.g. "search", "searches", and "searched" will all translate to the token "search"
asciifolding - translates characters into ASCII, that is it turns accented characters into an ASCII equivalent if it exists, e.g. "café" becomes "cafe"

Note that both the stop and porterstem filters are specific to English texts.
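
To get a feel for what this chain of tokenizer and filters does to a piece of text, here is a rough local imitation. It is not the analyzer that Astra DB runs: the stop word list is a tiny illustrative subset, and crude_stem is a hypothetical stand-in that strips a few suffixes instead of implementing the Porter algorithm.

```python
import re
import unicodedata

# Tiny illustrative stop word list, not the list used by the real filter.
STOP_WORDS = {"a", "an", "and", "for", "of", "the", "to", "with"}


def crude_stem(token: str) -> str:
    """Very rough stand-in for Porter stemming: strip a few common suffixes."""
    for suffix in ("es", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def fold_ascii(token: str) -> str:
    """Drop combining accents so that "café" becomes "cafe"."""
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(text: str) -> list[str]:
    tokens = re.findall(r"\w+", text)                    # tokenizer: split into words
    tokens = [t.lower() for t in tokens]                 # lowercase
    tokens = [t for t in tokens if t not in STOP_WORDS]  # stop
    tokens = [crude_stem(t) for t in tokens]             # stand-in for porterstem
    return [fold_ascii(t) for t in tokens]               # asciifolding


print(analyze("The Café searched for searches and salads"))
# ['cafe', 'search', 'search', 'salad']
```

The same processing is applied to queries, which is why a search for "salad" can match a description that only mentions "salads".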

You can choose to include the filters that will work best for your data. There is more on the available filters and links to further information in the Astra DB documentation.

Now we've created our collection, we can ingest some data to search against.

Indexing data for hybrid search

Save this list of made up restaurant descriptions that we'll use as our example data as a JSON file called restaurants.json. Create a new file called ingest.py and add the following code:

import os
from astrapy import DataAPIClient
from dotenv import load_dotenv
import json

load_dotenv()

client = DataAPIClient()
db = client.get_database(
    os.environ["ASTRA_DB_API_ENDPOINT"],
    token=os.environ["ASTRA_DB_APPLICATION_TOKEN"],
)
collection = db.get_collection(os.environ["ASTRA_DB_COLLECTION_NAME"])

with open("restaurants.json", "r") as file:
    restaurant_data = json.load(file)
    restaurants = [{"$hybrid": restaurant} for restaurant in restaurant_data]
    collection.insert_many(restaurants)

In this code we load the restaurant descriptions and then create each as a document in Astra DB passing in the description as the $hybrid property. Creating documents with the $hybrid property does two things.

It will use the NVIDIA NeMo Retriever embedding model that we configured when we created the collection to create vector embeddings of the content. This is the same as using Astra Vectorize to generate embeddings.

It will also index the text for the new BM25 keyword search.

Run the code with:

python ingest.py

Check your collection in the DataStax dashboard; you should find both $vectorize and $lexical properties.

Performing a hybrid search

Having indexed using $hybrid, we can now perform vector and hybrid searches against this collection. Create a file called search.py and enter the following code:

import os
from astrapy import DataAPIClient
from dotenv import load_dotenv

load_dotenv()

client = DataAPIClient()
db = client.get_database(
    os.environ["ASTRA_DB_API_ENDPOINT"],
    token=os.environ["ASTRA_DB_APPLICATION_TOKEN"],
)
collection = db.get_collection(os.environ["ASTRA_DB_COLLECTION_NAME"])

cursor = collection.find(
    sort={"$vectorize": "salads"},
    limit=5,
    projection={"_id": 0, "$vectorize": 1},
)

for document in cursor:
    print(document)

This will perform a vector search on the collection using Astra Vectorize when you run:

python search.py

When you run this search you will see five results. In position four is "The Green Leaf Eatery," which is the most "salads" sounding place on the list to me. Positions one and two do mention salads, and, because it is vector search and not keyword search, position three, "Fusion Flavors Bistro," doesn't mention salads at all.

Now, let's update the search to use Hybrid Search and perform reranking on the results. You will need to change the find method to the new find_and_rerank method and pass {"$hybrid": query} as your sort field. You can add other arguments too, like hybrid_limits, which sets the number of documents to retrieve from each inner query before reranking, and include_scores, which shows the various scores used to rank documents along the way.

cursor = collection.find_and_rerank(
    sort={"$hybrid": "salads"},
    limit=5,
    hybrid_limits=10,
    projection={"_id": 0, "$vectorize": 1},
    include_scores=True,
)

for result in cursor:
    print(result.document)
    print(result.scores)

Now you will see these results:

{'$vectorize': "The Green Leaf Eatery: A bright and airy vegetarian and vegan restaurant focusing on fresh, seasonal produce. Their innovative menu features creative plant-based dishes, from vibrant salads and grain bowls to hearty vegetable curries and decadent vegan desserts. It's a celebration of healthy and delicious eating."}
{'$rerank': -3.6972656, '$vector': 0.67285335, '$vectorRank': 4, '$bm25Rank': 2, '$rrf': 0.03175403}

{'$vectorize': 'The Bohemian Brew & Bites: A quirky and eclectic cafe offering a relaxed atmosphere and a diverse menu. Enjoy gourmet sandwiches on artisanal bread, creative salads with house-made dressings, and a selection of globally inspired small plates. Their extensive coffee and craft beer menu makes it the perfect spot for a casual bite or a leisurely hangout.'}
{'$rerank': -4.5507812, '$vector': 0.6813005, '$vectorRank': 2, '$bm25Rank': 3, '$rrf': 0.032002047}

{'$vectorize': 'The Olive Grove Mediterranean: Transport yourself to the sunny shores of the Mediterranean at this charming restaurant. Their menu features flavorful Greek and Turkish dishes, from grilled kebabs and savory spanakopita to creamy hummus and vibrant salads. Enjoy the fresh herbs, olive oil, and sun-drenched flavors.'}
{'$rerank': -5.1210938, '$vector': 0.68404347, '$vectorRank': 1, '$bm25Rank': 1, '$rrf': 0.032786883}

{'$vectorize': 'Fusion Flavors Bistro: A contemporary restaurant that creatively blends different culinary traditions. Expect unexpected and exciting flavor combinations, innovative presentations, and a menu that constantly evolves. This is a place for adventurous palates seeking a unique dining experience.'}
{'$rerank': -11.375, '$vector': 0.67336804, '$vectorRank': 3, '$bm25Rank': None, '$rrf': 0.015873017}

{'$vectorize': 'The Farmhouse Kitchen: A rustic and charming restaurant celebrating the bounty of the local farm. Their menu changes seasonally, featuring dishes made with the freshest ingredients sourced directly from nearby farms. Expect simple yet elegant preparations that highlight the natural flavors of the ingredients.'}
{'$rerank': -11.375, '$vector': 0.6582356, '$vectorRank': 7, '$bm25Rank': None, '$rrf': 0.014925373}

In this output, you can see the results and also the various scores that were used to rank them. You can see that "The Green Leaf Eatery" now ranks first on the list, having been ranked fourth by vector search and second by the BM25 search. The reranker lifted it up to first place.

There are other similar movements in the list. In fifth position is a restaurant that was initially ranked seventh by the vector search and doesn't contain the search term "salads." Hybrid Search initially retrieves more results than we need, reranks them and then returns the most relevant, which is how this result was lifted into a position to be returned. Positions four and five received the same rerank score, so their order was decided by one more score that is calculated: reciprocal rank fusion (RRF). RRF isn't great for reranking, but it is very quick, so it is useful as a tie-breaker here.
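
If you are curious where the $rrf numbers come from, the usual RRF formula sums 1 / (k + rank) over each search that returned the document. The sketch below assumes the commonly used constant k = 60; that value is an assumption rather than something documented here, but it does reproduce the $rrf values in the output above.

```python
from typing import Iterable, Optional


def rrf_score(ranks: Iterable[Optional[int]], k: int = 60) -> float:
    """Sum 1 / (k + rank) over the searches that returned the document."""
    return sum(1 / (k + rank) for rank in ranks if rank is not None)


# (vector rank, BM25 rank) pairs taken from the output above.
results = {
    "The Green Leaf Eatery": (4, 2),
    "The Bohemian Brew & Bites": (2, 3),
    "The Olive Grove Mediterranean": (1, 1),
    "Fusion Flavors Bistro": (3, None),
    "The Farmhouse Kitchen": (7, None),
}

for name, ranks in results.items():
    print(f"{name}: {rrf_score(ranks):.6f}")
```

A document missing from one of the searches simply contributes nothing for that search, which is why the last two results score roughly half as much as the others.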

Try running vector and hybrid searches with other search terms to get a feel for the results. In our testing, we’ve seen Hybrid Search improve relevance by up to 45%.

Next, we'll take a look at a couple of other things you will need to consider when using Hybrid Search.

Providing your own vectors

The example above used Astra Vectorize to automatically create vector embeddings, but you can always use a different model and provide your own vectors.

If you do use your own vector embedding model, then you will need to provide both the vector and the text that will be indexed for keyword search. You can do this with the special property $lexical.

Imagine you have a method that creates a vector embedding called create_embedding. You might then ingest the data like this:

with open("restaurants.json", "r") as file:
    restaurant_data = json.load(file)
    restaurants = [
        {
            "$vector": create_embedding(restaurant),
            "$lexical": restaurant,
            "description": restaurant,
        }
        for restaurant in restaurant_data
    ]
    collection.insert_many(restaurants)

Now, when you perform a hybrid search, you need to provide a $vector with which to search. Also, the default property on which the content is reranked is $vectorize, so you need to tell the database which property to rerank on too.

You also need to set the query that you want to use to perform the reranking. It can be the same query that you use for the vector search and the keyword search, or something else. You can see more about using different searches below.

You can define the query with the rerank_query argument and the field on which to perform the reranking with the rerank_on argument. For example:

query = "salad"

cursor = collection.find_and_rerank(
    sort={
        "$hybrid": {"$vector": create_embedding(query), "$lexical": query},
    },
    rerank_query=query,
    rerank_on="description",
    limit=5,
    hybrid_limits=10,
)

for result in cursor:
    print(result.document)
    print(result.scores)

Performing different searches

You can also use different terms to perform your initial searches. This is useful because BM25 keyword search acts as a filter on the query keywords.

In our Hybrid Search example above, only three restaurant descriptions mentioned "salads" so only three results had a $bm25Rank in the results.

That worked fine for our example, but when we're dealing with a RAG application, the search queries are often in natural language rather than keyword focused. We already set up our collection to use word stems and translate accented characters into ASCII. You may also want to perform keyword extraction on the user query, using something like NLTK, spaCy, or KeyBERT, so that you can use the extracted keywords for the lexical search. This would look like:

query = "I'm looking for a restaurant that serves the best salad"

cursor = collection.find_and_rerank(
    sort={
        "$hybrid": {
            "$vector": create_embedding(query),
            "$lexical": extract_keywords(query),
        },
    },
    rerank_query=query,
    rerank_on="description",
    limit=5,
    hybrid_limits=10,
)

for result in cursor:
    print(result.document)
    print(result.scores)

The above code will now perform the vector search with your own vector embedding model, keyword search using keywords extracted from the user query and then rerank the results based on the initial query.
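
The extract_keywords function used above is not defined in this post. As a rough illustration only, here is a minimal standard-library sketch that drops a small, hand-picked list of stop words and returns the remaining terms as a space-separated string. The stop word list and the choice to return a string are assumptions. For real applications, a library like NLTK, spaCy, or KeyBERT will likely do a much better job.

```python
import re

# A tiny, hand-picked stop word list (an assumption for illustration only).
STOP_WORDS = frozenset(
    {"a", "an", "and", "are", "for", "i", "im", "in", "is", "it",
     "of", "on", "or", "that", "the", "to", "with"}
)


def extract_keywords(query: str) -> str:
    """Return the non-stop-word terms of the query, in order, without duplicates."""
    # Lowercase and drop apostrophes so that "I'm" becomes "im".
    normalized = query.lower().replace("'", "").replace("\u2019", "")
    words = re.findall(r"[a-z0-9]+", normalized)
    seen = set()
    keywords = []
    for word in words:
        if word in STOP_WORDS or word in seen:
            continue
        seen.add(word)
        keywords.append(word)
    return " ".join(keywords)


print(extract_keywords("I'm looking for a restaurant that serves the best salad"))
# => looking restaurant serves best salad
```

A simple filter like this still keeps words such as "looking" and "best", which is one reason a dedicated keyword extraction library is worth considering.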

Try hybrid search for better search and RAG relevancy

Combining vector search with keyword search and a reranking model like NVIDIA NeMo Retriever nvidia/llama-3.2-nv-rerankqa-1b-v2 produces more relevant results, improving the output of your RAG application. You can get started with hybrid search and reranking in Astra DB today by signing up and using AstraPy or with Langflow.

If you want to chat more about improving retrieval accuracy, drop into the DataStax Devs Discord or drop me an email at phil.nash@datastax.com.

Build a RAG Chat App with Firebase Genkit and Astra DB

Phil Nash — Wed, 16 Apr 2025 04:17:10 +0000

Today we announced the release of a plugin for Firebase's Genkit framework for building generative AI applications. Genkit is a powerful framework that provides the primitives for building production-quality GenAI applications. From easy access to models, prompts, indexers, and retrievers, to more advanced features like flows, traces, and evals, its power lies in making it easy to do the right thing while building GenAI applications.

In this post, we'll take a look at how to use the Astra DB plugin for Genkit to build a retrieval-augmented generation application with Genkit.

Building a RAG application

Let's build a RAG application from scratch and see how straightforward it can be with Genkit and Astra DB. First, you'll need a Gemini API key, which you can get from Google AI Studio.

You’ll also need an Astra DB database to store your data and vectors; if you don't already have an account you can sign up for a free DataStax account.

Start by creating a new Astra DB database; give it a name and choose a cloud and region. This takes a couple of minutes, so carry on with the next steps while it starts up.

Setting up the app

Create a directory for your app and install the dependencies you'll need:

mkdir genkit-astra-db-rag
cd genkit-astra-db-rag
npm init --yes
npm install genkit @genkit-ai/googleai genkitx-astra-db
npm install genkit-cli tsx -D

Create a file to work in:

touch index.ts

Open index.ts and import the dependencies you installed:

import { z, genkit, Document } from "genkit";
import { textEmbedding004, googleAI, gemini20Flash } from "@genkit-ai/googleai";
import {
  astraDBIndexerRef,
  astraDBRetrieverRef,
  astraDB,
} from "genkitx-astra-db";

In this case, we're pulling in Google's text-embedding-004 model for creating vector embeddings, and the Gemini 2.0 Flash model for generation.

It's about time to create a collection in which we can store our vectors. Hopefully your database has been created now, so head to the DataStax dashboard, choose your database, open the Data Explorer, and create a collection. Give the collection a name and choose "Bring my own" for the embedding generation method. The text-embedding-004 model creates vectors with 768 dimensions (though you can choose fewer), so enter 768 for the number of dimensions and choose "Cosine" for the similarity metric.

Once you've created the collection, you'll need the API endpoint of the database, the collection name, and a newly generated API token.

With those, create a .env file in your application and enter the credentials:

ASTRA_DB_API_ENDPOINT=""
ASTRA_DB_APPLICATION_TOKEN=""
ASTRA_DB_COLLECTION_NAME=""

Also in the .env file, enter your API key from AI Studio:

GEMINI_API_KEY=""

Now we can configure Genkit. In index.ts create the ai object like so:

const collectionName = process.env.ASTRA_DB_COLLECTION_NAME!

const ai = genkit({
  plugins: [
    googleAI(),
    astraDB([
      {
        clientParams: {
          applicationToken: process.env.ASTRA_DB_APPLICATION_TOKEN!,
          apiEndpoint: process.env.ASTRA_DB_API_ENDPOINT!,
        },
        collectionName: collectionName,
        embedder: textEmbedding004,
      },
    ]),
  ],
});

This sets up Genkit with the Google AI plugin for models and embeddings and the Astra DB plugin, configured with the credentials to access the collection you just created and the vector embedding model text-embedding-004.

We can now access the Astra DB indexer and retriever via the reference functions:

export const astraDBIndexer = astraDBIndexerRef({ collectionName });
export const astraDBRetriever = astraDBRetrieverRef({ collectionName });

The indexer is used to store documents in the collection and the retriever is used to perform vector search to return documents from the collection.

Ingesting data

Now we can ingest some data into Astra DB. For this RAG application, let's grab data from the web. To ingest web data, we'll need to fetch it from a URL and then extract the main content from the returned HTML. I've written before about how I like to use Readability.js to parse out the content from a page, so we'll follow that. We'll also need something to turn the content into chunks. Let's use llm-chunk for this as it's relatively simple.

Install the dependencies:

npm install @mozilla/readability jsdom llm-chunk

Import them at the top of the script:

import { Readability } from "@mozilla/readability";
import { JSDOM } from "jsdom";
import { chunk } from "llm-chunk";

Write a function that takes a URL, fetches the HTML content, extracts the content and returns it.

async function fetchTextFromWeb(url: string) {
  const html = await fetch(url).then((res) => res.text());
  const doc = new JSDOM(html, { url });
  const reader = new Readability(doc.window.document);
  const article = reader.parse();
  return article?.textContent || "";
}

The next thing to do is write our first Genkit flow to ingest data from a URL into the collection. Flows are functions that you can run via the Genkit UI or through code. Flows have strongly defined input and output schemas using zod.

For this flow we'll accept a string which is a URL. There's no need for an output as the function will just end when it completes successfully.

export const indexWebPage = ai.defineFlow(
  {
    name: "indexPage",
    inputSchema: z.string().url().describe("URL"),
    outputSchema: z.void(),
  },
  async (url: string) => {
    const text = await ai.run("extract-text", () => fetchTextFromWeb(url));

    const chunks = await ai.run("chunk-it", async () =>
      chunk(text, { minLength: 128, maxLength: 1024, overlap: 128 })
    );

    const documents = chunks.map((text) => {
      return Document.fromText(text, { url });
    });

    return await ai.index({
      indexer: astraDBIndexer,
      documents,
    });
  }
);

The ingestion pipeline is nice and easy to read as a flow. And using ai.run around the non-Genkit functions provides an extra level of tracing that we'll be able to see later.

The Genkit UI

This seems like a good time to test out what we've built so far. Open package.json and add a script to run your application code and one to start the Genkit server.

"scripts": {
  "start": "tsx --env-file .env ./index.ts",
  "genkit": "genkit start -- npm start"
},

Now you can run npm run genkit and open the Genkit UI in your browser at localhost:4000. You can either find your flow on the dashboard or by clicking on Flows in the sidebar and then selecting it from the list.

This gives you a box to add some input. The expected input matches the input schema we defined for the flow. In this case, it just expects a string that’s a URL.

Enter a URL and run the flow. Once it's complete, you can open the DataStax dashboard and see the chunks and their vectors stored in the collection.

Back in the Genkit UI you can click on View trace and you’ll be shown each of the steps the flow took to fetch, chunk, embed and store the data.

Head back to the Genkit dashboard and open Retrievers from the sidebar. All we did to make the retriever available was set up the Astra DB plugin and export the reference created with astraDBRetrieverRef.

We can already use that retriever from the Genkit UI. Click on the retriever and enter the following in the input:

{
    "content": [
        {
            "text": "some search term"
        }
    ],
    "metadata": {}
}

In the options, change the property k to 5. Run the retriever and it will perform a vector search using the text you provide in the input and return five results from the database.

We can now hook this up with a full RAG flow, in which we first retrieve context from the database and then pass it to a model to generate a response. Open the code again and define another flow:

export const ragFlow = ai.defineFlow(
  { name: "rag", inputSchema: z.string(), outputSchema: z.string() },
  async (input: string) => {
    const docs = await ai.retrieve({
      retriever: astraDBRetriever,
      query: input,
      options: { k: 3 },
    });

    const { text } = await ai.generate({
      model: gemini20Flash,
      prompt: `
You are a helpful AI assistant that can answer questions.

Use only the context provided to answer the question.
If you don't know, do not make up an answer.

Question: ${input}`,
      docs,
    });

    return text;
  }
);

Here we use the retriever to search for the string input, and then pass the resulting documents as part of a prompt to the generate function that uses the Gemini 2.0 Flash model to perform the generation.

Restart the Genkit server, open up the Flows section and choose your RAG flow. You can now input a question (make sure it's relevant to the data you indexed) and Gemini will generate a response based on the retrieved docs.

Once again, you can hit the View trace button to see what happened at each stage in this request.

We've only used these flows in the Genkit interface so far, but you can also run either of them from your code like this:

await indexWebPage.run("URL");

Genkit and Astra DB make RAG easy

It took us fewer than 100 lines of code to build the two major flows required for RAG: ingestion and generation. Firebase Genkit made it easy to test our implementation as we went—without us having to build a UI for it. And the tracing in Genkit means it's easier to track down bugs in your flows.

Astra DB is an easy to use and powerful vector database, and it's even easier to use when all you need to do is configure the plugin in Genkit and reference indexers and retrievers.

You can find the code for this app on GitHub. The Astra DB plugin for Genkit is open source so if you have any issues or requests, please open an issue on the GitHub repo. And check out the Genkit docs for more on what you can build with Genkit.

Frequently Asked Questions (FAQ)

What is Astra DB?

Astra DB is a cloud-based NoSQL document store. It features an accurate and performant vector index for storing vectors which can be used for similarity searches. It comes with a Genkit plugin for integration with the Firebase Genkit framework.

What is Genkit?

Genkit is a framework for building generative AI applications. It provides essential tools such as models, prompts, indexers, retrievers, flows, traces, and evaluations. By using Genkit, developers can efficiently create applications that leverage the power of generative AI.

How do I get started with building a RAG application using Genkit and Astra DB?

To build a RAG application with Genkit and Astra DB, you need to create a database and collection within Astra DB and then install Genkit and related dependencies into your Node.js application. Once you've configured Genkit with your Astra DB credentials, you can start creating flows.

What does building the RAG application involve?

Building the RAG application involves creating a collection in Astra DB to store vectors, and setting up flows in Genkit to ingest data, and to generate responses based on context retrieved from the database. You can test these flows out using the Genkit UI.

How to Create Vector Embeddings in Python

Phil Nash — Wed, 09 Apr 2025 01:26:09 +0000

When you’re building a retrieval-augmented generation (RAG) app, the first thing you need to do is prepare your data. You need to:

collect your unstructured data
split it into chunks
turn those chunks into vector embeddings
store the embeddings in a vector database

There are many ways that you can create vector embeddings in Python. In this post, we’ll take a look at four ways to generate vector embeddings: locally, via API, via a framework, and with Astra DB's Vectorize.

Local vector embeddings

There are many pre-trained embedding models available on Hugging Face that you can use to create vector embeddings. Sentence Transformers (SBERT) is a library that makes it easy to use these models for vector embedding, as well as cross-encoding for reranking. It even has tools for fine-tuning models, if that’s something that might be of use.

You can install the library with:

pip install sentence-transformers

A popular local model for vector embedding is all-MiniLM-L6-v2. It’s trained as a good all-rounder that produces a 384-dimension vector from a chunk of text.

To use it, import SentenceTransformer from the sentence_transformers package and create a model using the identifier from Hugging Face, in this case "all-MiniLM-L6-v2". If you want to use a model that isn't in the sentence-transformers project, like the multilingual BGE-M3, include the organization in the identifier, for example "BAAI/bge-m3". Once you've loaded the model, use the encode method to create the vector embedding. The full code looks like this:

from sentence_transformers import SentenceTransformer


model = SentenceTransformer("all-MiniLM-L6-v2")
sentence = "A robot may not injure a human being or, through inaction, allow a human being to come to harm."
embedding = model.encode(sentence)

print(embedding)
# => [ 1.95171311e-03  1.51085425e-02  3.36140348e-03  2.48030387e-02 ... ]

If you pass a list of texts to the model, they’ll all be encoded:

from sentence_transformers import SentenceTransformer


model = SentenceTransformer("all-MiniLM-L6-v2")
sentences = [
    "A robot may not injure a human being or, through inaction, allow a human being to come to harm.",
    "A robot must obey the orders given it by human beings except where such orders would conflict with the First Law.",
    "A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.",
]
embeddings = model.encode(sentences)

print(embeddings)
# => [[ 0.00195174  0.01510859  0.00336139 ...  0.07971715  0.09885529  -0.01855042]
# [-0.04523939 -0.00046248  0.02036596 ...  0.08779042  0.04936493  -0.06218244]
# [-0.05453169  0.01125113 -0.00680178 ...  0.06443197  0.08771271  -0.00063468]]

There are many more models you can use to generate vector embeddings with the sentence-transformers library and, because you’re running locally, you can try them out to see which is most appropriate for your data. You do need to watch out for any restrictions that these models might have. For example, the all-MiniLM-L6-v2 model doesn’t produce good results for more than 128 tokens and can only handle a maximum of 256 tokens. BGE-M3, on the other hand, can encode up to 8,192 tokens. However, the BGE-M3 model is a couple of gigabytes in size and all-MiniLM-L6-v2 is under 100MB, so there are space and memory constraints to consider, too.
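
To stay within limits like these, you need to split longer text into chunks before encoding it. Here is a minimal sketch of a chunker with overlap. It counts whitespace-separated words as a crude stand-in for tokens, which is an assumption: a model's tokenizer usually produces more tokens than there are words, so leave some headroom. The chunk_words name and its defaults are hypothetical.

```python
def chunk_words(text: str, max_words: int = 100, overlap: int = 20) -> list[str]:
    """Split text into chunks of at most max_words words, sharing overlap words."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be at least 0 and smaller than max_words")

    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once this chunk has reached the end of the text.
        if start + max_words >= len(words):
            break
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
print(chunk_words(sample, max_words=4, overlap=1))
# => ['w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9']
```

Each chunk can then be passed to model.encode, either one at a time or as a list.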

Local embedding models like this are useful when you’re experimenting on your laptop, or if you have hardware that PyTorch can use to speed up the encoding process. It’s a good way to get comfortable running different models and seeing how they interact with your data.

If you don't want to run your models locally, there are plenty of available APIs you can use to create embeddings for your documents.

APIs

There are several services that make embedding models available as APIs. These include LLM providers like OpenAI, Google, or Cohere, as well as specialist providers like Jina AI or model hosts like Fireworks.

These providers offer HTTP APIs, often with a Python package to make it easy to call them. You will typically require an API key from the service. Once you have that set up, you can generate vector embeddings by sending your text to the API.

For example, with Google's google-genai SDK and a Gemini API key you can generate a vector embedding with their experimental Gemini embedding model like this:

from google import genai


client = genai.Client(api_key="GEMINI_API_KEY")

result = client.models.embed_content(
    model="gemini-embedding-exp-03-07",
    contents="A robot may not injure a human being or, through inaction, allow a human being to come to harm.",
)

print(result.embeddings)
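
Strings like "GEMINI_API_KEY" in these examples are placeholders for real credentials. One common approach, sketched here as an assumption rather than something any of these SDKs require, is to read the key from an environment variable and fail early with a clear message if it is missing. The require_env helper is a hypothetical name.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Set the {name} environment variable before running this script")
    return value


try:
    api_key = require_env("GEMINI_API_KEY")
except RuntimeError as error:
    print(error)
else:
    # Avoid printing the key itself.
    print("API key loaded")
```

You could then create the client with genai.Client(api_key=api_key) instead of hard-coding the key in your script.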

Each API can be different, though many providers do make OpenAI-compatible APIs. However, each time you try a new provider you might find you have a new API to learn. Unless, of course, you try one of the available frameworks that are intended to simplify this.

Frameworks

There are several projects available, like LangChain or LlamaIndex, that create abstractions over the common components of the GenAI ecosystem, including embeddings.

Both LangChain and LlamaIndex have methods for creating vector embeddings via APIs or local models, all with the same interface. For example, you can create the same Gemini embedding as the code snippet above with LangChain like this:

from langchain_google_genai import GoogleGenerativeAIEmbeddings


embeddings = GoogleGenerativeAIEmbeddings(
    model="gemini-embedding-exp-03-07",
    google_api_key="GEMINI_API_KEY"
)
result = embeddings.embed_query("A robot may not injure a human being or, through inaction, allow a human being to come to harm.")
print(result)

As a comparison, here is how you would generate an embedding using an OpenAI embeddings model and LangChain:

from langchain_openai import OpenAIEmbeddings


embeddings = OpenAIEmbeddings(
    model="text-embedding-3-small",
    api_key="OPENAI_API_KEY"
)
result = embeddings.embed_query("A robot may not injure a human being or, through inaction, allow a human being to come to harm.")
print(result)

We had to change the import, the class name, the model, and the API key argument, but otherwise the code is identical. This makes it easy to swap them out and experiment.

If you're using LangChain to build your entire RAG pipeline, these embeddings fit in well with the vector database interfaces. You can provide an embedding model to the database object and LangChain handles generating the embeddings as you insert documents or perform queries. For example, here's how you can combine the Google embeddings model with the LangChain wrapper for Astra DB.

from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_astradb import AstraDBVectorStore


embeddings = GoogleGenerativeAIEmbeddings(
    model="gemini-embedding-exp-03-07",
    google_api_key="GEMINI_API_KEY"
)

vector_store = AstraDBVectorStore(
    collection_name="astra_vector_langchain",
    embedding=embeddings,
    api_endpoint="ASTRA_DB_API_ENDPOINT",
    token="ASTRA_DB_APPLICATION_TOKEN"
)

vector_store.add_documents(documents) # a list of document objects to store in the db

You can use the same vector_store object and associated embeddings to perform the vector search, too.

results = vector_store.similarity_search("Are robots allowed to protect themselves?")

LlamaIndex has a similar set of abstractions that enable you to combine different embedding models and vector stores. Check out this LlamaIndex introduction to RAG to learn more.

If you're new to embeddings, LangChain has a handy list of embedding models and providers that can help you find different options to try.

Directly in the database

The methods we’ve talked through so far all create the vector separately from the database, before you store it or use it in a search. When you want to store those vectors in a vector database like Astra DB, it looks a bit like this:

from astrapy import DataAPIClient


client = DataAPIClient("ASTRA_DB_APPLICATION_TOKEN")
database = client.get_database("ASTRA_DB_API_ENDPOINT")
collection = database.get_collection("COLLECTION_NAME")

result = collection.insert_one(
    {
        "text": "A robot may not injure a human being or, through inaction, allow a human being to come to harm.",
        "$vector": [0.04574034, 0.038084425, -0.00916391, ...]
    }
)

The above assumes that you have already created your vector-enabled collection with the right number of dimensions for the model you’re using.

Performing a vector search then looks like this:

cursor = collection.find(
    {},
    sort={"$vector": [0.04574034, 0.038084425, -0.00916391, ...]}
)

for document in cursor:
    print(document)

In these examples, you have to create your vectors first, before storing or searching against the database with them. In the case of the frameworks, you might not see this happen, as it has been abstracted away, but the operations are being performed.
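
To get a feel for what a vector search does with those vectors, here is a brute-force sketch that ranks documents by cosine similarity to a query vector. This is only an illustration of the idea, using tiny made-up two-dimensional vectors. It is not how Astra DB performs the search, and cosine is just one of the similarity metrics you could choose. The function names are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vector, documents, limit=5):
    """Return up to limit documents, most similar to the query vector first."""
    scored = [
        (cosine_similarity(query_vector, doc["$vector"]), doc) for doc in documents
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:limit]]


# Made-up vectors for illustration; real embeddings have hundreds of dimensions.
documents = [
    {"text": "first", "$vector": [1.0, 0.0]},
    {"text": "second", "$vector": [0.0, 1.0]},
    {"text": "third", "$vector": [1.0, 1.0]},
]
for doc in rank_by_similarity([1.0, 0.0], documents, limit=2):
    print(doc["text"])
# => first
# => third
```

Comparing the query against every document like this gets slow as the collection grows, which is why vector databases use an index instead.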

With Astra DB, you can have the database generate the vector embeddings for you as you either insert the document into the collection or at the point of performing the search. This is called Astra Vectorize and it simplifies a crucial step in your RAG pipeline.

To use Vectorize, you first need to set up an embedding provider integration. There’s one built-in integration that you can use with no extra work, the NVIDIA NV-Embed-QA model, or you can choose one of the other embedding providers and configure it with your API key.

When you create a collection, you can choose which embedding provider you want to use with the requisite number of dimensions.

When you set up your collection this way you can add content and have it automatically vectorized by using the special property $vectorize.

result = collection.insert_one(
    {
        "$vectorize": "A robot may not injure a human being or, through inaction, allow a human being to come to harm."
    }
)

Then, when a user query comes in, you can perform a vector search by sorting using the $vectorize property. Astra DB will create the vector embedding and then make the search in one step.

cursor = collection.find(
    {},
    sort={"$vectorize": "Are robots allowed to protect themselves?"},
    limit=5
)

There are several advantages to this approach:

The Astra DB team has done the work to make the embedding creation robust already
Making two separate API calls to create embeddings and then store them is often slower than letting Astra DB handle it
Using the built-in NVIDIA embeddings model is even quicker than that
You have less code to write and maintain

A world of vector embedding options

As we have seen, there are many choices to make when implementing vector embeddings, including which model and which provider you use. It's an important step in your RAG pipeline, so it's worth spending the time to find out which model and method are right for your application and your data.

You can choose to host your own models, rely on third-party APIs, abstract the problem away through frameworks, or entrust Astra DB to create embeddings for you. Of course, if you want to avoid code entirely, then you can drag-and-drop your components into place with Langflow.

If you want to chat more about vector embeddings and RAG, drop into the DataStax Devs Discord or drop me an email at phil.nash@datastax.com.

Frequently asked questions

What are vector embeddings?

Vector embeddings are numerical representations of text in multi-dimensional space used for tasks like document retrieval and recommendation systems.

What steps are involved in creating vector embeddings for a retrieval-augmented generation (RAG) app?

To create vector embeddings, you need to:

Collect unstructured data
Split data into chunks
Turn chunks into vector embeddings
Store embeddings in a vector database

How can I create vector embeddings locally in Python?

You can create vector embeddings locally in Python using pre-trained embedding models from Hugging Face with the sentence-transformers library.

What are some limitations of local embedding models?

Local embedding models handle a limited number of tokens effectively, and larger models require substantial memory and storage.

How can I create vector embeddings using an API?

You can create vector embeddings using APIs provided by services such as OpenAI, Google, and Cohere.

Are there frameworks to simplify embedding creation?

Yes, frameworks like LangChain and LlamaIndex offer standardized interfaces that abstract the complexities of embedding models and APIs.

What is Astra Vectorize, and how does it simplify the embedding process?

Astra Vectorize enables Astra DB to automatically generate vector embeddings as documents are inserted or queries are performed.

What are the advantages of using Astra Vectorize?

The advantages include simplified code maintenance, faster performance, improved efficiency, and robustness through pre-tested integrations.

How to Create Vector Embeddings in Node.js

Phil Nash — Thu, 03 Apr 2025 21:43:17 +0000

When you’re building a retrieval-augmented generation (RAG) app, job number one is preparing your data. You’ll need to take your unstructured data and split it up into chunks, turn those chunks into vector embeddings, and finally, store the embeddings in a vector database.

There are many ways that you can create vector embeddings in JavaScript. In this post, we’ll investigate four ways to generate vector embeddings in Node.js: locally, via API, via a framework, and with Astra DB's Vectorize.

Local vector embeddings

There are lots of open-source models available on Hugging Face that can be used to create vector embeddings. Transformers.js is a module that lets you use machine learning models in JavaScript, both in the browser and Node.js. It uses the ONNX runtime to achieve this, so it works with models that have published ONNX weights, of which there are plenty. We can use some of those models to create vector embeddings.

You can install the module with:

npm install @xenova/transformers

The package can actually perform many tasks, but feature extraction is what you want for generating vector embeddings.

A popular, local model for vector embedding is all-MiniLM-L6-v2. It’s trained as a good all-rounder and produces a 384-dimension vector from a chunk of text.

To use it, import the pipeline function from Transformers.js and create an extractor that will perform "feature-extraction" using your provided model. You can then pass a chunk of text to the extractor and it will return a tensor object which you can turn into a plain JavaScript array of numbers.

All in all, it looks like this:

import { pipeline } from "@xenova/transformers";

const extractor = await pipeline(
  "feature-extraction",
  "Xenova/all-MiniLM-L6-v2"
);

const response = await extractor(
  ["A robot may not injure a human being or, through inaction, allow a human being to come to harm."],
  { pooling: "mean", normalize: true }
);

console.log(Array.from(response.data));
// => [-0.004044221248477697,  0.026746056973934174,   0.0071970801800489426, ... ]

You can embed multiple texts at a time by passing several items in the array you give to the extractor. Then you can call tolist on the response to get a list of arrays as your vectors.

const response = await extractor(
  [
    "A robot may not injure a human being or, through inaction, allow a human being to come to harm.",
    "A robot must obey the orders given it by human beings except where such orders would conflict with the First Law.",
    "A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.",
  ],
  { pooling: "mean", normalize: true }
);

console.log(response.tolist());
// [
//   [ -0.006129210349172354,  0.016346964985132217,   0.009711502119898796, ...],
//   [-0.053930871188640594,  -0.002175076398998499,   0.032391052693128586, ...],
//   [-0.05358131229877472,  0.021030642092227936, 0.0010665050940588117, ...]
// ]

There are many models you can use to create vector embeddings from text, and, because you’re running locally, you can try them out to see which works best for your data. You should pay attention to the length of text that these models can handle. For example, the all-MiniLM-L6-v2 model does not provide good results for more than 128 tokens and can handle a maximum of 256 tokens, so it’s useful for sentences or small paragraphs. If you have a bigger source of text data than that, you’ll need to split your data into appropriately sized chunks.

Local embedding models like this are useful if you’re experimenting on your own machine, or have the right hardware to run them efficiently when deployed. It's an easy way to get comfortable with different models and get a feel for how things work without having to sign up to a bunch of different API services.

Having said that, there are a lot of useful vector embedding models available as an API, so let's take a look at them next.

APIs

There is an abundance of services that provide embedding models as APIs. These include LLM providers, like OpenAI, Google or Cohere, as well as specialist providers like Voyage AI or Jina. Most providers have general purpose embedding models, but some provide models trained for specific datasets, like Voyage AI's finance, law and code optimised models.

These API providers provide HTTP APIs, often with an npm package to make it easy to call them. You’ll typically need an API key from the service and you can then generate embeddings by sending your text to the API.

For example, you can use Google's text embedding models through the Gemini API like this:

import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.API_KEY);
const model = genAI.getGenerativeModel({ model: "text-embedding-004"});
const text = "A robot may not injure a human being or, through inaction, allow a human being to come to harm."

const result = await model.embedContent(text);
console.log(result.embedding.values);
// => [0.04574034, 0.038084425, -0.00916391, ...]

Each API is different though, so while making a request to create embeddings is normally fairly straightforward, you’ll likely have to learn a new method for each API you want to call—unless of course, you try one of the available frameworks that are intended to simplify this.

Frameworks

There are many projects out there, like LangChain or LlamaIndex, that create abstractions over the various parts of the GenAI toolchain, including embeddings.

Both LangChain and LlamaIndex enable you to generate embeddings via APIs or local models, all with the same interface. For example, here’s how you can create the same embedding as above using the Gemini API and LangChain together:

import { GoogleGenerativeAIEmbeddings } from "@langchain/google-genai";

const embeddings = new GoogleGenerativeAIEmbeddings({
  apiKey: process.env.API_KEY,
  model: "text-embedding-004",
});
const text = "A robot may not injure a human being or, through inaction, allow a human being to come to harm."

const embedding = await embeddings.embedQuery(text);
console.log(embedding);
// => [0.04574034, 0.038084425, -0.00916391, ...]

To compare, this is what it looks like to use the OpenAI embeddings model through LangChain:

import { OpenAIEmbeddings } from "@langchain/openai";

const embeddings = new OpenAIEmbeddings({
  apiKey: process.env.API_KEY,
  model: "text-embedding-3-large",
});
const text = "A robot may not injure a human being or, through inaction, allow a human being to come to harm."

const embedding = await embeddings.embedQuery(text);
console.log(embedding);
// => [0.009445431, -0.0073068426, -0.00814802, ...]

Aside from changing the name of the import and sometimes the options, the embedding models all have a consistent interface to make it easier to swap them out.
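
One consequence of that consistent interface is that code written against it doesn't care which model sits behind it. Here's a minimal sketch of that idea. FakeEmbeddings and embeddingDimensions are hypothetical names that I've made up for illustration; the only thing taken from the examples above is the embedQuery method. The fake model produces deterministic but meaningless vectors, so it is only useful as a stand-in for tests.

```javascript
// Hypothetical stand-in for an embeddings model, intended for tests only.
// It exposes the same embedQuery method used in the examples above, but
// derives a small deterministic vector from character codes instead of
// calling a real model. The vectors carry no semantic meaning.
class FakeEmbeddings {
  constructor(dimensions = 4) {
    this.dimensions = dimensions;
  }

  async embedQuery(text) {
    const vector = new Array(this.dimensions).fill(0);
    for (let i = 0; i < text.length; i++) {
      vector[i % this.dimensions] += text.charCodeAt(i) / 1000;
    }
    return vector;
  }
}

// Works with any object that has an embedQuery method, real or fake.
// Returns the number of dimensions the model produces for a piece of text.
async function embeddingDimensions(embeddings, text) {
  const vector = await embeddings.embedQuery(text);
  return vector.length;
}
```

You could pass one of the LangChain embeddings objects from above to embeddingDimensions in the same way, which is a handy check when you need to know how many dimensions a model produces.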

If you’re using LangChain to create your entire pipeline, these embedding interfaces work very well alongside the vector database interfaces. You can provide an embedding model to the database integration and LangChain handles generating the embeddings as you insert documents or perform vector searches. For example, here is how to embed some documents using Google's embeddings and store them in Astra DB via LangChain:

import { GoogleGenerativeAIEmbeddings } from "@langchain/google-genai";
import { AstraDBVectorStore } from "@langchain/community/vectorstores/astradb";

const embeddings = new GoogleGenerativeAIEmbeddings({
  apiKey: process.env.API_KEY,
  model: "text-embedding-004",
});

const vectorStore = await AstraDBVectorStore.fromDocuments(
  documents, // a list of document objects to put in the store
  embeddings, // the embeddings model
  astraConfig, // config to connect to Astra DB
);

When you provide the embeddings model to the database object, you can then use it to perform vector searches too.

const results = await vectorStore.similaritySearch("Are robots allowed to protect themselves?");

LlamaIndex allows for similar creation of embedding models and vector stores that use them. Check out the LlamaIndex documentation on RAG.

As a bonus, the lists of models that LangChain and LlamaIndex integrate are good examples of popular embedding models.

Directly in the database

So far, the methods above mostly involve creating a vector embedding independently of storing the embedding in a vector database. When you want to store those vectors in a vector database like Astra DB, it looks a bit like this:

import { DataAPIClient } from "@datastax/astra-db-ts";
const client = new DataAPIClient(process.env.ASTRA_DB_APPLICATION_TOKEN);
const db = client.db(process.env.ASTRA_DB_API_ENDPOINT);
const collection = db.collection(process.env.ASTRA_DB_COLLECTION);

await collection.insertOne({
  text: "A robot may not injure a human being or, through inaction, allow a human being to come to harm.",
  $vector: [0.04574034, 0.038084425, -0.00916391, ...]
});

This assumes you have already created a vector enabled collection with the correct number of dimensions for the model you are using.
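
If the vector you send doesn't match the collection's dimensions, the insert will be rejected, so it can be worth checking before you call the database. This is a minimal sketch of such a check; assertVectorDimensions is a hypothetical helper of my own, not part of the Astra DB client, and the expected number of dimensions is something you need to supply based on your model and collection.

```javascript
// Hypothetical helper: validates an embedding before it is sent to the database.
// expectedDimensions must match the dimensions the collection was created with.
function assertVectorDimensions(vector, expectedDimensions) {
  if (!Array.isArray(vector)) {
    throw new TypeError("Expected the embedding to be an array of numbers");
  }
  if (vector.length !== expectedDimensions) {
    throw new RangeError(
      `Expected ${expectedDimensions} dimensions but got ${vector.length}`
    );
  }
  if (!vector.every((value) => Number.isFinite(value))) {
    throw new TypeError("Embedding contains a value that is not a finite number");
  }
  // Return the vector so the check can be used inline when building a document.
  return vector;
}
```

Because it returns the vector, you could use it inline when building the document, for example as the value of the $vector property.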

You can also search against the documents in your collection using a vector like this:

const cursor = collection.find({}, {
  sort: { $vector: [0.04574034, 0.038084425, -0.00916391, ...] },
  limit: 5,
});
const results = await cursor.toArray();

In this case, you have to create your vectors first, and then store or search against the database with them. Even in the case of the frameworks, that process happens, but it’s just abstracted away.
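
To make the vector search above a bit more concrete, here is a rough in-memory sketch of what "sort by a vector and limit the results" means. It assumes the collection uses cosine similarity as its metric, and it compares the query against every document, which a real vector database avoids by using an index. The function names are my own and are not part of any library.

```javascript
// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new RangeError("Vectors must have the same number of dimensions");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Treat a zero vector as having no similarity to anything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Brute-force equivalent of sorting by similarity to a query vector and
// keeping the top results. Each document is expected to have a $vector property.
function nearestDocuments(documents, queryVector, limit) {
  return documents
    .map((doc) => ({
      ...doc,
      similarity: cosineSimilarity(doc.$vector, queryVector),
    }))
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, limit);
}
```

The important point is that both sides of the comparison are vectors, which is why the query text has to be embedded with the same model as the documents before you can search.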

With Astra DB, you can have the database generate the embeddings for you as you’re inserting documents into a collection or as you perform a vector search against a collection.

This is called Astra DB vectorize; here's how it works.

First, set up an embedding provider integration. There is a built-in integration offering the NVIDIA NV-Embed-QA model, or you can choose one of the other providers and configure them with your own API key.

Then when you set up a collection, you can choose which embedding provider you want to use and set the correct number of dimensions.

Now, when you add a document to this collection, you can add the content using the special key $vectorize and a vector embedding will be created.

await collection.insertOne({
  $vectorize: "A robot may not injure a human being or, through inaction, allow a human being to come to harm."
});

When you want to perform a vector search against this collection, you can sort by the special $vectorize field and again, Astra DB will handle creating vector embeddings and then performing the search.

const cursor = collection.find({}, {
  sort: { $vectorize: "Are robots allowed to protect themselves?" },
  limit: 5,
});
const results = await cursor.toArray();

This has several advantages:

It's robust, as Astra DB handles the interaction with the embedding provider
It can be quicker than making two separate API calls to create embeddings and then store them
It's less code for you to write

Choose the method that works best for your application

There are many models, providers, and methods you can use to turn text into vector embeddings. Creating vector embeddings from your content is a vital part of the RAG pipeline and it does require some experimentation to get it right for your data.

You have the choice to host your own models, call on APIs, use a framework, or let Astra DB handle creating vector embeddings for you. And, if you want to avoid code altogether, you could choose to use Langflow's drag-and-drop interface to create your RAG pipeline.

5 GenAI Things You Didn't Know About Astra DB

Phil Nash — Thu, 06 Mar 2025 23:07:23 +0000

Astra DB is a high-performance NoSQL database powered by Apache Cassandra® with built-in vector search, but that's just what the product page says. Not everything fits onto one page, so I wanted to share a few things that you might not already know about Astra DB and how it helps you to build accurate, low-latency, retrieval-augmented generation (RAG) powered generative AI apps.

Astra DB can create vector embeddings for you

When ingesting data for a RAG application, there are several steps you need to take: document loading, text parsing, chunking text, creating vector embeddings, and storing it in the database. Astra DB can simplify the process by combining those last two steps.

Astra Vectorize can create vector embeddings for your text chunks at the point of inserting them into the collection.

When you create an Astra DB collection, you can choose one of the supported embedding models. There are models available from OpenAI (including Azure OpenAI), Voyage AI, Mistral AI, Jina AI, and Upstage. Astra DB also hosts NVIDIA embedding models that run in the same environment as the database, boosting performance—Wikidata reduced their data ingestion time from 30 days to two with Vectorize—and ensuring the data never leaves the database.

Once you have set up your collection with your embedding provider of choice, ingesting data with Vectorize is a case of providing the text you want turned into a vector as a special $vectorize property in the documents you are storing. In TypeScript, this looks like:

import { DataAPIClient } from "@datastax/astra-db-ts";
const client = new DataAPIClient(process.env.ASTRA_DB_APPLICATION_TOKEN);
const db = client.db(process.env.ASTRA_DB_API_ENDPOINT);
const collection = db.collection(process.env.ASTRA_DB_COLLECTION);

await collection.insertOne({
  $vectorize: "A robot may not injure a human being or, through inaction, allow a human being to come to harm."
});

Then to perform a vector search against this collection you use the $vectorize field to sort by your query.

const cursor = collection.find({}, {
  sort: { $vectorize: "Are robots allowed to protect themselves?" },
  limit: 5,
});
const results = await cursor.toArray();

You can learn more about Astra Vectorize in the documentation.

Astra DB supports graph RAG

Depending on your data, regular vector search can sometimes miss context, which makes it harder for large language models (LLMs) to answer certain queries. Graph RAG is a technique that takes your documents, extracts links between them, and uses those links to retrieve extra contextual information at the retrieval stage. Providing extra linked context to an LLM makes for more accurate and informed answers.

Astra DB supports graph RAG via LangChain. You can replace the AstraDBVectorStore with AstraDBGraphVectorStore and ensure you ingest your data in a way that extracts the links between documents. A simplified ingestion example that reads a URL, extracts HTML links, strips the HTML, and splits the text into chunks before storing in Astra DB (using Astra Vectorize to create embeddings) might look like this:

import os

from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import AsyncHtmlLoader
from langchain_community.graph_vectorstores.extractors import (
    HtmlLinkExtractor,
    LinkExtractorTransformer
)
from langchain_community.document_transformers import BeautifulSoupTransformer
from langchain_astradb import AstraDBGraphVectorStore, CollectionVectorServiceOptions

vectorize_options = CollectionVectorServiceOptions(
    provider="nvidia",
    model_name="NV-Embed-QA",
)

vector_store = AstraDBGraphVectorStore(
    collection_name="graph",
    token=os.environ.get("ASTRA_DB_APPLICATION_TOKEN"),
    api_endpoint=os.environ.get("ASTRA_DB_API_ENDPOINT"),
    collection_vector_service_options=vectorize_options
)

urls = [
    "https://www.datastax.com/guides/graph-rag",
    "https://www.datastax.com/blog/build-graph-rag-with-unstructured-and-astra-db"
]
loader = AsyncHtmlLoader(urls)
docs = loader.load()

transformer = LinkExtractorTransformer([HtmlLinkExtractor().as_document_extractor()])
bs4_transformer = BeautifulSoupTransformer()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)

docs = transformer.transform_documents(docs)
docs = bs4_transformer.transform_documents(docs)
chunks = text_splitter.split_documents(docs)

vector_store.add_documents(chunks)

Then to search Astra DB, you can use the graph store's traversal_search method to first retrieve a number of document chunks (k), before traversing the graph to the specified depth for additional chunks. In this case, we perform the search initially finding four chunks using a similarity search and then traversing the graph to a depth of two to return related chunks.

traversal_results = vector_store.traversal_search(
    query="What are the differences between Graph RAG and naive RAG?",
    k=4,
    depth=2,
)
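
If the idea of traversing to a depth is unfamiliar, this small in-memory sketch may help. It is written in JavaScript, like most of the examples in these posts, and it is only an illustration of the concept: the links object and the traverse function are hypothetical, and the real traversal runs against the database and may differ in its details. The seed IDs stand in for the k chunks found by the initial similarity search.

```javascript
// links maps a chunk ID to the IDs of the chunks it links to.
// seedIds are the chunks found by the initial similarity search.
// depth is how many hops along the links to follow from the seeds.
function traverse(links, seedIds, depth) {
  const visited = new Set(seedIds);
  let frontier = [...visited];

  for (let level = 0; level < depth; level++) {
    const next = [];
    for (const id of frontier) {
      for (const linkedId of links[id] ?? []) {
        if (!visited.has(linkedId)) {
          visited.add(linkedId);
          next.push(linkedId);
        }
      }
    }
    // The newly discovered chunks become the starting points for the next hop.
    frontier = next;
  }

  return [...visited];
}
```

With a depth of two, a chunk that is linked from a linked chunk is included, but anything three hops away from the seeds is not.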

Check out this full tutorial on building graph RAG with Unstructured and Astra DB.

Astra DB supports ColBERT

Graph RAG can help if your context is spread across chunks, but there are other situations where graph RAG won't necessarily help. If your data contains terms that aren't in the training data of your embedding model, it can be difficult to get accurate similarity search results.

One way to overcome this is to use ColBERT. ColBERT creates a vector per token in a body of text, creating a sliding window of context over entire passages and capturing unknown context much better. This does require more storage for the extra vectors, but if accuracy is your priority, it’s worthwhile.
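
Having a vector per token changes how a passage is scored against a query. ColBERT uses what it calls late interaction: each query token vector is compared with every token vector in the passage, the best match for each query token is kept, and those best matches are added up. Below is a minimal sketch of that scoring step. It assumes the token vectors are already normalised, so a dot product behaves like cosine similarity, and the function names are my own rather than anything from a ColBERT library.

```javascript
// Dot product of two vectors of equal length.
function dotProduct(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += a[i] * b[i];
  }
  return sum;
}

// Late-interaction score: for each query token vector, take the similarity of
// its best-matching passage token vector, then sum those maximums.
function maxSimScore(queryTokenVectors, passageTokenVectors) {
  if (passageTokenVectors.length === 0) return 0;
  let score = 0;
  for (const queryVector of queryTokenVectors) {
    let best = -Infinity;
    for (const passageVector of passageTokenVectors) {
      best = Math.max(best, dotProduct(queryVector, passageVector));
    }
    score += best;
  }
  return score;
}
```

Because every query token gets to find its own best match, a rare term can still score well against a passage that contains it, even if a single vector for the whole passage would have washed it out.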

You can use ColBERT with Astra DB in LangChain by using the RAGStack implementation.

To ingest the data, you can use the ColbertEmbeddingModel and ColbertVectorStore.

import os
from ragstack_colbert import CassandraDatabase, ColbertEmbeddingModel, ColbertVectorStore

embedding = ColbertEmbeddingModel()
database = CassandraDatabase.from_astra(
  astra_token=os.environ.get("ASTRA_DB_APPLICATION_TOKEN"),
  database_id=os.environ.get("ASTRA_DB_DATABASE_ID"),
  keyspace="default_keyspace"
)
vector_store = ColbertVectorStore(
  database=database,
  embedding_model=embedding
)
results = vector_store.add_texts(texts=YOUR_LIST_OF_TEXTS, doc_id="myDocs")

Then performing a similarity search is pretty much the same as any other vector store search in LangChain.

from ragstack_colbert import CassandraDatabase, ColbertEmbeddingModel
from ragstack_langchain.colbert import ColbertVectorStore as LangchainColbertVectorStore

colbert_embedding = ColbertEmbeddingModel()
colbert_database = CassandraDatabase.from_astra(
    astra_token=YOUR_ASTRA_DB_TOKEN,
    database_id=YOUR_ASTRA_DB_ID,
    keyspace="default_keyspace"
)
vector_store = LangchainColbertVectorStore(
    database=colbert_database,
    embedding_model=colbert_embedding
)
query = "What is ColBERT?"
results = vector_store.similarity_search(query)

Check out this full tutorial on using ColBERT with Astra DB, or for a faster alternative, Jonathan Ellis's ColBERT Live!, which uses Answer AI's colbert-small-v1 model and is supported by Astra DB.

Astra DB indexes your vectors live

Your vector database needs to be both accurate and speedy in order to ensure the performance of your application. When you are ingesting or updating data in your collection, rebuilding the index takes time and leaves you with slow queries or out of date data.

Astra DB's vector indexing capabilities are a combination of Cassandra's storage-attached indexing (SAI) and JVector, a non-blocking, concurrent, graph-based vector index. What this means is that Astra DB doesn't need to rebuild or block access to its index when you are inserting vectors; they are indexed live.

The upshot of this is high throughput and accuracy even under mixed loads of reads and writes. Check out this benchmark of throughput and accuracy against Pinecone, particularly when Pinecone is performing indexing. Astra DB doesn't sacrifice throughput or accuracy under load; it will always be there for your application.

Astra DB is integrated in all your favourite frameworks

We've seen so far in this post that Astra DB is available in LangChain, but you can also find it in:

LangChain.JS
LlamaIndex and LlamaIndex.TS
Haystack
Mastra (a newer framework, built by the team behind Gatsby)

And of course Astra DB is integrated into Langflow. Deeply integrated! Once you enter your application token into the Astra DB component, your databases will automatically load. Then once you select your database, you can pick the collection you need too.

You can even create a new database from within Langflow. Oh, and Langflow supports using Astra Vectorize when ingesting or performing vector search too.

Langflow is a great visual way to build agents, and Astra DB makes it easy to build RAG or agentic RAG within Langflow.

Astra DB is ready to help you build transformative AI

Whether you're looking to build with Langflow or any number of other frameworks, or try out alternative vector searches like graph RAG or ColBERT, Astra DB is there to help. And it will do it quickly, creating vectors for you via Vectorize and indexing them live so your data is always up to date.

There are so many different applications you can build; check out examples like this AI resume assistant, RAG-powered voice agent, or hum-to-search music recognition app, all powered by Astra DB.

From chat bots to autonomous agents, Astra DB supports you in building the GenAI apps that are going to transform your business.

How to Stream Responses from the Langflow API in Node.js

Phil Nash — Wed, 05 Mar 2025 21:34:53 +0000

Building flows and AI agents in Langflow is one of the fastest ways to experiment with generative AI. Once you've built your flow, you’ll want to integrate it into your own application. Langflow exposes an API for this; we’ve written before about how to use it in Node.js. We've also seen that streaming GenAI outputs makes for a better user experience. So today, we're going to combine the two and show you how to stream results from your Langflow flows in Node.js.

Using the Langflow client

The easiest way to use the Langflow API is with the @datastax/langflow-client npm module. You can get started with the client by installing the module with npm:

npm install @datastax/langflow-client

The Langflow client can be used with both self-hosted and DataStax-hosted Langflow. You can see in-depth examples of how to set it up for either version of Langflow in this blog post. But the quick version is that for either type of Langflow, you start by importing the client:

import { LangflowClient } from "@datastax/langflow-client";

For self-hosted Langflow you need the URL where you’re hosting Langflow and, if you've set up user authorisation, an API key. You then initialise the client with both:

const baseURL = "http://localhost:7860";
const apiKey = "YOUR_API_KEY";
const client = new LangflowClient({ baseURL, apiKey });

For DataStax-hosted Langflow, you need your Langflow ID and to generate an API key. Then you create a client with the following code:

const langflowId = "YOUR_LANGFLOW_ID";
const apiKey = "YOUR_API_KEY";
const client = new LangflowClient({ langflowId, apiKey });

Streaming with the Langflow client

To stream through the API, you need a flow that’s set up for streaming responses. A streaming flow needs a model with streaming capabilities and the stream flag turned on, connected to a chat output. The basic prompting example, with streaming turned on, is a good example of this.

If you don't already have a flow, you can use the basic prompting flow as an example.

Once you have your flow in place, open the API modal and get the flow ID.

With the flow ID and the Langflow client, you can create a flow object:

const flowId = "YOUR_FLOW_ID";
const flow = client.flow(flowId);

To stream a response from the flow, you can use the stream function (https://www.npmjs.com/package/@datastax/langflow-client#streaming). The response is a ReadableStream that you can iterate over asynchronously.

const response = await flow.stream("Hello, how are you?");
for await (const event of response) {
  console.log(event);
}

There are three types of event that the stream emits; this is what each of them means:

add_message: a message has been added to the chat. It can refer to a human input message or a response from an AI.
token: a token has been emitted as part of a message being generated by the model.
end: all tokens have been returned; this message will also contain the same full response that you get from a non-streaming request.

If you want to log out just the text from a flow response you can do the following:

const response = await flow.stream("Hello, how are you?");
for await (const event of response) {
  if (event.event === "token") {
    console.log(event.data.chunk);
  }
}
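
If you need more than the raw chunks, for example the full text so far or a flag for when the response has finished, you can fold the events into a small state object. This is a sketch under the assumption that events have the shape shown above, with an event name and, for tokens, a data.chunk string. createStreamState and applyStreamEvent are hypothetical helpers, not part of the Langflow client.

```javascript
// The starting state before any events have arrived.
function createStreamState() {
  return { text: "", messages: 0, done: false };
}

// Returns a new state that reflects a single event from the stream.
function applyStreamEvent(state, event) {
  switch (event.event) {
    case "add_message":
      // A human or AI message was added to the chat.
      return { ...state, messages: state.messages + 1 };
    case "token":
      // Append the newly generated chunk to the text so far.
      return { ...state, text: state.text + event.data.chunk };
    case "end":
      // All tokens have been returned.
      return { ...state, done: true };
    default:
      // Ignore any event types this sketch doesn't know about.
      return state;
  }
}

// Inside the loop over the stream you would update the state like this:
//   let state = createStreamState();
//   for await (const event of response) {
//     state = applyStreamEvent(state, event);
//   }
```

Keeping the event handling in a plain function like this also makes it easy to test without a running flow.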

The stream function takes all the same arguments as the run function, so you can provide tweaks for your components, too.

Integrating with Express

If you want to make an API request from an Express server and then stream it to your own front-end, you can do the following:

app.get("/stream", async (_req, res) => {
  res.set("Content-Type", "text/plain");
  res.set("Transfer-Encoding", "chunked");

  const response = await flow.stream("Hello, how are you?");

  for await (const event of response) {
    if (event.event === "token") {
      res.write(event.data.chunk);
    }
  }

  res.end();
});

We explored how you can handle a stream on the front-end in this blog post.

Stream your flows

Langflow enables you to rapidly build, experiment with, and deploy GenAI applications and with the JavaScript Langflow client you can easily stream those responses in your JavaScript applications.

Please do try out the Langflow client; if you have any issues, please raise them on the GitHub repo. If you're looking for more inspiration for building AI agents with Langflow, check out these posts that cover how to build an agent that can manage your calendar with Langflow and Composio or see how you can build local agents with Langflow and Ollama.

Build a RAG-Powered Voice Agent with Twilio Voice, OpenAI, Astra DB, and Node.js

Phil Nash — Wed, 19 Feb 2025 23:08:48 +0000

With the OpenAI Realtime API, you can build speech-to-speech applications that let you interact directly with a generative AI model by speaking with it. Talking directly to a model feels really natural, and the Realtime API makes it possible to build experiences like this into your own applications and businesses.

One example of this was built by Twilio: it enables you to connect a phone call to GPT-4o with Node.js (or, if you prefer, Python). The example is great, but it only shows connecting to a plain GPT-4o with a system prompt that encourages owl facts and jokes. Much as I like owl facts, I wanted to see what else we could achieve with a voice agent like this.

In this post, we'll show you how to extend the original assistant into an agent that can choose to use tools to augment its response. We'll give it additional, up-to-date knowledge via retrieval-augmented generation (RAG) using Astra DB.

Want to try it out before we dive into the details? Call (855) 687-9438 (that's 855-6-TSWIFT) and have a chat!

Prerequisites

First, you’ll need to set up the application from the Twilio blog post, so you'll need a Twilio account and an OpenAI API key. Make sure you can make a call and chat with the bot successfully.

You will also need a free DataStax account so you can set up RAG with Astra DB.

What we’re going to build

We already have a voice-capable bot that you can speak to over the phone. We're going to gather some up-to-date data and store it in Astra DB to help the bot answer questions.

The OpenAI Realtime API enables you to define tools that the model can use to execute functions and extend its capabilities. We’ll give the model a tool that enables it to search the database for additional information (this is an example of agentic RAG).

Ingesting data

To test out this agent, we're going to write a quick script to load and parse a web page, turn the content into chunks, turn those chunks into vector embeddings, and store them in Astra DB.

Create your database

To kick this process off, you'll need to create a database. Log into your DataStax account and, on the Astra DB dashboard, click Create a Database. Choose a Serverless (Vector) database, give it a name, and pick a provider and region. That will take a couple of minutes to provision. While it's doing that, have a think about some good web pages you might want to ingest into this database.

Once the database is ready, click on the Data Explorer tab and then the Create Collection + button. Give your collection a name, ensure it is a vector-enabled collection and choose NVIDIA as the embedding generation method. This will automatically generate vector embeddings for the content we insert into the collection.

Connect to the database

Open the application code in your favourite text editor. To get the application running, you’ll have created a .env file and populated it with your OpenAI API key (and if you didn't do that yet, now is definitely the time). Open that .env file and add some more environment variables.

ASTRA_DB_APPLICATION_TOKEN=
ASTRA_DB_API_ENDPOINT=
ASTRA_DB_COLLECTION_NAME=

Fill in the variables with the information from your database. You can find the API endpoint and generate an application token from the database overview in the Astra DB dashboard. Enter the name of the collection you just created, too.

Now we can connect to the database in the application. Install the Astra DB client from npm.

npm install @datastax/astra-db-ts

Create a new file in the application called db.js. Open the file and enter the following code:

import { DataAPIClient } from "@datastax/astra-db-ts";
import dotenv from "dotenv";

dotenv.config();

const {
  ASTRA_DB_APPLICATION_TOKEN,
  ASTRA_DB_API_ENDPOINT,
  ASTRA_DB_COLLECTION_NAME,
} = process.env;

const client = new DataAPIClient(ASTRA_DB_APPLICATION_TOKEN);
const db = client.db(ASTRA_DB_API_ENDPOINT);
export const collection = db.collection(ASTRA_DB_COLLECTION_NAME);

This code imports the client from the Astra DB module and loads the variables from the .env file into the environment. It then uses those environment variables as credentials to connect to the collection, and exports the collection object to be used elsewhere in the application.

Get some data

Now let's create a script that loads and parses a web page, then splits it into chunks and stores it in Astra DB. This script is going to combine some of the techniques in blog posts about scraping web pages, chunking text, and creating vector embeddings. To read more in depth about those, check out those posts.

Install the dependencies:

npm install @langchain/textsplitters @mozilla/readability jsdom

Create a file called ingest.js and copy the following code:

import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";
import { Readability } from "@mozilla/readability";
import { JSDOM } from "jsdom";

import { collection } from "./db.js";

import { parseArgs } from "node:util";

const { values } = parseArgs({
  args: process.argv.slice(2),
  options: { url: { type: "string", short: "u" } },
});

const { url } = values;
const html = await fetch(url).then((res) => res.text());

const doc = new JSDOM(html, { url });
const reader = new Readability(doc.window.document);
const article = reader.parse();

const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 500,
  chunkOverlap: 100,
});

const docs = (await splitter.splitText(article.textContent)).map((chunk) => ({
  $vectorize: chunk,
}));

await collection.insertMany(docs);

This script:

uses the Node.js argument parser to get a URL from the command line arguments
loads the web page at that URL
parses the content from the page using Readability.js and JSDOM
splits the text into 500 character chunks with 100 character overlap using the RecursiveCharacterTextSplitter
turns the chunks into objects where the chunk of text becomes the $vectorize property
inserts all the documents into the collection
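
To show what chunk size and overlap mean in the steps above, here is a deliberately simplified chunker. It is not how the RecursiveCharacterTextSplitter works, since that splitter tries to break text at natural separators rather than at fixed positions; this sketch just cuts at fixed offsets. The chunkText function is a hypothetical name of my own.

```javascript
// Splits text into chunks of chunkSize characters, where each chunk starts
// chunkSize - chunkOverlap characters after the previous one.
function chunkText(text, chunkSize, chunkOverlap) {
  if (chunkSize <= 0 || chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new RangeError(
      "chunkSize must be positive and chunkOverlap must be between 0 and chunkSize - 1"
    );
  }
  const chunks = [];
  const step = chunkSize - chunkOverlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once a chunk has reached the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

// Turn each chunk into a document with the text as the $vectorize property.
function toDocuments(chunks) {
  return chunks.map((chunk) => ({ $vectorize: chunk }));
}
```

The overlap means the end of one chunk is repeated at the start of the next, so a sentence that straddles a boundary still appears whole in at least one chunk more often than it would without it.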

Using the $vectorize property tells Astra DB to automatically create vector embeddings for this content.

We can now run this file from the command line. For example, here's how to ingest the Wikipedia page on Taylor Swift:

node ingest.js --url https://en.wikipedia.org/wiki/Taylor_Swift

Once this command has been run, check the collection in the DataStax dashboard to see the contents and the vectors.

Build the voice agent

To turn our existing voice assistant into an agent that can choose to search the database for more information, we need to provide it with a tool, or function, that it can choose to use.

Create a new file called tools.js and open it in your editor. Start by importing collection from db.js:

import { collection } from "./db.js"

Next we need to create the function that the agent can use to search the database.

When the OpenAI agent provides parameters to call a function with, it does so as an object. So the function should receive an object, which we can destructure to extract the query. We'll then use the query to perform a vector search against our collection.

We can use Astra DB Vectorize to automatically create a vector embedding of the query. We'll also limit the results to the top 10 and ensure we return the text from the chunks by selecting $vectorize in the projection.

Calling find on the collection with these arguments will return a cursor, which we can turn into an array by calling toArray. We then iterate over the array of documents, extracting just the text and then joining the resulting array with a newline to create a single string result that can be provided as context to the agent.

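async function taylorSwiftFacts({ query }) {
  // Sorting by $vectorize asks Astra DB to embed the query and run a vector search
  const docs = collection.find(
    {},
    { sort: { $vectorize: query }, limit: 10, projection: { $vectorize: 1 } }
  );
  // Join the text of each matching chunk into a single string of context
  return (await docs.toArray()).map((doc) => doc.$vectorize).join("\n");
}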

I've called the function taylorSwiftFacts because that's what I loaded with my ingestion script; feel free to use a different name.

This is our first tool; we can write more, but for now we can just export this as an object of tools.

export const TOOLS = {
  taylorSwiftFacts,
};

To help the model choose when to use this tool, it needs a description of what it can do and the arguments it expects. For each tool you provide a type, name, description, and the parameters.

For our function the type will be "function" and the name is taylorSwiftFacts. The description will tell the agent that we have up-to-date information about Taylor Swift that it can search for. The parameters are a JSON schema description of the arguments your function expects. This tool is relatively simple, as it only requires one parameter called query, which is a string. The full description looks like this:

export const DESCRIPTIONS = [
  {
    type: "function",
    name: "taylorSwiftFacts",
    description:
      "Search for up to date information about Taylor Swift from her wikipedia page",
    parameters: {
      type: "object",
      properties: {
        query: {
          type: "string",
          description: "The search query",
        },
      },
    },
  },
];
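
Since the functions and their descriptions live in two separate structures, it's easy for them to drift apart as you add more tools. A small check like the one below can catch that early. It's a sketch of my own rather than anything the Realtime API requires, and findToolMismatches is a hypothetical name; it only assumes that the tools are an object of functions and that each description has a name, as in the code above.

```javascript
// Compares an object of tool functions with an array of tool descriptions and
// reports any names that appear in one but not the other.
function findToolMismatches(tools, descriptions) {
  const describedNames = descriptions.map((description) => description.name);
  const described = new Set(describedNames);

  return {
    // Functions that the model will never be told about.
    missingDescriptions: Object.keys(tools).filter(
      (name) => !described.has(name)
    ),
    // Descriptions that the model could call but that have no function.
    missingTools: describedNames.filter(
      (name) => typeof tools[name] !== "function"
    ),
  };
}
```

You could run this once at start-up with TOOLS and DESCRIPTIONS and log or throw if either list is not empty.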

Our tool definition is complete for now, so let's add it to our agent.

Handling function calls in a voice agent

We've been building supporting functions around the existing application so far, but to connect our tool to the agent we need to dig into the main body of code. Open index.js in your editor and start by importing the tools and descriptions we just defined:

import Fastify from 'fastify';
import WebSocket from 'ws';
import dotenv from 'dotenv';
import fastifyFormBody from '@fastify/formbody';
import fastifyWs from '@fastify/websocket';

import { DESCRIPTIONS, TOOLS } from "./tools.js";

We need to update the system prompt to more accurately describe what the agent is capable of with the tool available to it. Since we ingested the Wikipedia page for Taylor Swift earlier, we can update it to behave like a Taylor Swift superfan.

Find the SYSTEM_MESSAGE constant and update with:

const SYSTEM_MESSAGE = "You are a helpful and bubbly AI assistant who loves Taylor Swift. You can use your knowledge about Taylor Swift to answer questions, but if you don't know the answer, you can search for relevant facts with your available tools.";

Next we need to provide the tool we have built to the agent. Find the initializeSession function, which defines a sessionUpdate object that includes all the details to initialize the agent. Add a tools property to the session object using the DESCRIPTIONS array we imported earlier:

const sessionUpdate = {
    type: 'session.update',
    session: {
        turn_detection: { type: 'server_vad' },
        input_audio_format: 'g711_ulaw',
        output_audio_format: 'g711_ulaw',
        voice: VOICE,
        instructions: SYSTEM_MESSAGE,
        modalities: ["text", "audio"],
        temperature: 0.8,
        tools: DESCRIPTIONS
    }
};

We can also provide tools on a request-by-request basis, but this agent will benefit from access to this tool in all its interactions.

Finally we need to handle the event when the model requests to use a tool. Find the event handler for when the connection to OpenAI receives a message; it looks like: openAiWs.on('message', … ).

Change the event handler to an async function:

openAiWs.on('message', async (data) => {

When the Realtime API wants to use a tool, it sends an event with the type "response.done". Within the event object there are outputs, and if one of the outputs has a type of "function_call" we know the model wants to use one of its tools.

The output provides the name of the function it wants to call and the arguments. We can look up the tool in our object of TOOLS that we imported, then call it with the arguments.

When we have the result of the function call we pass it back to the model so that it can choose what to do next. We do so by creating a new message with the type "conversation.item.create" and within that message we include an item with the type "function_call_output", the output of the function call, and the ID that the original event had, so that the model can tie the response to the original query.

We send this to the model as well as another message with the type "response.create" which requests the model use this new information to return a new response.

Overall, this enables the model to request to use the database search function we defined and provide the arguments it wants to call the function with. We are then responsible for calling the function and returning the results to the model. The whole code looks like this:

      openAiWs.on('message', async (data) => {
          try {
            const response = JSON.parse(data);

            if (LOG_EVENT_TYPES.includes(response.type)) {
              console.log(`Received event: ${response.type}`, response);
            }

            if (response.type === "response.done") {
              const outputs = response.response.output;
              const functionCall = outputs.find(
                (output) => output.type === "function_call"
              );
              if (functionCall && TOOLS[functionCall.name]) {
                const result = await TOOLS[functionCall.name](
                  JSON.parse(functionCall.arguments)
                );
                const conversationItemCreate = {
                  type: "conversation.item.create",
                  item: {
                    type: "function_call_output",
                    call_id: functionCall.call_id,
                    output: result,
                  },
                };
                openAiWs.send(JSON.stringify(conversationItemCreate));
                openAiWs.send(JSON.stringify({ type: "response.create" }));
              }
            }

            // other event handlers
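
If the handler starts to grow, one option is to move the tool-calling logic into a function that only builds the messages, which makes it easy to test without a live WebSocket. The following is a minimal sketch under a few assumptions: toolMessagesFor is a hypothetical helper name (it is not part of the Twilio sample or any OpenAI library), and it assumes the "output" field should be a string, so other results are serialised as JSON.

```javascript
// Sketch only: `toolMessagesFor` is a hypothetical helper, not part of the
// Twilio sample or the OpenAI SDK.
async function toolMessagesFor(event, tools) {
  if (event.type !== "response.done") return [];

  // Default to an empty list in case the response carries no outputs.
  const outputs = event.response?.output ?? [];
  const functionCall = outputs.find((item) => item.type === "function_call");
  if (!functionCall || !tools[functionCall.name]) return [];

  // The model provides its arguments as a JSON string.
  const args = JSON.parse(functionCall.arguments);
  const result = await tools[functionCall.name](args);

  return [
    {
      type: "conversation.item.create",
      item: {
        type: "function_call_output",
        call_id: functionCall.call_id,
        // Assumption: the output should be a string, so serialise anything else.
        output: typeof result === "string" ? result : JSON.stringify(result),
      },
    },
    // Ask the model to respond using the new information.
    { type: "response.create" },
  ];
}

// Inside the message handler you could then write:
// for (const message of await toolMessagesFor(response, TOOLS)) {
//   openAiWs.send(JSON.stringify(message));
// }
```

The function returns an empty array for any event it does not need to act on, so the handler can call it for every message.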

Start the application and make sure it is connected to your Twilio number as described in the Twilio blog post. Now we can call and chat all things Taylor Swift.

If you want to try this out with my assistant, you can give it a call on (855) 687-9438.

This is now a new way to connect with the Taylor Swift bot we built a while back. So now you can chat with SwiftieGPT online or on the phone.

Give your voice assistants some agency

Real-time voice agents are very cool, but they have all the same drawbacks as a plain LLM. In this post we added agentic RAG capabilities to our voice agent and it was able to use up-to-date knowledge to answer our questions about Taylor Swift.

When you provide a voice agent with tools, like context from a vector database, the results are very impressive. The combination of Twilio, OpenAI, and Astra DB creates a very powerful agent.

You can find the code for this in my fork of the Twilio project. You don't have to stop here though; you can define and add further tools to the agent. Make sure you check out OpenAI's best practices for defining functions for your models.

If you're interested in building other agents, check out how to work with Langflow and Composio or the workshop and videos from the recent Hacking Agents event.

Are you excited about voice agents or agentic RAG? Come chat about it and what you're building in the DataStax Devs Discord.

Want to roll up your sleeves and build with OpenAI, Twilio, Cloudflare, Unstructured, and DataStax? Join us on Feb. 28 in San Francisco for the Hacking Agents Hackathon, an epic 24-hour hackathon where we'll be diving into what developers can build with the latest and greatest in AI tooling.

How to Use the Langflow API in Node.js

Phil Nash — Tue, 28 Jan 2025 17:27:05 +0000

Langflow is a fantastic low-code tool for building generative AI flows and agents. Once you've built your flow, it’s time to integrate it into your own application using the Langflow API.

In Node.js applications, you can construct and make calls directly to the API with fetch, the http module, or using your favourite HTTP client like axios or got. To make it easier, you can now use this JavaScript Langflow client. Let's take a look at how it works.
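
For comparison, here is a rough sketch of what building a request by hand for fetch might look like. Treat the details as assumptions to check against the Langflow API documentation: the /api/v1/run/ path, the x-api-key header and the body field names are what I would expect for a self-hosted instance, and buildRunRequest is a hypothetical helper name.

```javascript
// Sketch only: `buildRunRequest` is a hypothetical helper. The endpoint path,
// header name and body fields are assumptions to verify against the API docs.
function buildRunRequest({ baseURL, flowId, apiKey, input }) {
  const headers = { "Content-Type": "application/json" };
  // Only send the key when there is one.
  if (apiKey) headers["x-api-key"] = apiKey;

  return {
    url: `${baseURL}/api/v1/run/${encodeURIComponent(flowId)}`,
    init: {
      method: "POST",
      headers,
      body: JSON.stringify({
        input_value: input,
        input_type: "chat",
        output_type: "chat",
      }),
    },
  };
}

const { url, init } = buildRunRequest({
  baseURL: "http://localhost:7860",
  flowId: "YOUR_FLOW_ID",
  apiKey: "YOUR_API_KEY",
  input: "Hello, how are you?",
});

// To send the request:
// const data = await fetch(url, init).then((res) => res.json());
```

This is the kind of boilerplate a client library takes care of for you.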

What you'll need

You can use the JavaScript Langflow client with either the open-source, self-hosted version of Langflow or the DataStax cloud-hosted version of Langflow.

Note: this Langflow client is for use on the server. The Langflow API uses API keys, which should not be exposed, so it isn’t suitable for use directly from the front-end.

To test the client out, you’ll either need to host your own version of Langflow or sign up for a free DataStax account and use the cloud-hosted version. Once you are set up with Langflow, make sure you have a flow to test the API out with. The basic prompting flow template is a good start, or, if you're looking for something with a bit more agency, check out the simple agent template. You'll need an OpenAI API key to run these flows, or you can change out the model provider if you want to. Make sure the flow works with a test in the playground; once it’s responding, you’re ready to make calls to the API.

Getting started with the JavaScript Langflow client

To demonstrate how to use the Langflow client, let's start a small TypeScript application. Create a new directory, change into it, and initialize a new Node.js project:

mkdir using-langflow-client
cd using-langflow-client
npm init --yes

Install the client using your favourite package manager:

npm install @datastax/langflow-client

Install some other tools that will help us write and run the application:

npm install tsx @types/node --save-dev

Create a new file called index.ts and open it in your editor of choice. Start by importing the client.

import { LangflowClient } from "@datastax/langflow-client"

Now you can initialize a client to use with the Langflow API. How you do this depends on whether you’re using DataStax-hosted Langflow or self-hosted Langflow.

Initializing for DataStax-hosted Langflow

When you use DataStax-hosted Langflow, you’ll need your Langflow ID and an API key. You can get both from the API modal that you can access from the Langflow canvas.

The Langflow ID is in the API URL and you can generate an API key, too.

You can then create a client with the following code:

const langflowId = "YOUR_LANGFLOW_ID";
const apiKey = "YOUR_API_KEY";
const client = new LangflowClient({ langflowId, apiKey });

Initializing for self-hosted Langflow

If you’re self-hosting Langflow, or just running it locally, you’ll need the URL from which you access Langflow.

If you have set up authentication for your instance of Langflow, you’ll need to create an API key for your user. If you haven't yet set up authentication for your instance of Langflow, you can omit the API key.

You can then initialize the client like this:

const baseURL = "http://localhost:7860";
const apiKey = "YOUR_API_KEY";
const client = new LangflowClient({ baseURL, apiKey });
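
Since API keys shouldn't be hard-coded or exposed, you may prefer to read these settings from the environment. Here is a minimal sketch that picks the right options for either kind of deployment. The function name and the environment variable names (LANGFLOW_ID, LANGFLOW_API_KEY, LANGFLOW_BASE_URL) are hypothetical; use whatever fits your application.

```javascript
// Sketch only: the function and environment variable names are hypothetical.
function langflowOptionsFromEnv(env) {
  if (env.LANGFLOW_ID) {
    // DataStax-hosted Langflow needs both the Langflow ID and an API key.
    if (!env.LANGFLOW_API_KEY) {
      throw new Error("LANGFLOW_API_KEY is required when LANGFLOW_ID is set");
    }
    return { langflowId: env.LANGFLOW_ID, apiKey: env.LANGFLOW_API_KEY };
  }

  // Self-hosted Langflow needs a base URL, plus an API key only if
  // authentication has been set up.
  const options = { baseURL: env.LANGFLOW_BASE_URL ?? "http://localhost:7860" };
  if (env.LANGFLOW_API_KEY) options.apiKey = env.LANGFLOW_API_KEY;
  return options;
}

// Usage:
// const client = new LangflowClient(langflowOptionsFromEnv(process.env));
```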

Running a flow

No matter which way you initialized your client, you can now use it to run your flows. To do so, you’ll need the flow ID, which can be found in the API modal in the flow canvas.

You can get a reference to a flow by calling on the client like so:

const flowId = "YOUR_FLOW_ID";
const flow = client.flow(flowId);

You can run the flow by calling run and passing it the input to your flow:

const response = await flow.run("Hello, how are you?");
console.log(response.outputs);

If you run the application now, your flow will run and output your results.

npx tsx ./index.ts

Flow responses

Flows return a lot of data: everything you could want to know about how the flow ran. The most important part of the response is the output from the flow; the Langflow client tries to make this easy.

You can take the flow response from above and instead of logging the entire set of response outputs, you can call:

const response = await flow.run("Hello, how are you?");
console.log(response.chatOutputText());

The client will return the text from the first chat output component in the response.

If you need the session ID, or more detail from any of the outputs, you can access the full response from the FlowResponse object:

const response = await flow.run("Hello, how are you?");
console.log(response.sessionId);
console.log(response.outputs);

Options for running a flow

Using flow.run(input) will run your flow with several defaults. The input and output types will be set to chat and it’ll use the default session. If your flow requires different settings, you can update the parameters. For example, if you want to set the input and output types to text and pass a session ID, you can do the following:

import { InputTypes, OutputTypes } from "@datastax/langflow-client/consts";

// set up flow as above

const response = await flow.run("Hello, how are you?", {
  input_type: InputTypes.TEXT,
  output_type: OutputTypes.TEXT,
  session_id: "USER_SESSION_ID",
});

Tweaks

Langflow is flexible enough to enable you to change the settings for any of the components in a flow. For example, you might have set up the flow to use the OpenAI model component using the gpt-4o-mini model, but you want to test the flow with gpt-4o. Instead of updating the flow itself, you can send a tweak by providing the ID of the component and the parameters you want to override.

The JavaScript Langflow client supports tweaks in a couple of ways.

You can add a tweak to a flow object, like so:

const flow = client.flow(flowId);
const tweakedFlow = flow.tweak("OpenAIModel-KqkTB", { model_name: "gpt-4o" });

This creates a new flow object, so if you call run on the original flow object it will use the original model and if you call run on the tweakedFlow object it will use gpt-4o.
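
To illustrate the idea of returning a new object rather than changing the existing one, here is a small stand-in class. This is not the client's actual implementation; SketchFlow is a hypothetical name used only to show the pattern.

```javascript
// Illustration only: `SketchFlow` is a hypothetical stand-in, not the client's source.
class SketchFlow {
  constructor(id, tweaks = {}) {
    this.id = id;
    this.tweaks = tweaks;
  }

  // Return a new object with the extra tweak merged in; `this` is left untouched.
  tweak(componentId, params) {
    return new SketchFlow(this.id, {
      ...this.tweaks,
      [componentId]: { ...this.tweaks[componentId], ...params },
    });
  }
}

const original = new SketchFlow("YOUR_FLOW_ID");
const tweaked = original.tweak("OpenAIModel-KqkTB", { model_name: "gpt-4o" });

// The original still has no tweaks; only the new object carries the override.
console.log(Object.keys(original.tweaks).length);
console.log(tweaked.tweaks["OpenAIModel-KqkTB"].model_name);
```

Because nothing is mutated, you can keep one base flow around and derive as many variations from it as you need.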

You can also provide your tweaks as an object when you run the flow.

const tweaks = { "OpenAIModel-KqkTB": { "model_name": "gpt-4o" }};
const response = await flow.run("Hello, how are you?", { tweaks });

Let's make this better together

This is the first release of this Langflow client and we want it to be the easiest way for you to use Langflow in your JavaScript server-side applications. The code is open-source and available on GitHub.

If you have feedback, suggestions, or you want to contribute, please do so over on GitHub. And if you like the library, please leave a star on the GitHub repo.

Clean up HTML Content for Retrieval-Augmented Generation with Readability.js

Phil Nash — Tue, 21 Jan 2025 18:22:44 +0000

Scraping web pages is one way to fetch content for your retrieval-augmented generation (RAG) application. But parsing the content from a web page can be a pain.

Mozilla's open-source library Readability.js is a useful tool for extracting just the important parts of a web page. Let's look at how to use it as part of a data ingestion pipeline for a RAG application.

Retrieving unstructured data from a web page

Web pages are a source of unstructured data that we can use in RAG-based apps. But web pages are often full of irrelevant content: things like headers, sidebars, and footers. These provide useful context for someone browsing the site, but detract from the main subject of a page.

To get the best data for RAG, we need to remove irrelevant content. When you’re working within one site, you can use tools like cheerio to parse the HTML yourself based on your knowledge of the site's structure. But if you're scraping pages across different layouts and designs, you need a good way to return just the relevant content and avoid the rest.

Repurposing reader view

Most web browsers come with a reader view that strips out everything but the article title and content. Here is the difference between the browser and reader mode when applied to a blog post on my personal site.

Mozilla makes the underlying library for Firefox's reader mode available as a standalone open-source module: Readability.js. So we can use Readability.js in a data pipeline to strip irrelevant content and return high quality results from scraping a web page.

How to scrape data with Node.js and Readability.js

Let's take a look at an example of scraping the article content from my previous blog post on creating vector embeddings in Node.js. Here's some JavaScript you can use to retrieve the HTML for the page:

const html = await fetch(
  "https://philna.sh/blog/2024/09/25/how-to-create-vector-embeddings-in-node-js/"
).then((res) => res.text());
console.log(html);

This includes all the HTML tags as well as the navigation, footer, share links, calls to action and other things you can find on most web sites.

To improve on this, you could install a module like cheerio and select only the important parts:

npm install cheerio

import * as cheerio from "cheerio";

const html = await fetch(
  "https://philna.sh/blog/2024/09/25/how-to-create-vector-embeddings-in-node-js/"
).then((res) => res.text());

const $ = cheerio.load(html);

console.log($("h1").text(), "\n");
console.log($("section#blog-content > div:first-child").text());

With this code you get the title and text of the article. As I said earlier, this is great if you know the structure of the HTML, but that won't always be the case.

Instead, install Readability.js and jsdom:

npm install @mozilla/readability jsdom

Readability.js normally runs in a browser environment and uses the live document rather than a string of HTML, so we need to include jsdom to provide that in Node.js. Now we can turn the HTML we already loaded into a document and pass it to Readability.js to parse out the content.

import { Readability } from "@mozilla/readability";
import { JSDOM } from "jsdom";

const url = "https://philna.sh/blog/2024/09/25/how-to-create-vector-embeddings-in-node-js/";
const html = await fetch(url).then((res) => res.text());

const doc = new JSDOM(html, { url });
const reader = new Readability(doc.window.document);
const article = reader.parse();

console.log(article);

When you inspect the article, you can see that it has parsed a number of things from the HTML.

There's the title, author, excerpt, publish time, and both the content and textContent. The textContent property is the plain text content of the article, ready for you to split into chunks, create vector embeddings, and ingest into a vector database. The content property is the HTML of the extracted article, including links and images. This could be useful if you want to extract links or process the images somehow.
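
To give a feel for the chunking step, here is a deliberately naive sketch that splits text into fixed-size, overlapping chunks. It only counts characters and ignores sentence and paragraph boundaries, so treat it as an illustration rather than a replacement for a proper text splitter. The chunkText function and its default sizes are my own choices.

```javascript
// Naive sketch: splits on character counts only, ignoring sentence or
// paragraph boundaries.
function chunkText(text, chunkSize = 1000, chunkOverlap = 200) {
  if (chunkOverlap >= chunkSize) {
    throw new RangeError("chunkOverlap must be smaller than chunkSize");
  }
  // Each chunk starts this many characters after the previous one.
  const step = chunkSize - chunkOverlap;
  const chunks = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once a chunk has reached the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

// A stand-in for article.textContent
const textContent = "abcdefghij".repeat(3);
console.log(chunkText(textContent, 12, 4));
```

The overlap means the end of one chunk is repeated at the start of the next, which helps to keep some context around text that would otherwise be cut in two.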

You might also want to see whether the document is likely to return good results. Reader view works well on articles, but is less useful for other types of content. You can do a quick check to see if the HTML is suitable for processing with Readability.js with the function isProbablyReaderable. If this function returns false you may want to parse the HTML in a different way, or even inspect the URL to see whether it has useful content for you.

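// isProbablyReaderable is exported alongside Readability
import { Readability, isProbablyReaderable } from "@mozilla/readability";

const doc = new JSDOM(html, { url });
const reader = new Readability(doc.window.document);

if (isProbablyReaderable(doc.window.document)) {
  const article = reader.parse();
  console.log(article);
} else {
  // do something else
}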

If the page fails this check, you might want to flag the URL to see whether it does include useful information for your RAG application, or whether it should be excluded.

Using Readability with LangChain.js

If you're using LangChain.js for your application, you can also use Readability.js to return the content from an HTML page. It fits nicely into your data ingestion pipelines, working with other LangChain components, like text chunkers and vector stores.

The following example uses LangChain.js to load the same page as above, return the relevant content from the page using the MozillaReadabilityTransformer, split the text into chunks using the RecursiveCharacterTextSplitter, create vector embeddings with OpenAI, and store the data in Astra DB.

You'll need to install the following dependencies:

npm install @langchain/core @langchain/community @langchain/openai @langchain/textsplitters @datastax/astra-db-ts @mozilla/readability jsdom

To run the example, you will need to create an Astra DB database and store the database's application token and endpoint in your environment as ASTRA_DB_APPLICATION_TOKEN and ASTRA_DB_API_ENDPOINT. You will also need an OpenAI API key stored in your environment as OPENAI_API_KEY.

Import the dependencies:

import { HTMLWebBaseLoader } from "@langchain/community/document_loaders/web/html";
import { MozillaReadabilityTransformer } from "@langchain/community/document_transformers/mozilla_readability";
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";
import { OpenAIEmbeddings } from "@langchain/openai";
import { AstraDBVectorStore } from "@langchain/community/vectorstores/astradb";

We use the HTMLWebBaseLoader to load the raw HTML from the URL we provide. The HTML is then passed through the MozillaReadabilityTransformer to extract the text, which is then split into chunks by the RecursiveCharacterTextSplitter. Finally, we create an embedding provider and an Astra DB vector store that will be used to turn the text chunks into vector embeddings and store them in the vector database.

const loader = new HTMLWebBaseLoader(
  "https://philna.sh/blog/2024/09/25/how-to-create-vector-embeddings-in-node-js/"
);
const transformer = new MozillaReadabilityTransformer();
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1000,
  chunkOverlap: 200,
});
const embeddings = new OpenAIEmbeddings({
  model: "text-embedding-3-small",
});
const vectorStore = new AstraDBVectorStore(embeddings, {
  token: process.env.ASTRA_DB_APPLICATION_TOKEN,
  endpoint: process.env.ASTRA_DB_API_ENDPOINT,
  collection: "content",
  collectionOptions: {
    vector: {
      dimension: 1536,
      metric: "cosine",
    },
  },
});
await vectorStore.initialize();
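
The collection above is configured with the cosine metric. As a rough illustration of what that metric measures, here is a small function that computes the cosine similarity between two vectors. The database does this for you at query time, so this is only to show the idea.

```javascript
// Illustration only: the vector database computes this for you.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new RangeError("vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Vectors pointing the same way score close to 1 and unrelated
  // (orthogonal) vectors score 0.
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineSimilarity([1, 0], [0, 1]));
console.log(cosineSimilarity([1, 2], [2, 4]));
```

The score depends on the direction of the vectors and not their length, which is why it is a common choice for comparing text embeddings.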

The initialisation of all the components makes up most of the work. Once everything is set up, you can load, transform, split, embed and store the documents like this:

const docs = await loader.load();
const sequence = transformer.pipe(splitter);
const chunkedDocs = await sequence.invoke(docs);
await vectorStore.addDocuments(chunkedDocs);

More accurate data from web scraping with Readability.js

Readability.js is a battle-tested library powering Firefox's reader mode that we can use to scrape only relevant data from web pages. This cleans up web content and makes it much more useful for RAG.

As we've seen, you can do this directly with the library or using LangChain.js and the MozillaReadabilityTransformer.

Getting data from a web page is only the first step in your ingestion pipeline. From here you'll need to split your text into chunks, create vector embeddings, and store everything in Astra DB. Then you'll be ready to build your RAG-powered application.