<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Guillaume Laforge</title>
    <description>The latest articles on Forem by Guillaume Laforge (@glaforge).</description>
    <link>https://forem.com/glaforge</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F176707%2Faf2a94e6-0062-4170-ac85-f316bcfbd579.jpg</url>
      <title>Forem: Guillaume Laforge</title>
      <link>https://forem.com/glaforge</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/glaforge"/>
    <language>en</language>
    <item>
      <title>Text classification with Gemini and LangChain4j</title>
      <dc:creator>Guillaume Laforge</dc:creator>
      <pubDate>Thu, 11 Jul 2024 20:26:36 +0000</pubDate>
      <link>https://forem.com/googlecloud/text-classification-with-gemini-and-langchain4j-37n5</link>
      <guid>https://forem.com/googlecloud/text-classification-with-gemini-and-langchain4j-37n5</guid>
      <description>&lt;p&gt;Generative AI has potential applications far beyond chatbots and Retrieval Augmented Generation. For example, a nice use case is: &lt;strong&gt;text classification&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;I had the chance of meeting some customers and prospects who had the need for triaging incoming requests, or for labeling existing data. In the first case, a government entity was tasked with routing citizen requests to access undisclosed information to the right governmental service that could grant or reject that access. In the second case, a company needed to sort out tons of existing internal documents that were not properly organized, and they wanted to quickly start better structuring this trove of information, by labelling each of these docs into different categories.&lt;/p&gt;

&lt;p&gt;In both situations, the task was a &lt;strong&gt;text classification&lt;/strong&gt; one: to put each request or document in a distinct pile, so they could more easily be sorted out, organized, and treated more rapidly.&lt;/p&gt;

&lt;p&gt;Before generative AI, text classification would be handled by data scientists who would craft and train dedicated machine learning models for that purpose. But it is now also possible to do the same with the help of large language models. That’s what I’d like to explore with you in this article today.&lt;/p&gt;

&lt;p&gt;As usual, I’ll be using the &lt;a href="https://deepmind.google/technologies/gemini/" rel="noopener noreferrer"&gt;Gemini model&lt;/a&gt;, and the &lt;a href="https://docs.langchain4j.dev/" rel="noopener noreferrer"&gt;LangChain4j framework&lt;/a&gt; for implementing illustrative examples in Java.&lt;/p&gt;

&lt;h2&gt;
  Text classification: putting a label on a document
&lt;/h2&gt;

&lt;p&gt;Before diving into the code, let’s step back a short moment to clarify what text classification is about. When we classify documents, we put a label on them.&lt;/p&gt;

&lt;p&gt;For example, in a bug tracker, we could automate adding labels on new tickets that say that the bug report is related to a certain component. So we would put the name of the component as the label for that new ticket.&lt;/p&gt;

&lt;p&gt;For routing incoming document access requests, we could put the label of the service that must treat the request, etc.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Filtering&lt;/strong&gt; is also a text classification problem: we can filter the content of emails to decide whether they are spam or not. And we can also use LLMs to filter harmful content from users’ inputs, and even classify the category of harm (hateful speech, harassment, etc.)&lt;/p&gt;

&lt;h2&gt;
  Zero-shot prompting: just ask the model!
&lt;/h2&gt;

&lt;p&gt;What if we simply ask a large language model what it thinks the classification, or the label, should be? Indeed, LLMs are often smart enough to figure out the correct classification without being trained specifically for that purpose.&lt;/p&gt;

&lt;p&gt;Let’s illustrate this with a very common type of text classification: &lt;strong&gt;sentiment analysis&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;First, we can define an &lt;code&gt;enum&lt;/code&gt; representing the various sentiments that can be recognized:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;enum&lt;/span&gt; &lt;span class="nc"&gt;Sentiment&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
 &lt;span class="no"&gt;POSITIVE&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="no"&gt;NEUTRAL&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="no"&gt;NEGATIVE&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We create a &lt;code&gt;record&lt;/code&gt; which will hold the result of the sentiment analysis:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="n"&gt;record&lt;/span&gt; &lt;span class="nf"&gt;SentimentClassification&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
 &lt;span class="nc"&gt;Sentiment&lt;/span&gt; &lt;span class="n"&gt;sentiment&lt;/span&gt;
&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We will also need an &lt;code&gt;interface&lt;/code&gt; to represent the type-safe Java service that the developers integrating this LLM-backed solution will call to retrieve the sentiment of the text:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;interface&lt;/span&gt; &lt;span class="nc"&gt;SentimentClassifier&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
 &lt;span class="nc"&gt;SentimentClassification&lt;/span&gt; &lt;span class="nf"&gt;classify&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice that it takes an unstructured &lt;code&gt;String&lt;/code&gt; as input, but it returns a strongly typed object, not just a mere string.&lt;/p&gt;

&lt;p&gt;It’s time to prepare our Gemini model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;VertexAiGeminiChatModel&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;project&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="no"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;location&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="no"&gt;LOCATION&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;modelName&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"gemini-1.5-pro"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;responseMimeType&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"application/json"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;responseSchema&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Schema&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;newBuilder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setType&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Type&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;OBJECT&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;putProperties&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"sentiment"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
 &lt;span class="nc"&gt;Schema&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;newBuilder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setType&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Type&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;STRING&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;addAllEnum&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Stream&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Sentiment&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;values&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;map&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nl"&gt;Enum:&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;collect&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Collectors&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;toList&lt;/span&gt;&lt;span class="o"&gt;()))&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We’re taking advantage of the latest feature of Gemini and LangChain4j, which lets us specify that we want 100% valid JSON in output, and, even better, that the generated JSON must comply with a JSON schema!&lt;/p&gt;
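&lt;p&gt;With that schema in place, the raw model response is constrained to a JSON object along these lines (the &lt;code&gt;sentiment&lt;/code&gt; value being one of the enum names we registered):&lt;/p&gt;

```json
{
  "sentiment": "POSITIVE"
}
```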

&lt;p&gt;Now we create the sentiment analysis service:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;SentimentClassifier&lt;/span&gt; &lt;span class="n"&gt;sentimentClassifier&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
 &lt;span class="nc"&gt;AiServices&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;create&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;SentimentClassifier&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;class&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And we call it to retrieve the sentiment of the text we want to analyze:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;SentimentClassification&lt;/span&gt; &lt;span class="n"&gt;classification&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
 &lt;span class="n"&gt;sentimentClassifier&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;classify&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"I am happy!"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;out&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;println&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;classification&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;sentiment&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt; &lt;span class="c1"&gt;// POSITIVE&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We didn’t even need to give Gemini examples; that’s why it’s called &lt;em&gt;zero-shot prompting&lt;/em&gt;. LLMs are usually smart enough to easily handle familiar classification tasks like sentiment analysis.&lt;/p&gt;

&lt;h2&gt;
  Few-shot prompting: when the model needs a little help
&lt;/h2&gt;

&lt;p&gt;A more common approach with LLMs for text classification is &lt;em&gt;few-shot prompting&lt;/em&gt;. As the name implies, it’s a prompting technique.&lt;/p&gt;

&lt;p&gt;You give the model a task (classifying text), and you show it examples of classifications, with a clear input/output format, to force the LLM to reply with just the expected class.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;ChatLanguageModel&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;VertexAiGeminiChatModel&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;project&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="no"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;location&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="no"&gt;LOCATION&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;modelName&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"gemini-1.5-flash-001"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;maxOutputTokens&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;maxRetries&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

&lt;span class="nc"&gt;PromptTemplate&lt;/span&gt; &lt;span class="n"&gt;promptTemplate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;PromptTemplate&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;from&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""
 Analyze the sentiment of the text below.
 Respond only with one word to describe the sentiment.

 INPUT: This is fantastic news!
 OUTPUT: POSITIVE

 INPUT: Pi is roughly equal to 3.14
 OUTPUT: NEUTRAL

 INPUT: I really disliked the pizza. Who'd put pineapple toppings?
 OUTPUT: NEGATIVE

 INPUT: {{text}}
 OUTPUT:
 """&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;span class="nc"&gt;Prompt&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;promptTemplate&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;apply&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
 &lt;span class="nc"&gt;Map&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"text"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"I love strawberries!"&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;

&lt;span class="nc"&gt;Response&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;AiMessage&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;generate&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;toUserMessage&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;

&lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;out&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;println&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt; &lt;span class="c1"&gt;// POSITIVE&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the above approach, we use LangChain4j’s &lt;code&gt;PromptTemplate&lt;/code&gt;, with a placeholder value &lt;code&gt;{{text}}&lt;/code&gt; that will contain the text to classify. We don’t get an &lt;code&gt;enum&lt;/code&gt; value back though, so in the end we have to compare against a plain string. But we could also apply the same response schema handling as in our previous zero-shot example.&lt;/p&gt;
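&lt;p&gt;If we want a typed value anyway, a small hypothetical helper (not part of LangChain4j) can map the model’s one-word reply back onto our &lt;code&gt;Sentiment&lt;/code&gt; enum, trimming whitespace defensively:&lt;/p&gt;

```java
public class ParseReply {

    public enum Sentiment { POSITIVE, NEUTRAL, NEGATIVE }

    // Hypothetical helper: converts the model's one-word reply into the enum.
    // The prompt forces single-word replies, but we trim and uppercase anyway,
    // since models sometimes add stray whitespace or casing variations.
    public static Sentiment parse(String modelReply) {
        return Sentiment.valueOf(modelReply.trim().toUpperCase());
    }
}
```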

&lt;p&gt;Let’s rewrite this code a little differently, to &lt;em&gt;fake&lt;/em&gt; a conversation with the model. The model sees an exchange between a user and itself, follows the same pattern, and replies with just one word: the sentiment. We’ll use system instructions, and alternating user and AI messages:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;ChatMessage&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;fewShotPrompts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
 &lt;span class="nc"&gt;SystemMessage&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;from&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""
 Analyze the sentiment of the text below.
 Respond only with one word to describe the sentiment.
 """&lt;/span&gt;&lt;span class="o"&gt;),&lt;/span&gt;

 &lt;span class="nc"&gt;UserMessage&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;from&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"This is fantastic news!"&lt;/span&gt;&lt;span class="o"&gt;),&lt;/span&gt;
 &lt;span class="nc"&gt;AiMessage&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;from&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"POSITIVE"&lt;/span&gt;&lt;span class="o"&gt;),&lt;/span&gt;

 &lt;span class="nc"&gt;UserMessage&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;from&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Pi is roughly equal to 3.14"&lt;/span&gt;&lt;span class="o"&gt;),&lt;/span&gt;
 &lt;span class="nc"&gt;AiMessage&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;from&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"NEUTRAL"&lt;/span&gt;&lt;span class="o"&gt;),&lt;/span&gt;

 &lt;span class="nc"&gt;UserMessage&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;from&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"I hate disliked the pizza. "&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
 &lt;span class="s"&gt;"Who'd put pineapple toppings?"&lt;/span&gt;&lt;span class="o"&gt;),&lt;/span&gt;
 &lt;span class="nc"&gt;AiMessage&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;from&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"NEGATIVE"&lt;/span&gt;&lt;span class="o"&gt;),&lt;/span&gt;

 &lt;span class="nc"&gt;UserMessage&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;from&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"I love strawberries!"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;);&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;generate&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fewShotPrompts&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;out&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;println&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt; &lt;span class="c1"&gt;// POSITIVE&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same outcome; strawberries are yummy!&lt;/p&gt;

&lt;h2&gt;
  Text classification with embedding models
&lt;/h2&gt;

&lt;p&gt;In the two previous sections, we took advantage of LLMs’ abilities to classify text on their own, based on their intrinsic knowledge, or with the help of a few examples. But there’s another way we can investigate: &lt;strong&gt;using embedding vectors&lt;/strong&gt; to compare texts.&lt;/p&gt;

&lt;p&gt;Embedding vectors are mathematical representations of words, sentences, or paragraphs, in the form of a vector of floating point values. The way those vectors are calculated by &lt;em&gt;embedding models&lt;/em&gt; makes them close to each other (in terms of distance) when the texts are semantically close. You can have a look at my recent article &lt;a href="https://dev.to/glaforge/the-power-of-embeddings-how-numbers-unlock-the-meaning-of-data-4k90-temp-slug-2701054"&gt;introducing vector embeddings&lt;/a&gt;.&lt;/p&gt;
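&lt;p&gt;To make the "distance" idea concrete, here is a toy sketch of cosine similarity, a common way to measure how close two embedding vectors are. The vectors below are made-up values for illustration, not real model output:&lt;/p&gt;

```java
public class CosineDemo {

    // Cosine similarity: dot product of the vectors divided by the product
    // of their magnitudes. Close to 1.0 means semantically similar,
    // close to 0.0 means unrelated (for typical embedding models).
    public static double cosineSimilarity(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        float[] cat    = {0.90f, 0.10f, 0.00f}; // toy embedding for "cat"
        float[] kitten = {0.85f, 0.20f, 0.05f}; // toy embedding for "kitten"
        float[] car    = {0.10f, 0.05f, 0.95f}; // toy embedding for "car"
        System.out.println(cosineSimilarity(cat, kitten)); // high: semantically close
        System.out.println(cosineSimilarity(cat, car));    // low: unrelated
    }
}
```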

&lt;p&gt;LangChain4j provides a &lt;code&gt;TextClassifier&lt;/code&gt; interface which classifies a text by comparing it to sets of other texts that belong to the same class. We give it a map of possible labels, each associated with a list of texts that belong to that category.&lt;/p&gt;

&lt;p&gt;In particular, there’s an &lt;code&gt;EmbeddingModelTextClassifier&lt;/code&gt; that uses embedding models to compare the text with the examples of each label. We can even tweak its scoring to favor texts being closer to the average of all the examples, or closer to the single closest example (by default, the score weighs the distance to the mean and the distance to the closest example equally.)&lt;/p&gt;
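&lt;p&gt;That scoring idea can be sketched as follows. This is an illustration of the principle, not LangChain4j’s actual implementation; the parameter names are mine:&lt;/p&gt;

```java
public class LabelScore {

    // Sketch of the blended score: a candidate text's score for a label mixes
    // its similarity to the mean of the label's examples with its similarity
    // to the single closest example. A ratio of 0.5 weighs both halves equally,
    // mirroring the default behavior described above.
    public static double score(double similarityToMean,
                               double similarityToClosest,
                               double meanToClosestRatio) {
        return meanToClosestRatio * similarityToMean
             + (1.0 - meanToClosestRatio) * similarityToClosest;
    }
}
```

&lt;p&gt;Raising the ratio toward 1.0 rewards texts that resemble the category as a whole; lowering it toward 0.0 rewards texts that strongly match any one example.&lt;/p&gt;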

&lt;p&gt;So let’s have a look at this solution.&lt;/p&gt;

&lt;p&gt;Instead of doing sentiment analysis, we’ll go with recipe classification: our goal will be to classify a recipe, to know if it’s an &lt;em&gt;appetizer&lt;/em&gt;, a &lt;em&gt;main course&lt;/em&gt;, or a &lt;em&gt;dessert&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;First, we need to define our labels, with an &lt;code&gt;enum&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;enum&lt;/span&gt; &lt;span class="nc"&gt;DishType&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
 &lt;span class="no"&gt;APPETIZER&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="no"&gt;MAIN&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="no"&gt;DESSERT&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Because we don’t have a dataset of recipes, we’ll use Gemini to generate sample recipes for each label. For that, we need to configure Gemini:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;static&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;VertexAiGeminiChatModel&lt;/span&gt; &lt;span class="no"&gt;CHAT_MODEL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
 &lt;span class="nc"&gt;VertexAiGeminiChatModel&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;project&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="no"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;location&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="no"&gt;LOCATION&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;modelName&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"gemini-1.5-flash"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We’ll also configure an embedding model to calculate the vector embeddings:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;static&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;VertexAiEmbeddingModel&lt;/span&gt; &lt;span class="no"&gt;EMBEDDING_MODEL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
 &lt;span class="nc"&gt;VertexAiEmbeddingModel&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;project&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="no"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;location&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="no"&gt;LOCATION&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;endpoint&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="no"&gt;ENDPOINT&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;publisher&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"google"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;modelName&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"text-embedding-004"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;taskType&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;VertexAiEmbeddingModel&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;TaskType&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;CLASSIFICATION&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Vertex AI’s embedding models are capable of handling various tasks, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;classification&lt;/strong&gt;,&lt;/li&gt;
&lt;li&gt;semantic similarity,&lt;/li&gt;
&lt;li&gt;clustering,&lt;/li&gt;
&lt;li&gt;question answering,&lt;/li&gt;
&lt;li&gt;fact verification,&lt;/li&gt;
&lt;li&gt;query or document retrieval.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let’s create a method to generate a recipe for a particular type of dish:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;static&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="nf"&gt;recipeOf&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;DishType&lt;/span&gt; &lt;span class="n"&gt;type&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
 &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;CHAT_MODEL&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;generate&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
 &lt;span class="s"&gt;"Write a recipe for a %s dish"&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;formatted&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;type&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;toLowerCase&lt;/span&gt;&lt;span class="o"&gt;()));&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And we’ll collect 3 examples of recipes for each type of dish:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;examplesOfRecipes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Stream&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;DishType&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;values&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;collect&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
 &lt;span class="nc"&gt;Collectors&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;toMap&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
 &lt;span class="n"&gt;dishType&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;dishType&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
 &lt;span class="n"&gt;dishType&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;
 &lt;span class="nc"&gt;Stream&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;generate&lt;/span&gt;&lt;span class="o"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;recipeOf&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dishType&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;limit&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;toList&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
 &lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;);&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That way, we have our dataset ready, and we’ll prepare a text classifier:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;EmbeddingModelTextClassifier&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;DishType&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;recipeClassifier&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
 &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;EmbeddingModelTextClassifier&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;gt;(&lt;/span&gt;&lt;span class="no"&gt;EMBEDDING_MODEL&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
 &lt;span class="n"&gt;examplesOfRecipes&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It takes a little while to calculate the initial embedding vectors of all the samples, but once that’s done, our classifier is ready! Let’s see whether the following recipe is an &lt;em&gt;appetizer&lt;/em&gt;, a &lt;em&gt;main course&lt;/em&gt;, or a &lt;em&gt;dessert&lt;/em&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;DishType&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;classifiedDishes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;recipeClassifier&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;classify&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""
 **Classic Moist Chocolate Cake**

 This recipe delivers a rich, moist chocolate cake that's
 perfect for any occasion.

 Ingredients:
 * 1 ¾ cups all-purpose flour
 * 2 cups granulated sugar
 * ¾ cup unsweetened cocoa powder
 * 1 ½ teaspoons baking powder
 * 1 ½ teaspoons baking soda
 * 1 teaspoon salt
 * 2 large eggs
 * 1 cup milk
 * ½ cup vegetable oil
 * 2 teaspoons vanilla extract
 * 1 cup boiling water

 Instructions:
 * Preheat oven to 350°F (175°C). Grease and flour two 9-inch
 round cake pans.
 * Combine dry ingredients: In a large bowl, whisk together flour,
 sugar, cocoa powder, baking powder, baking soda, and salt.
 * Add wet ingredients: Beat in eggs, milk, oil, and vanilla until
 combined.
 * Stir in boiling water: Carefully stir in boiling water. The
 batter will be thin.
 * Bake: Pour batter evenly into prepared pans. Bake for 30-35
 minutes, or until a toothpick inserted into the center comes
 out clean.
 * Cool: Let cakes cool in pans for 10 minutes before transferring
 to a wire rack to cool completely.
 """&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;out&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;println&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"This recipe is of type: "&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;classifiedDishes&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="c1"&gt;// This recipe is of type: [DESSERT]&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And voilà, we harnessed the power of embedding models and text-similarity calculations to classify our chocolate cake recipe as a dessert!&lt;/p&gt;
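
&lt;p&gt;Under the hood, this kind of classifier boils down to embedding the input text and picking the label whose example embeddings are most similar, typically via cosine similarity. Here’s a simplified sketch of that core idea in plain Java (an illustration only, not LangChain4j’s actual implementation):&lt;/p&gt;

```java
// Simplified sketch of embedding-based classification: embed the input,
// then pick the label whose example embedding is most similar (cosine).
class CosineClassifier {

    // Cosine similarity between two embedding vectors
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Return the index of the labeled example closest to the input vector
    static int classify(double[] inputEmbedding, double[][] labelEmbeddings) {
        int best = 0;
        double bestScore = -1.0;
        for (int i = 0; i < labelEmbeddings.length; i++) {
            double score = cosine(inputEmbedding, labelEmbeddings[i]);
            if (score > bestScore) {
                bestScore = score;
                best = i;
            }
        }
        return best;
    }
}
```

&lt;p&gt;A real classifier holds several example vectors per label, and would typically average the similarity scores over a label’s examples (or take the maximum) before picking the winner.&lt;/p&gt;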

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Large Language Models like Gemini are great at classifying text, thanks to their general knowledge of the world that they acquired during their training. But for more specialized use cases, we might need to guide the LLM to recognize labels, because the subject is very specific to our data. That’s when few-shot prompting or embedding model-based classification helps.&lt;/p&gt;

&lt;p&gt;If we have lots of samples for each label, a few-shot prompting approach means passing all those examples again and again in the context window of the LLM, which yields a high token count. So if you pay per token, it can become a bit expensive.&lt;/p&gt;

&lt;p&gt;If we use the embedding model text classifier, it might take a while to compute all the embedding vectors, but we do that only once; afterwards, we just calculate the embedding vector of the text to classify, so only the tokens of that text are incurred on each call. If we have lots of samples, the classifier needs to do quite a few vector and matrix computations to calculate the distance to the samples, but it’s usually quite fast (unless we really have hundreds or thousands of samples).&lt;/p&gt;

&lt;p&gt;I hope this article showed you that Generative AI is useful beyond the usual chatbot and RAG use cases: it’s great at text classification as well. LangChain4j and Gemini are well suited to that task, and you’ve now seen how to implement several different approaches to text classification.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Latest Gemini features support in LangChain4j 0.32.0</title>
      <dc:creator>Guillaume Laforge</dc:creator>
      <pubDate>Fri, 05 Jul 2024 09:53:30 +0000</pubDate>
      <link>https://forem.com/googlecloud/latest-gemini-features-support-in-langchain4j-0320-b7j</link>
      <guid>https://forem.com/googlecloud/latest-gemini-features-support-in-langchain4j-0320-b7j</guid>
      <description>&lt;p&gt;&lt;a href="https://docs.langchain4j.dev/" rel="noopener noreferrer"&gt;LangChain4j&lt;/a&gt; 0.32.0 was released yesterday, including my &lt;a href="https://github.com/langchain4j/langchain4j/pull/1278" rel="noopener noreferrer"&gt;pull request&lt;/a&gt; with support for lots of new Gemini features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;JSON output mode&lt;/strong&gt;, to force Gemini to reply using JSON, without any markup,&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;JSON schema&lt;/strong&gt;, to control and constrain the JSON output to comply with a schema,&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Response grounding&lt;/strong&gt; with Google Search web results and with private data in Vertex AI datastores,&lt;/li&gt;
&lt;li&gt;Easier debugging, thanks to new builder methods to &lt;strong&gt;log requests and responses&lt;/strong&gt;,&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Function calling mode&lt;/strong&gt; (none, automatic, or a subset of functions),&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Safety settings&lt;/strong&gt; to catch harmful prompts and responses.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let’s explore those new features together through some code examples! And if you make it to the end of the article, you’ll also discover &lt;strong&gt;two bonus points&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  JSON output mode
&lt;/h2&gt;

&lt;p&gt;Creating LLM-powered applications means working with text, as this is what LLMs return. But to facilitate this integration between LLM responses and your code, the text format of choice is usually JSON, as it’s human-readable, and easy to parse programmatically.&lt;/p&gt;

&lt;p&gt;However, LLMs are a bit chatty: rather than sending you back a nice raw JSON document, they tend to reply with an extra sentence or two, and some markdown markup wrapping the piece of JSON.&lt;/p&gt;
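
&lt;p&gt;A common workaround was to strip that markdown wrapper by hand, with a brittle little helper along these lines (a hypothetical sketch, not part of LangChain4j):&lt;/p&gt;

```java
// Hypothetical helper stripping the ```json ... ``` markdown wrapper
// that chatty LLMs often add around a JSON payload.
class JsonUnwrapper {
    static String unwrap(String response) {
        String text = response.strip();
        if (text.startsWith("```")) {
            // Drop the opening fence line (e.g. ```json)...
            int firstNewline = text.indexOf('\n');
            text = text.substring(firstNewline + 1);
            // ...and the closing fence
            int closingFence = text.lastIndexOf("```");
            if (closingFence >= 0) {
                text = text.substring(0, closingFence);
            }
        }
        return text.strip();
    }
}
```

&lt;p&gt;This kind of post-processing is fragile (it doesn’t handle extra sentences before the fence, for instance), which is exactly why a built-in response format option is so welcome.&lt;/p&gt;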

&lt;p&gt;Fortunately, Gemini 1.5 (Flash and Pro) allows you to specify the response MIME type. Currently, only &lt;code&gt;application/json&lt;/code&gt; is supported, but other formats may come later.&lt;/p&gt;

&lt;p&gt;To do that, when instantiating the Gemini model, use the &lt;code&gt;responseMimeType()&lt;/code&gt; builder method:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;VertexAiGeminiChatModel&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;project&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="no"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;location&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="no"&gt;LOCATION&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;modelName&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"gemini-1.5-flash"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;responseMimeType&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"application/json"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;generate&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Roll a dice"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;out&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;println&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No sentence, no markdown markup, nothing, just pure JSON:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{"roll": 3}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We didn’t even need to say in the prompt we wanted to get a JSON response!&lt;/p&gt;

&lt;p&gt;However, the JSON key of that document may vary from time to time, so you may still wish to be a bit more prescriptive in your prompt, and ask the model to return JSON explicitly, give it an example of the JSON output you expect, etc. That’s the usual prompting approach…&lt;/p&gt;

&lt;p&gt;But now there’s more!&lt;/p&gt;

&lt;h2&gt;
  
  
  JSON Schema output
&lt;/h2&gt;

&lt;p&gt;This is quite unique in the LLM ecosystem, as I believe it’s the only model out there that allows you to specify a JSON schema for constraining the JSON output. This works for Gemini 1.5 Pro only, not with Gemini 1.5 Flash.&lt;/p&gt;

&lt;p&gt;Let’s have another look at our previous dice roll example, and let’s update it to specify a JSON schema for the output generation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;static&lt;/span&gt; &lt;span class="n"&gt;dev&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;langchain4j&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;vertexai&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;SchemaHelper&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;fromClass&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="c1"&gt;//...&lt;/span&gt;

&lt;span class="n"&gt;record&lt;/span&gt; &lt;span class="nf"&gt;DiceRoll&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;roll&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{}&lt;/span&gt;

&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;VertexAiGeminiChatModel&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;project&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"genai-java-demos"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;location&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"us-central1"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;modelName&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"gemini-1.5-pro"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;responseSchema&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fromClass&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;DiceRoll&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;class&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;generate&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Roll a dice"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;out&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;println&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The generated JSON document will always contain the &lt;code&gt;roll&lt;/code&gt; key:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{ "roll": 5 }

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this example, we used a convenience method called &lt;code&gt;fromClass()&lt;/code&gt; that creates a JSON schema that corresponds to a Java type (here a Java record).&lt;/p&gt;
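
&lt;p&gt;To build an intuition for what &lt;code&gt;fromClass()&lt;/code&gt; does, here’s a rough illustration of how a record’s components can be inspected with reflection to derive schema property names and types (a simplified sketch, not the actual &lt;code&gt;SchemaHelper&lt;/code&gt; code):&lt;/p&gt;

```java
import java.lang.reflect.RecordComponent;

// Rough illustration of deriving JSON schema property names and types
// from a Java record's components via reflection.
class SchemaSketch {
    record DiceRoll(int roll) {}

    // List each record component as a "name: json-type" line
    static String describe(Class<?> recordClass) {
        StringBuilder sb = new StringBuilder();
        for (RecordComponent component : recordClass.getRecordComponents()) {
            sb.append(component.getName())
              .append(": ")
              .append(jsonType(component.getType()))
              .append('\n');
        }
        return sb.toString();
    }

    // Map a Java type to its JSON schema type name
    static String jsonType(Class<?> type) {
        if (type == int.class || type == long.class) return "integer";
        if (type == double.class || type == float.class) return "number";
        if (type == boolean.class) return "boolean";
        return "string";
    }
}
```

&lt;p&gt;For the &lt;code&gt;DiceRoll&lt;/code&gt; record, this would yield a single &lt;code&gt;roll&lt;/code&gt; property of type integer, which matches the schema we wrote by hand below.&lt;/p&gt;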

&lt;p&gt;But there’s also another convenience method, &lt;code&gt;fromJsonSchema()&lt;/code&gt;, that lets us pass a JSON schema as a string:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;VertexAiGeminiChatModel&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;project&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"genai-java-demos"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;location&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"us-central1"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;modelName&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"gemini-1.5-pro"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;responseSchema&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fromJsonSchema&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""
 {
 "type": "object",
 "properties": {
 "roll": {
 "type": "integer"
 }
 }
 }
 """&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It’s also possible to construct a JSON schema programmatically:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;VertexAiGeminiChatModel&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;project&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"genai-java-demos"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;location&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"us-central1"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;modelName&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"gemini-1.5-pro"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;responseSchema&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Schema&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;newBuilder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setType&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Type&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;OBJECT&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;putProperties&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"roll"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
 &lt;span class="nc"&gt;Schema&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;newBuilder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setType&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Type&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;INTEGER&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now you always get consistent JSON outputs!&lt;/p&gt;
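
&lt;p&gt;Since the schema guarantees the presence of the &lt;code&gt;roll&lt;/code&gt; key, mapping the response back into the &lt;code&gt;DiceRoll&lt;/code&gt; record becomes straightforward, for instance with a quick regex-based extraction (a sketch for illustration; in a real application you’d rather use a JSON library such as Jackson or Gson):&lt;/p&gt;

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: pull the schema-guaranteed "roll" value out of the JSON reply.
// A real application would use a proper JSON library instead.
class RollParser {
    record DiceRoll(int roll) {}

    static DiceRoll parse(String json) {
        Matcher matcher =
            Pattern.compile("\"roll\"\\s*:\\s*(\\d+)").matcher(json);
        if (!matcher.find()) {
            throw new IllegalArgumentException("No \"roll\" key in: " + json);
        }
        return new DiceRoll(Integer.parseInt(matcher.group(1)));
    }
}
```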

&lt;h2&gt;
  
  
  Response grounding with Google Search web results and Vertex AI datastores
&lt;/h2&gt;

&lt;p&gt;Large Language Models are wonderful creative machines, but rather than benefiting from their high degree of creativity, we’d often prefer factual responses grounded in data and documents.&lt;/p&gt;

&lt;p&gt;Gemini offers the ability to &lt;a href="https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/ground-gemini" rel="noopener noreferrer"&gt;ground responses&lt;/a&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;against Google Search web results,&lt;/li&gt;
&lt;li&gt;against Vertex AI search datastores.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Use Google Search to ground responses
&lt;/h3&gt;

&lt;p&gt;The training of an LLM ended at a certain date: its &lt;em&gt;cut-off&lt;/em&gt; date. So it doesn’t know about news that happened after that date. But you can request Gemini to use Google Search to find more up-to-date information.&lt;/p&gt;

&lt;p&gt;For example, if we ask Gemini about the current elections going on in France, it could reply with something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;There is no current national election happening in France right now.
The last major national election in France was the **Presidential
election in April and May 2022**, where Emmanuel Macron won a second
term.
There are, however, **local elections** happening regularly in
different regions of France.
To stay updated on French elections, you can check the website of
the **French Ministry of the Interior** or reputable news sources
like **The Guardian, BBC, CNN, or Le Monde**.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, let’s enable the use of Google Search web results with the &lt;code&gt;useGoogleSearch(true)&lt;/code&gt; method:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;VertexAiGeminiChatModel&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;project&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="no"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;location&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="no"&gt;LOCATION&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;modelName&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"gemini-1.5-flash"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;useGoogleSearch&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;generate&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
 &lt;span class="s"&gt;"What is the current election going on in France?"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;out&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;println&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The answer will be much different, and indeed factual and up-to-date:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;France held the first round of a parliamentary election on July 4,
2024. The second round will be on July 7, 2024. The election is
significant because it could result in the first far-right government
in France since World War II. The National Rally, President Emmanuel
Macron’s centrist alliance, and the New Popular Front coalition are
the three major political blocs competing in the election. The
outcome of the election is highly uncertain, with the far-right
National Rally potentially gaining a parliamentary majority. If the
National Rally wins a majority, Macron would be expected to appoint
Jordan Bardella, the party's president, as prime minister.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There’s indeed a parliamentary election going on right now in France. That election was called only a month ago, thus well past the cut-off date of the model’s knowledge.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;For my French audience, don’t forget to go voting next Sunday!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Grounding with Vertex AI Search
&lt;/h3&gt;

&lt;p&gt;The idea is that we want to ground responses on our own data. This is particularly important when the knowledge required is actually private information, like our internal docs, or our customers’ docs.&lt;/p&gt;

&lt;p&gt;My colleague Mete wrote a great &lt;a href="https://atamel.dev/posts/2024/07-01_grounding_with_own_data_vertexai_search/" rel="noopener noreferrer"&gt;article explaining how to set up grounding with private data&lt;/a&gt;. Below, I’ll assume that we created a Vertex AI search app with a datastore backed by a Google Cloud Storage bucket containing a fictitious document: a car manual for the &lt;em&gt;Cymbal Starlight&lt;/em&gt; car model! I’m taking the same example as in Mete’s article.&lt;/p&gt;

&lt;p&gt;This time, we specify the search location to point at the Vertex AI search datastore with &lt;code&gt;vertexSearchDatastore()&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;VertexAiGeminiChatModel&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;project&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="no"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;location&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="no"&gt;LOCATION&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;modelName&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"gemini-1.5-flash"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;vertexSearchDatastore&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;format&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
 &lt;span class="s"&gt;"projects/%s/locations/%s/collections/%s/dataStores/%s"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
 &lt;span class="no"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"global"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"default_collection"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
 &lt;span class="s"&gt;"cymbal-datastore_1720169982142"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;generate&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
 &lt;span class="s"&gt;"What is the cargo capacity of Cymbal Starlight?"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;out&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;println&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Cymbal Starlight is a fictitious car that doesn’t exist, but since it’s covered in that private document, Gemini is now able to answer the question:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;The Cymbal Starlight 2024 has a cargo capacity of 13.5 cubic feet.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What’s interesting as well is that the response returned by Gemini provides some context about the source document that helped it answer the user query (we’ll see in the next section how to enable request and response logging):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; grounding_metadata {
2: {
1: {
3: 66
}
2: 0x3f7deee0
}
5: {
2: {
1: "gs://genai-java-demos-documents/cymbal-starlight-2024.pdf"
2: "cymbal-starlight-2024"
}
}
6: {
1: {
3: 66
4: "The Cymbal Starlight 2024 has a cargo capacity of 13.5 cubic feet."
}
2: "\000"
3: {
257772: 63
}
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To be honest, I’m not quite sure what all the numbers exactly mean (they appear to be raw protobuf field numbers), but this metadata shows that the PDF uploaded to cloud storage is the document that was used to shape the answer of the LLM, and it gives an excerpt of the sentence that was found in the document.&lt;/p&gt;

&lt;h2&gt;
  
  
  Request and response logging
&lt;/h2&gt;

&lt;p&gt;To better understand what’s going on under the hood, you can enable request and response logging. That way, you’re able to see exactly what is sent to Gemini, and what Gemini replies.&lt;/p&gt;

&lt;p&gt;To enable logging, there are two methods we can use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;logRequests(true)&lt;/code&gt; to log the request sent to Gemini,&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;logResponses(true)&lt;/code&gt; to log the response received from Gemini.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let’s see that in action:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;VertexAiGeminiChatModel&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;project&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="no"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;location&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="no"&gt;LOCATION&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;modelName&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"gemini-1.5-flash"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;logRequests&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;logResponses&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;generate&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Why is the sky blue?"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;out&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;println&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here’s what’s logged:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[main] DEBUG dev.langchain4j.model.vertexai.VertexAiGeminiChatModel -
GEMINI (gemini-1.5-flash) request: InstructionAndContent {
systemInstruction = null,
contents = [role: "user"
parts {
text: "Why is the sky blue?"
}
]
} tools: []
[main] DEBUG dev.langchain4j.model.vertexai.VertexAiGeminiChatModel -
GEMINI (gemini-1.5-flash) response: candidates {
content {
role: "model"
parts {
text: "The sky appears blue due to a phenomenon called
**Rayleigh scattering**. Here\'s a breakdown:\n\n* **Sunlight
is made up of all colors of the rainbow.** When sunlight enters
the Earth\'s atmosphere, it encounters tiny particles like
nitrogen and oxygen molecules.\n* **These particles scatter the
sunlight in all directions.** However, shorter wavelengths of
light, like blue and violet, scatter more strongly than longer
wavelengths, like red and orange.\n* **This preferential
scattering of shorter wavelengths is called Rayleigh
scattering.**
As a result, we see more blue light scattered throughout the sky,
making it appear blue.\n\n **Why is the sky not violet?** \n\nEven
though violet light scatters even more strongly than blue, our
eyes are more sensitive to blue light. This is why we perceive
the sky as blue rather than violet.\n\n**Other factors that
affect sky color: **\n\n*** Time of day:** The sky appears more
red or orange at sunrise and sunset because the sunlight has to
travel through more of the atmosphere, scattering away most of
the blue light.\n* **Clouds:** Clouds are made up of larger water
droplets or ice crystals, which scatter all wavelengths of light
equally. This is why clouds appear white.\n* **Pollution:**
Pollution particles can scatter light differently, sometimes
making the sky appear hazy or even reddish.\n\nLet me know if
you have any other questions about the sky! \n"
}
}
finish_reason: STOP
safety_ratings {
category: HARM_CATEGORY_HATE_SPEECH
probability: NEGLIGIBLE
probability_score: 0.054802597
severity: HARM_SEVERITY_NEGLIGIBLE
severity_score: 0.03314852
}
safety_ratings {
category: HARM_CATEGORY_DANGEROUS_CONTENT
probability: NEGLIGIBLE
probability_score: 0.100348406
severity: HARM_SEVERITY_NEGLIGIBLE
severity_score: 0.06359858
}
safety_ratings {
category: HARM_CATEGORY_HARASSMENT
probability: NEGLIGIBLE
probability_score: 0.10837755
severity: HARM_SEVERITY_NEGLIGIBLE
severity_score: 0.021491764
}
safety_ratings {
category: HARM_CATEGORY_SEXUALLY_EXPLICIT
probability: NEGLIGIBLE
probability_score: 0.10338596
severity: HARM_SEVERITY_NEGLIGIBLE
severity_score: 0.020410307
}
}
usage_metadata {
prompt_token_count: 6
candidates_token_count: 288
total_token_count: 294
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let me give you a few more details about logging. LangChain4j uses Slf4j by default for logging. Requests and responses are logged at the &lt;code&gt;DEBUG&lt;/code&gt; level, so we have to configure our logger and/or logging façade accordingly.&lt;/p&gt;

&lt;p&gt;In my test project for this article, I configured the following &lt;code&gt;Maven&lt;/code&gt; dependencies for &lt;code&gt;Slf4j&lt;/code&gt; and the &lt;code&gt;Simple&lt;/code&gt; logger:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
 &lt;span class="nt"&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.slf4j&lt;span class="nt"&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
 &lt;span class="nt"&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;slf4j-api&lt;span class="nt"&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
 &lt;span class="nt"&gt;&amp;lt;version&amp;gt;&lt;/span&gt;2.0.13&lt;span class="nt"&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
 &lt;span class="nt"&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.slf4j&lt;span class="nt"&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
 &lt;span class="nt"&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;slf4j-simple&lt;span class="nt"&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
 &lt;span class="nt"&gt;&amp;lt;version&amp;gt;&lt;/span&gt;2.0.13&lt;span class="nt"&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I created a properties file to configure the loggers: &lt;code&gt;src/main/resources/simplelogger.properties&lt;/code&gt;, which contains the following configuration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;org.slf4j.simpleLogger.defaultLogLevel=debug
org.slf4j.simpleLogger.log.io.grpc.netty.shaded=info

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I set the default logging level to &lt;code&gt;debug&lt;/code&gt;. But Netty, the networking library used under the hood by the Gemini Java SDK, also logs at debug level. So I specified that logging for this library should only happen at &lt;code&gt;info&lt;/code&gt; level and above; otherwise, the output is super chatty.&lt;/p&gt;

&lt;h2&gt;
  
  
  Function calling mode
&lt;/h2&gt;

&lt;p&gt;So far, when using Gemini for &lt;a href="https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/function-calling" rel="noopener noreferrer"&gt;function calling&lt;/a&gt;, the model would decide on its own whether a function was useful to call, and which function to call.&lt;/p&gt;

&lt;p&gt;But Gemini introduces the ability to &lt;a href="https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/function-calling#tool-config" rel="noopener noreferrer"&gt;control the function or tool choice&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;There are 3 options:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;AUTO&lt;/code&gt; — the familiar default mode, where Gemini decides on its own whether a function call is necessary, and which one to make,&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ANY&lt;/code&gt; — lets you specify a subset of all the available functions, and forces the model to pick one of them (only supported by Gemini 1.5 Pro),&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;NONE&lt;/code&gt; — even if tools are defined and available, prevents Gemini from using any of them.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let’s have a look at this example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;VertexAiGeminiChatModel&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;project&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="no"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;location&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="no"&gt;LOCATION&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;modelName&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"gemini-1.5-pro"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;logRequests&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;logResponses&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;toolCallingMode&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ToolCallingMode&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ANY&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;allowedFunctionNames&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Arrays&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;asList&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"add"&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

&lt;span class="nc"&gt;ToolSpecification&lt;/span&gt; &lt;span class="n"&gt;adder&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ToolSpecification&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"adds two numbers"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"add"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;addParameter&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"a"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;JsonSchemaProperty&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;INTEGER&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;addParameter&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"b"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;JsonSchemaProperty&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;INTEGER&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

&lt;span class="nc"&gt;UserMessage&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;UserMessage&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;from&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"How much is 3 + 4?"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="nc"&gt;Response&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;AiMessage&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;answer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;generate&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;asList&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="o"&gt;),&lt;/span&gt; &lt;span class="n"&gt;adder&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;out&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;println&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
 &lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;toolExecutionRequests&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;getFirst&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We specify the &lt;code&gt;ToolCallingMode.ANY&lt;/code&gt; mode, and we list the names of the functions the model must pick from in order to reply to the request (with the &lt;code&gt;allowedFunctionNames()&lt;/code&gt; builder method).&lt;/p&gt;

&lt;p&gt;We describe the tool that can be called, we create a message, and when calling &lt;code&gt;generate()&lt;/code&gt;, we pass the tool specification corresponding to the function we want the model to call.&lt;/p&gt;

&lt;p&gt;The output will show that the model replied with the mandatory tool execution request:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ToolExecutionRequest { id = null, name = "add",
arguments = "{"a":3.0,"b":4.0}" }

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now it’s our turn to call the &lt;code&gt;add&lt;/code&gt; function with those arguments, and then send the function execution result back to Gemini.&lt;/p&gt;
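&lt;p&gt;As an illustration, here’s a minimal, hypothetical sketch of that local execution step: it extracts the two numeric arguments from the JSON string carried by the &lt;code&gt;ToolExecutionRequest&lt;/code&gt; (a real application would use a proper JSON parser like Gson or Jackson) and invokes a local &lt;code&gt;add()&lt;/code&gt; method:&lt;/p&gt;

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ToolCallHandling {

    // Hypothetical local implementation of the "add" tool.
    static int add(int a, int b) {
        return a + b;
    }

    public static void main(String[] args) {
        // Arguments string, as seen in the ToolExecutionRequest above.
        String arguments = "{\"a\":3.0,\"b\":4.0}";

        // Minimal extraction of the two numeric arguments with a regex
        // (a real application would use a JSON parser instead).
        Matcher m = Pattern.compile("\"a\":([\\d.]+),\"b\":([\\d.]+)")
                           .matcher(arguments);
        if (!m.find()) {
            throw new IllegalStateException("Unexpected arguments: " + arguments);
        }
        int a = (int) Double.parseDouble(m.group(1));
        int b = (int) Double.parseDouble(m.group(2));

        System.out.println(add(a, b)); // prints 7
    }
}
```

&lt;p&gt;The result (&lt;code&gt;7&lt;/code&gt;) can then be wrapped in a tool execution result message and passed back to Gemini in a follow-up &lt;code&gt;generate()&lt;/code&gt; call, so that the model can formulate its final answer.&lt;/p&gt;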

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Warning&lt;/strong&gt; : Currently, it is not possible to use the &lt;code&gt;ANY&lt;/code&gt; forced function calling mode when using LangChain4j’s &lt;code&gt;AiServices&lt;/code&gt; class.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;AiServices&lt;/code&gt; takes care of automatic function calling. But the process is a two-step request / response mechanism:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;First, we ask the model the math question and pass the tool specification along.&lt;/li&gt;
&lt;li&gt;The model replies with a &lt;code&gt;ToolExecutionRequest&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Then &lt;code&gt;AiServices&lt;/code&gt; makes the function call locally, and replies to the model with the function execution result. However, since the &lt;code&gt;ANY&lt;/code&gt; calling mode is specified at the model level, the model still replies with yet another tool execution request, even though this second call was made &lt;em&gt;just&lt;/em&gt; to pass along the function execution result, not to request another tool execution.&lt;/li&gt;
&lt;li&gt;So &lt;code&gt;AiServices&lt;/code&gt; enters an infinite loop, as the model requests a function execution again and again, without taking into account the execution result it received.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When using &lt;code&gt;AiServices&lt;/code&gt;, it’s better to let Gemini operate under the default &lt;code&gt;AUTO&lt;/code&gt; tool mode, so it knows when it needs to request a tool execution, and when it just needs to handle a tool execution response.&lt;/p&gt;

&lt;p&gt;If you want to use the &lt;code&gt;ANY&lt;/code&gt; mode with &lt;code&gt;allowedFunctionNames()&lt;/code&gt;, don’t use &lt;code&gt;AiServices&lt;/code&gt;; handle the function calls on your own in your code, to avoid such infinite loops.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Specify safety settings
&lt;/h2&gt;

&lt;p&gt;In LLM-powered applications, where users can enter all sorts of unexpected text, you may want to limit the harmful content that gets ingested. To do so, you can specify safety settings for different categories of content, with different acceptance thresholds:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;static&lt;/span&gt; &lt;span class="n"&gt;dev&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;langchain4j&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;vertexai&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;HarmCategory&lt;/span&gt;&lt;span class="o"&gt;.*;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;static&lt;/span&gt; &lt;span class="n"&gt;dev&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;langchain4j&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;vertexai&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;SafetyThreshold&lt;/span&gt;&lt;span class="o"&gt;.*;&lt;/span&gt;
&lt;span class="c1"&gt;//...&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;VertexAiGeminiChatModel&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;project&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="no"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;location&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="no"&gt;LOCATION&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;modelName&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"gemini-1.5-flash"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;safetySettings&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Map&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
 &lt;span class="no"&gt;HARM_CATEGORY_DANGEROUS_CONTENT&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="no"&gt;BLOCK_LOW_AND_ABOVE&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
 &lt;span class="no"&gt;HARM_CATEGORY_SEXUALLY_EXPLICIT&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="no"&gt;BLOCK_MEDIUM_AND_ABOVE&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
 &lt;span class="no"&gt;HARM_CATEGORY_HARASSMENT&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="no"&gt;BLOCK_ONLY_HIGH&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
 &lt;span class="no"&gt;HARM_CATEGORY_HATE_SPEECH&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="no"&gt;BLOCK_MEDIUM_AND_ABOVE&lt;/span&gt;
 &lt;span class="o"&gt;))&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you want to make your app safer for your end users, and to guard against malicious or ill-intentioned users, that’s the way to go!&lt;/p&gt;

&lt;h2&gt;
  
  
  Bonus point #1: Streaming responses with lambda functions
&lt;/h2&gt;

&lt;p&gt;I’ll round off this review of Gemini-focused features with a little addition I contributed to the project: the ability to pass a lambda instead of a streaming response handler, when using a streaming model.&lt;/p&gt;

&lt;p&gt;This is not Gemini-related, you can use it with any model!&lt;/p&gt;

&lt;p&gt;More concretely, if you want to use Gemini or another model in streaming mode, to see the response being printed as it’s generated by the model, you would usually write the following code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;VertexAiGeminiStreamingChatModel&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;project&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="no"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;location&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="no"&gt;LOCATION&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;modelName&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"gemini-1.5-flash"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;generate&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Why is the sky blue?"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;StreamingResponseHandler&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
 &lt;span class="nd"&gt;@Override&lt;/span&gt;
 &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;onNext&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;aFewTokens&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
 &lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;out&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;print&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;aFewTokens&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
 &lt;span class="o"&gt;}&lt;/span&gt;

 &lt;span class="nd"&gt;@Override&lt;/span&gt;
 &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;onError&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Throwable&lt;/span&gt; &lt;span class="n"&gt;throwable&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
 &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;RuntimeException&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;throwable&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
 &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;});&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Using an anonymous inner class implementing the &lt;code&gt;StreamingResponseHandler&lt;/code&gt; interface is quite verbose. Fortunately, I contributed a couple of static methods you can import to make the code a little more concise:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;static&lt;/span&gt; &lt;span class="n"&gt;dev&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;langchain4j&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;LambdaStreamingResponseHandler&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;onNext&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;static&lt;/span&gt; &lt;span class="n"&gt;dev&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;langchain4j&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;LambdaStreamingResponseHandler&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;onNextAndError&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="c1"&gt;//...&lt;/span&gt;

&lt;span class="c1"&gt;// onNext&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;generate&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Why is the sky blue?"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
 &lt;span class="n"&gt;onNext&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;out&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;println&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;

&lt;span class="c1"&gt;// onNextAndError&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;generate&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Why is the sky blue?"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
 &lt;span class="n"&gt;onNextAndError&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
 &lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;out&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;println&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
 &lt;span class="n"&gt;ex&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt; &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;RuntimeException&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ex&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt; &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;));&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now you can stream your LLM output in a single instruction!&lt;/p&gt;

&lt;h2&gt;
  
  
  Bonus point #2: Generating stunning images with Imagen v3
&lt;/h2&gt;

&lt;p&gt;A second bonus point in this new LangChain4j release is that the Vertex AI Image model now supports &lt;a href="https://deepmind.google/technologies/imagen-3/" rel="noopener noreferrer"&gt;Imagen v3&lt;/a&gt; (Google DeepMind’s latest high-quality image generation model).&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Warning:&lt;/strong&gt; To use the Imagen model, you’ll still have to be allow-listed for now. You’ll need to &lt;a href="https://docs.google.com/forms/d/1cqt9padvfMgqn23W5FMPTqh7bW1KLkEOsC5G6uC-uuM/viewform" rel="noopener noreferrer"&gt;fill in this form&lt;/a&gt; to request access to the model.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;There are a few new parameters you can take advantage of when generating pictures. Let’s have a look at the following image generation code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;imagenModel&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;VertexAiImageModel&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;project&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="no"&gt;PROJECT&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;location&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="no"&gt;LOCATION&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;endpoint&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="no"&gt;ENDPOINT&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;publisher&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"google"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;modelName&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"imagen-3.0-generate-preview-0611"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;aspectRatio&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;VertexAiImageModel&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;AspectRatio&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;LANDSCAPE&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;mimeType&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;VertexAiImageModel&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;MimeType&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;JPEG&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;compressionQuality&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;80&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;watermark&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;// true by default with Imagen v3&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;withPersisting&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;logRequests&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;logResponses&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""
 An oil painting close-up, with heavy brush strokes full of
 paint, of two hands shaking together, a young one, and an
 old one conveying a sense of heartfelt thanks and connection
 between generations
 """&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

&lt;span class="nc"&gt;Response&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Image&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;imageResponse&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;imagenModel&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;generate&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;out&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;println&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;imageResponse&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let’s see the resulting picture:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--bPaDanc2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://glaforge.dev/img/gemini/imagen-v3-two-hands-shaking.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--bPaDanc2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://glaforge.dev/img/gemini/imagen-v3-two-hands-shaking.jpg" width="800" height="437"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the code above, you certainly noticed the new builder methods:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;aspectRatio()&lt;/code&gt; — not only square, but wide and narrow landscape and portrait modes are available,&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;mimeType()&lt;/code&gt; — in addition to PNG, you can request JPEG image generation,&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;compressionQuality()&lt;/code&gt; — when requesting JPEG, you can choose the compression level for encoding the image,&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;watermark()&lt;/code&gt; — to have all your generated images be watermarked with &lt;a href="https://deepmind.google/technologies/synthid/" rel="noopener noreferrer"&gt;SynthId&lt;/a&gt;,&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;logRequests()&lt;/code&gt; / &lt;code&gt;logResponses()&lt;/code&gt; — to see what is exchanged with the model, in and out,&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;persistToCloudStorage()&lt;/code&gt; — to specify you want the image saved in a cloud storage bucket (not used in this example).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you get the chance to request access to Imagen v3, you’ll notice really great quality improvements compared to v2!&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Lots of new Gemini-related features in this &lt;a href="https://github.com/langchain4j/langchain4j/releases/tag/0.32.0" rel="noopener noreferrer"&gt;release of LangChain4j&lt;/a&gt;! I hope this article helped you learn about them, and will make you want to use them in your projects.&lt;/p&gt;

&lt;p&gt;If you want to go hands-on with Gemini and LangChain4j, don’t forget to check out my self-paced codelab: &lt;a href="https://dev.to/glaforge/gemini-codelab-for-java-developers-using-langchain4j-g5n-temp-slug-1278985"&gt;Gemini codelab for Java developers, using LangChain4j&lt;/a&gt;.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>The power of embeddings: How numbers unlock the meaning of data</title>
      <dc:creator>Guillaume Laforge</dc:creator>
      <pubDate>Tue, 02 Jul 2024 07:05:07 +0000</pubDate>
      <link>https://forem.com/googlecloud/the-power-of-embeddings-how-numbers-unlock-the-meaning-of-data-527k</link>
      <guid>https://forem.com/googlecloud/the-power-of-embeddings-how-numbers-unlock-the-meaning-of-data-527k</guid>
      <description>&lt;h2&gt;
  
  
  Prelude
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;As I’m focusing a lot on Generative AI, I’m curious about how things work under the hood, to better understand what I’m using in my gen-ai powered projects. A topic I’d like to focus on more is &lt;strong&gt;vector embeddings&lt;/strong&gt;: to explain more clearly what they are, how they are calculated, and what you can do with them.&lt;/p&gt;

&lt;p&gt;A colleague of mine, &lt;a href="https://x.com/andreban" rel="noopener noreferrer"&gt;André&lt;/a&gt;, was showing me a &lt;a href="https://writer-m4n3dyfjhq-uc.a.run.app/" rel="noopener noreferrer"&gt;cool experiment&lt;/a&gt; he’s been working on, to help people prepare an interview, with the help of an AI, and shape the structure of the resulting article.&lt;/p&gt;

&lt;p&gt;The idea is to provide: a topic, a target audience, and to describe the goals for the audience. Then, a large language model like &lt;a href="https://deepmind.google/technologies/gemini/" rel="noopener noreferrer"&gt;Gemini&lt;/a&gt; prepares a list of questions (that you can update freely) on that topic. Next, it’s your turn to fill in the blanks, answer those questions, and then the LLM generates an article, with a plan following those key questions and your provided answers. I cheated a bit, and asked &lt;a href="https://gemini.google.com/" rel="noopener noreferrer"&gt;Gemini&lt;/a&gt; itself those questions, and honestly, I really liked how the resulting article came to be, and I wanted to share with you the outcome below.&lt;/p&gt;

&lt;p&gt;It’s a great and simple introduction to vector embeddings! I like how AI can help organize information, and shape the structure and content of an article. &lt;strong&gt;I’m not advocating for letting AI write all your articles&lt;/strong&gt;, far from that, but as an author, I like that it can help me avoid the blank page syndrome, avoid missing key elements in my writing, and improve the quality of my prose.&lt;/p&gt;

&lt;p&gt;Generative AI, in its creative aspect, and as your assistant, can be super useful! Use it as &lt;strong&gt;a tool to help drive your creativity&lt;/strong&gt;! But &lt;strong&gt;always use your critical sense to gauge the quality and factuality of the content&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Introduction: What are vector embeddings?
&lt;/h2&gt;

&lt;p&gt;Imagine you have a vast library filled with books on every topic imaginable. Finding a specific book can be a daunting task, especially if you only know the general subject matter. Now imagine a magical system that can understand the meaning of each book and represent it as a unique code. This code, called a vector embedding, can then be used to quickly find the most relevant books based on your search query, even if you only have a vague idea of what you’re looking for.&lt;/p&gt;

&lt;p&gt;This is the power of vector embeddings. They are essentially numerical representations of complex data, like text, images, or audio, that capture the underlying meaning and relationships within the data. These numerical codes, arranged as vectors, allow computers to process and compare data in a way that mimics human understanding.&lt;/p&gt;

&lt;h2&gt;
  
  
  From Text to Numbers: The Journey of Embedding Creation
&lt;/h2&gt;

&lt;p&gt;Creating vector embeddings involves a multi-step process that transforms raw data into meaningful mathematical representations. The journey begins with &lt;strong&gt;data preprocessing&lt;/strong&gt;, where the data is cleaned, normalized, and prepared for embedding generation. This might involve tasks like removing irrelevant information, standardizing data formats, and breaking text into individual words or subwords (tokenization).&lt;/p&gt;

&lt;p&gt;Next comes the heart of the process: &lt;strong&gt;embedding generation&lt;/strong&gt;. This step leverages various techniques and algorithms, such as Word2Vec, GloVe, BERT, and ResNet, to convert each data point into a high-dimensional vector. The specific algorithm chosen depends on the type of data being embedded (text, images, or audio) and the intended application.&lt;/p&gt;

&lt;p&gt;For instance, Word2Vec uses a neural network to learn relationships between words by analyzing how they co-occur in large text corpora. This results in vector representations for words, where similar words have similar vectors, capturing semantic relationships. Similarly, for images, convolutional neural networks (CNNs) like ResNet can be used to extract features from images, resulting in vectors that represent the visual content.&lt;/p&gt;
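To make “similar words have similar vectors” concrete, here is a minimal sketch comparing embeddings with cosine similarity, the usual similarity metric for this task. The 4-dimensional vectors below are made up purely for illustration; real embeddings have hundreds or thousands of dimensions:

```java
public class CosineSimilarity {

    // Cosine similarity: dot product divided by the product of the norms.
    // Result is in [-1, 1]; closer to 1 means more similar meanings.
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Toy "embeddings": two related words and one unrelated word
        double[] king  = {0.80, 0.65, 0.10, 0.20};
        double[] queen = {0.75, 0.70, 0.15, 0.22};
        double[] apple = {0.10, 0.20, 0.90, 0.70};

        // king/queen should come out much more similar than king/apple
        System.out.printf("king ~ queen: %.3f%n", cosine(king, queen));
        System.out.printf("king ~ apple: %.3f%n", cosine(king, apple));
    }
}
```

The same comparison works for any embedding source (text, image, or audio), since the vectors all live in the same kind of numerical space.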

&lt;h2&gt;
  
  
  Vector Databases: The Power of Storing and Searching Embeddings
&lt;/h2&gt;

&lt;p&gt;Once embeddings are generated, they need a dedicated storage system for efficient retrieval and comparison. This is where &lt;strong&gt;vector databases&lt;/strong&gt; come into play. Unlike traditional databases designed for structured data, vector databases are optimized for storing and searching high-dimensional vector data.&lt;/p&gt;

&lt;p&gt;Vector databases employ specialized indexing techniques, such as Annoy, HNSW, and Faiss, to create efficient data structures that allow for fast similarity search. This means that when a user submits a query (e.g., a search term, an image), the database can quickly find the most similar data points based on the similarity of their vector representations.&lt;/p&gt;
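The baseline operation those indexes accelerate is an exhaustive nearest-neighbor scan. Here is a minimal brute-force sketch over toy vectors; unlike Annoy or HNSW, which return approximate neighbors in sub-linear time, this version checks every stored vector:

```java
public class BruteForceSearch {

    // Returns the index of the stored vector closest to the query,
    // using squared Euclidean distance (the sqrt is skipped since it
    // does not change the ordering).
    static int nearest(double[][] vectors, double[] query) {
        int best = -1;
        double bestDist = Double.MAX_VALUE;
        for (int i = 0; i < vectors.length; i++) {
            double dist = 0;
            for (int d = 0; d < query.length; d++) {
                double diff = vectors[i][d] - query[d];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Three toy document embeddings
        double[][] docs = {{0.9, 0.1}, {0.2, 0.8}, {0.5, 0.5}};
        // The query is closest to docs[0], so this prints 0
        System.out.println(nearest(docs, new double[]{0.85, 0.15}));
    }
}
```

A real vector database does exactly this conceptually, but replaces the linear scan with an index structure so that searching millions of vectors stays fast.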

&lt;h2&gt;
  
  
  Embeddings Empower Search: Finding the Needle in the Haystack
&lt;/h2&gt;

&lt;p&gt;The combination of vector embeddings and vector databases revolutionizes search by enabling  &lt;strong&gt;semantic search&lt;/strong&gt;. This means that instead of relying solely on keyword matching, search engines can understand the meaning behind the data and find relevant results even if the query doesn’t use exact keywords.&lt;/p&gt;

&lt;p&gt;For example, imagine searching for “a picture of a dog with a hat.” Traditional keyword-based search might struggle to find relevant images, as the search term might not match the image description. However, with vector embeddings, the search engine can understand the semantic meaning of the query and find images that contain both a dog and a hat, even if those words are not explicitly mentioned in the image description.&lt;/p&gt;

&lt;h2&gt;
  
  
  Beyond Search: Expanding the Reach of Embeddings
&lt;/h2&gt;

&lt;p&gt;Vector embeddings are not limited to search applications. They have become essential tools in a wide range of fields, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Retrieval Augmented Generation (RAG):&lt;/strong&gt; This technique combines the power of information retrieval and generative models to create more informative and relevant responses. Embeddings are used to find relevant information in large text corpora, which is then used to augment prompts for language models, resulting in more accurate and context-aware outputs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Classification:&lt;/strong&gt; Embeddings enable the classification of data points into different categories based on their similarity. This finds application in areas like sentiment analysis, spam detection, object recognition, and music genre classification.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Anomaly Detection:&lt;/strong&gt; By representing data points as vectors, anomalies can be identified as data points that are significantly different from the majority. This technique is used in various fields, including network intrusion detection, fraud detection, and industrial sensor monitoring.&lt;/li&gt;
&lt;/ul&gt;
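To illustrate the classification use case, here is a minimal nearest-centroid sketch, with toy 2-dimensional vectors and hypothetical labels: each class is summarized by the average of its example embeddings, and a new item receives the label of the closest centroid:

```java
public class NearestCentroid {

    // Average of a set of example vectors: one centroid per class
    static double[] centroid(double[][] vectors) {
        double[] c = new double[vectors[0].length];
        for (double[] v : vectors)
            for (int i = 0; i < c.length; i++) c[i] += v[i];
        for (int i = 0; i < c.length; i++) c[i] /= vectors.length;
        return c;
    }

    // Squared Euclidean distance (ordering is the same as the true distance)
    static double dist(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return s;
    }

    // Assign the label of the class whose centroid is closest to the query
    static String classify(String[] labels, double[][][] examples, double[] query) {
        String best = null;
        double bestDist = Double.MAX_VALUE;
        for (int c = 0; c < labels.length; c++) {
            double d = dist(centroid(examples[c]), query);
            if (d < bestDist) {
                bestDist = d;
                best = labels[c];
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy embeddings for two hypothetical sentiment classes
        String[] labels = {"positive", "negative"};
        double[][][] examples = {
            {{0.9, 0.1}, {0.8, 0.2}},   // positive examples
            {{0.1, 0.9}, {0.2, 0.8}},   // negative examples
        };
        // The query sits near the positive centroid, so this prints "positive"
        System.out.println(classify(labels, examples, new double[]{0.7, 0.3}));
    }
}
```

Anomaly detection follows the same pattern in reverse: a point whose distance to every centroid (or neighbor) exceeds some threshold is flagged as unusual.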

&lt;h2&gt;
  
  
  Facing the Challenges and Shaping the Future
&lt;/h2&gt;

&lt;p&gt;While vector embeddings have revolutionized data analysis, they still face some challenges. These include the difficulty of capturing polysemy (multiple meanings of a word), contextual dependence, and the challenge of interpreting the meaning behind the high-dimensional vector representations.&lt;/p&gt;

&lt;p&gt;Despite these limitations, research continues to push the boundaries of vector embeddings. Researchers are exploring techniques like contextual embeddings, multilingual embeddings, knowledge graph integration, and explainable embeddings to overcome existing limitations and unlock the full potential of these powerful representations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Stepping into the World of Embeddings: Resources and Next Steps
&lt;/h2&gt;

&lt;p&gt;For those interested in diving deeper into the world of vector embeddings, a wealth of resources is available. Online courses and tutorials on platforms like Coursera, Fast.ai, and Stanford’s online learning platform provide a solid foundation in the underlying concepts and techniques.&lt;/p&gt;

&lt;p&gt;Books like “Speech and Language Processing” by Jurafsky and Martin and “Deep Learning” by Goodfellow, Bengio, and Courville offer in-depth coverage of the field. Additionally, research papers and articles on platforms like arXiv and Medium offer insights into the latest advancements and applications.&lt;/p&gt;

&lt;p&gt;To gain practical experience, explore Python libraries like Gensim, spaCy, and TensorFlow/PyTorch. These libraries provide tools for creating and working with embeddings, allowing you to build your own models and experiment with various applications.&lt;/p&gt;

&lt;p&gt;The world of vector embeddings is constantly evolving, offering exciting opportunities for innovation and discovery. By understanding the power of these representations, you can unlock new possibilities for data analysis, information retrieval, and artificial intelligence applications.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Functional builders in Java with Jilt</title>
      <dc:creator>Guillaume Laforge</dc:creator>
      <pubDate>Mon, 17 Jun 2024 18:31:25 +0000</pubDate>
      <link>https://forem.com/glaforge/functional-builders-in-java-with-jilt-24ih</link>
      <guid>https://forem.com/glaforge/functional-builders-in-java-with-jilt-24ih</guid>
      <description>&lt;p&gt;A few months ago, I shared an article about what I called Java&lt;a href="https://dev.to/glaforge/functional-builder-approach-in-java-55j5"&gt;functional builders&lt;/a&gt;, inspired by an equivalent pattern found in Go. The main idea was to have builders that looked like this example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;LanguageModel&lt;/span&gt; &lt;span class="n"&gt;languageModel&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;LanguageModel&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
 &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"cool-model"&lt;/span&gt;&lt;span class="o"&gt;),&lt;/span&gt;
 &lt;span class="n"&gt;project&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"my-project"&lt;/span&gt;&lt;span class="o"&gt;),&lt;/span&gt;
 &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="o"&gt;),&lt;/span&gt;
 &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"This is a generative model"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;);&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Compared to the more traditional builder approach:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You’re using the &lt;code&gt;new&lt;/code&gt; keyword again to construct instances.&lt;/li&gt;
&lt;li&gt;There’s no more &lt;code&gt;build()&lt;/code&gt; method, which felt a bit verbose.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Compared to using constructors with tons of parameters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You have methods, like in traditional builders, that say what each parameter is about (&lt;code&gt;name()&lt;/code&gt;, &lt;code&gt;temperature()&lt;/code&gt;…), a bit like named parameters in some programming languages.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The approach I followed was to take advantage of lambda functions under the hood:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;static&lt;/span&gt; &lt;span class="nc"&gt;ModelOption&lt;/span&gt; &lt;span class="nf"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Float&lt;/span&gt; &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
 &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;temperature&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;However, there were a few downsides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Of course, it’s not very conventional! So it can be a bit disturbing for people used to classical builders.&lt;/li&gt;
&lt;li&gt;I didn’t make the distinction between required and optional parameters (they were all optional!)&lt;/li&gt;
&lt;li&gt;The internal fields were not &lt;code&gt;final&lt;/code&gt;, and I felt they should be.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Discovering Jilt
&lt;/h2&gt;

&lt;p&gt;When searching on this topic, I found &lt;a href="https://x.com/adam_ruka" rel="noopener noreferrer"&gt;Adam Ruka&lt;/a&gt;’s great annotation processor library: &lt;a href="https://github.com/skinny85/jilt" rel="noopener noreferrer"&gt;Jilt&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;One of the really cool features of Jilt is its staged builder concept, which makes builders very type-safe, and forces you to call all the required property methods by chaining them. I found this approach very elegant.&lt;/p&gt;

&lt;p&gt;Adam heard about my functional builder approach, and decided to implement this new style of builder in Jilt. There are a few differences with my implementation, but it mitigates some of the downsides I mentioned.&lt;/p&gt;

&lt;p&gt;Let’s have a look at what functional builders look like from a usage standpoint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;LanguageModel&lt;/span&gt; &lt;span class="n"&gt;languageModel&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;languageModel&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
 &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"cool-model"&lt;/span&gt;&lt;span class="o"&gt;),&lt;/span&gt;
 &lt;span class="n"&gt;project&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"my-project"&lt;/span&gt;&lt;span class="o"&gt;),&lt;/span&gt;
 &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="o"&gt;),&lt;/span&gt;
 &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"This is a generative model"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;);&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Compared to my approach, you’re not using constructors (as annotation processors can’t change existing classes), so you have to use a static method instead. But otherwise, inside that method call, you have the named-parameter-like methods you’re used to using in builders.&lt;/p&gt;

&lt;p&gt;Here, &lt;code&gt;name()&lt;/code&gt;, &lt;code&gt;project()&lt;/code&gt; and &lt;code&gt;temperature()&lt;/code&gt; are mandatory, and you’d get a compilation error if you forgot one of them. But &lt;code&gt;description()&lt;/code&gt; is optional and can be omitted.&lt;/p&gt;

&lt;p&gt;Let’s now look at the implementation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;org.jilt.Builder&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;org.jilt.BuilderStyle&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;org.jilt.Opt&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;static&lt;/span&gt; &lt;span class="n"&gt;jilt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;testing&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;LanguageModelBuilder&lt;/span&gt;&lt;span class="o"&gt;.*;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;static&lt;/span&gt; &lt;span class="n"&gt;jilt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;testing&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;LanguageModelBuilder&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;Optional&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="c1"&gt;//...&lt;/span&gt;
&lt;span class="nc"&gt;LanguageModel&lt;/span&gt; &lt;span class="n"&gt;languageModel&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;languageModel&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
 &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"cool-model"&lt;/span&gt;&lt;span class="o"&gt;),&lt;/span&gt;
 &lt;span class="n"&gt;project&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"my-project"&lt;/span&gt;&lt;span class="o"&gt;),&lt;/span&gt;
 &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="o"&gt;),&lt;/span&gt;
 &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"This is a generative model"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="c1"&gt;//...&lt;/span&gt;
&lt;span class="nd"&gt;@Builder&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;style&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BuilderStyle&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;FUNCTIONAL&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt; &lt;span class="nf"&gt;LanguageModel&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
 &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
 &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;project&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
 &lt;span class="nc"&gt;Double&lt;/span&gt; &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
 &lt;span class="nd"&gt;@Opt&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;description&lt;/span&gt;
&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I used a Java &lt;code&gt;record&lt;/code&gt; but it could be a good old POJO. You must annotate that class with the &lt;code&gt;@Builder&lt;/code&gt; annotation. The &lt;code&gt;style&lt;/code&gt; parameter specifies that you want to use a &lt;em&gt;functional&lt;/em&gt; builder. Notice the use of the &lt;code&gt;@Opt&lt;/code&gt; annotation to say that a parameter is not required.&lt;/p&gt;

&lt;h2&gt;
  
  
  Derived instance creation
&lt;/h2&gt;

&lt;p&gt;Let me close this article with another neat trick offered by Jilt, which is how to build other instances from existing ones:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Builder&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;style&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BuilderStyle&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;FUNCTIONAL&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;toBuilder&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"derive"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt; &lt;span class="nf"&gt;LanguageModel&lt;/span&gt;&lt;span class="o"&gt;(...)&lt;/span&gt; &lt;span class="o"&gt;{}&lt;/span&gt;
&lt;span class="c1"&gt;//...&lt;/span&gt;
&lt;span class="nc"&gt;LanguageModel&lt;/span&gt; &lt;span class="n"&gt;derivedModel&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;derive&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;languageModel&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"new-name"&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;By adding the &lt;code&gt;toBuilder = "derive"&lt;/code&gt; parameter to the annotation, you get the ability to derive new instances from an existing one, changing both required and optional parameters along the way.&lt;/p&gt;

&lt;h2&gt;
  
  
  Time to try Jilt!
&lt;/h2&gt;

&lt;p&gt;You can try functional builders in &lt;a href="https://github.com/skinny85/jilt" rel="noopener noreferrer"&gt;Jilt 1.6&lt;/a&gt; which was just released a few days ago!&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Let's make Gemini Groovy!</title>
      <dc:creator>Guillaume Laforge</dc:creator>
      <pubDate>Mon, 03 Jun 2024 09:49:26 +0000</pubDate>
      <link>https://forem.com/googlecloud/lets-make-gemini-groovy-dgh</link>
      <guid>https://forem.com/googlecloud/lets-make-gemini-groovy-dgh</guid>
      <description>&lt;p&gt;The happy users of &lt;a href="https://gemini.google.com/advanced" rel="noopener noreferrer"&gt;Gemini Advanced&lt;/a&gt;, the powerful AI web assistant powered by the Gemini model, can execute some Python code, thanks to a built-in Python interpreter. So, for math, logic, calculation questions, the assistant can let Gemini invent a Python script, and execute it, to let users get a more accurate answer to their queries.&lt;/p&gt;

&lt;p&gt;But wearing my &lt;a href="https://groovy-lang.org/" rel="noopener noreferrer"&gt;Apache Groovy&lt;/a&gt; hat, I wondered if I could get Gemini to invoke some Groovy scripts as well, for advanced math questions!&lt;/p&gt;

&lt;h2&gt;
  
  
  LangChain4j based approach
&lt;/h2&gt;

&lt;p&gt;As usual, my tool of choice for any LLM problem is the powerful &lt;a href="https://docs.langchain4j.dev/" rel="noopener noreferrer"&gt;LangChain4j&lt;/a&gt; framework! Interestingly, there are already some code engine integrations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a &lt;a href="https://www.graalvm.org/latest/reference-manual/polyglot-programming/" rel="noopener noreferrer"&gt;GraalVM Polyglot Truffle&lt;/a&gt; engine, that can execute Python and JavaScript code,&lt;/li&gt;
&lt;li&gt;a &lt;a href="https://judge0.com/" rel="noopener noreferrer"&gt;Judge0&lt;/a&gt; engine that uses the Judge0 online code execution system, which also supports Groovy!&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I haven’t tried Judge0 yet, as I saw it only supports Groovy 3, and not yet Groovy 4. But for math or logic questions, Groovy 3 is just fine anyway. Instead, I wanted to explore how to create my own Groovy interpreter!&lt;/p&gt;

&lt;p&gt;In the following experiment, I’m going to use the &lt;a href="https://deepmind.google/technologies/gemini/" rel="noopener noreferrer"&gt;Gemini&lt;/a&gt; model, because it supports &lt;em&gt;function calling&lt;/em&gt;, which means we can instruct the model that it can use some tools when needed.&lt;/p&gt;

&lt;p&gt;Let’s walk through this step by step.&lt;/p&gt;

&lt;p&gt;First, I instantiate a Gemini chat model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;VertexAiGeminiChatModel&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;project&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"MY_GCP_PROJECT_ID"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;location&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"us-central1"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;modelName&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"gemini-1.5-flash-001"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;maxRetries&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then, I create a tool that is able to run Groovy code, thanks to the &lt;code&gt;GroovyShell&lt;/code&gt; evaluator:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;GroovyInterpreter&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
 &lt;span class="nd"&gt;@Tool&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Execute a Groovy script and return the result of its execution."&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;Map&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;executeGroovyScript&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
 &lt;span class="nd"&gt;@P&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"The groovy script source code to execute"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;groovyScript&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
 &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;script&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;groovyScript&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;replace&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"\\n"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"\n"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
 &lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;err&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;format&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"%n--&amp;gt; Executing the following Groovy script:%n%s%n"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;script&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
 &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
 &lt;span class="nc"&gt;Object&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;GroovyShell&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;evaluate&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;script&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
 &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;Map&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"result"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt; &lt;span class="s"&gt;"null"&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;toString&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
 &lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Throwable&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
 &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;Map&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"error"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getMessage&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
 &lt;span class="o"&gt;}&lt;/span&gt;
 &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice the &lt;code&gt;@Tool&lt;/code&gt; annotation that describes what this tool can do. And the &lt;code&gt;@P&lt;/code&gt; annotation which explains what the parameter is about.&lt;/p&gt;

&lt;p&gt;I noticed that sometimes the raw script that Gemini suggested contained some &lt;code&gt;\n&lt;/code&gt; strings, instead of plain newline characters, so I’m replacing them with actual newlines.&lt;/p&gt;

&lt;p&gt;I return a map containing either a result (as a string), or an error message if one was encountered.&lt;/p&gt;

&lt;p&gt;Now it’s time to create our assistant contract, in the form of an interface, but with a very carefully crafted system instruction:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;interface&lt;/span&gt; &lt;span class="nc"&gt;GroovyAssistant&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
 &lt;span class="nd"&gt;@SystemMessage&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""
 You are a problem solver equipped with the capability of \
 executing Groovy scripts.
 When you need to or you're asked to evaluate some math \
 function, some algorithm, or some code, use the \
 `executeGroovyScript` function, passing a Groovy script \
 that implements the function, the algorithm, or the code \
 that needs to be run.
 In the Groovy script, return a value. Don't print the result \
 to the console.
 Don't use semicolons in your Groovy scripts, it's not necessary.
 When reporting the result of the execution of a script, \
 be sure to show the content of that script.
 Call the `executeGroovyScript` function only once, \
 don't call it in a loop.
 """&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This system instruction tells the model what its role is, and that it should call the provided Groovy script execution function whenever it needs to calculate some function or execute some logic.&lt;/p&gt;

&lt;p&gt;I also instruct it to return values instead of printing results.&lt;/p&gt;

&lt;p&gt;Funnily enough, Gemini is a pretty decent Groovy programmer, but it insists on adding semicolons as in Java, so for a more &lt;em&gt;idiomatic&lt;/em&gt; code style, I ask it to get rid of them!&lt;/p&gt;

&lt;p&gt;The final step is now to create our LangChain4j AI service with the following code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;assistant&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AiServices&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;GroovyAssistant&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;class&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;chatLanguageModel&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;chatMemory&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;MessageWindowChatMemory&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;withMaxMessages&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;GroovyInterpreter&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I combine the Gemini chat model with a memory that keeps track of users’ requests, and with the Groovy interpreter tool I’ve just created.&lt;/p&gt;

&lt;p&gt;Now let’s see if Gemini is able to write and evaluate a Fibonacci function:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;out&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;println&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
 &lt;span class="n"&gt;assistant&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;chat&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
 &lt;span class="s"&gt;"Write a `fibonacci` function, and calculate `fibonacci(18)`"&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And the output is as follows:&lt;/p&gt;

&lt;blockquote&gt;

&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def fibonacci(n) {
  if (n &amp;lt;= 1) {
    return n
  } else {
    return fibonacci(n - 1) + fibonacci(n - 2)
  }
}
fibonacci(18)

&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;The result of executing the script is: 2584.&lt;/p&gt;
&lt;/blockquote&gt;
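&lt;p&gt;The script Gemini produced is easy to sanity-check outside the model. A direct Java transcription of the same recursion (with fibonacci(0) = 0 and fibonacci(1) = 1) confirms the reported value:&lt;/p&gt;

```java
public class FibCheck {
    // Same recursive definition as the Groovy script generated by Gemini.
    static long fibonacci(int n) {
        if (n == 0 || n == 1) {
            return n;
        }
        return fibonacci(n - 1) + fibonacci(n - 2);
    }

    public static void main(String[] args) {
        System.out.println(fibonacci(18)); // prints 2584, matching the tool's result
    }
}
```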

&lt;h2&gt;
  
  
  Discussion
&lt;/h2&gt;

&lt;p&gt;It took me a bit of time to find the right system instruction to get Groovy scripts that complied with my requirements. However, I sometimes noticed internal errors returned by the model, which I haven’t fully understood (in particular, why they happen at all).&lt;/p&gt;

&lt;p&gt;On some occasions, I also noticed that LangChain4j kept sending the same script for execution, in a loop. Here too, I still have to investigate why this rare behavior happens.&lt;/p&gt;
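&lt;p&gt;One possible mitigation for those duplicated tool calls (a sketch of my own, not a feature LangChain4j offers out of the box) is to memoize tool results by script text, so that a repeated identical call returns the cached result instead of re-running the script:&lt;/p&gt;

```java
import java.util.Properties;

public class MemoizingInterpreter {
    // Minimal evaluator abstraction so this sketch does not depend on
    // GroovyShell; the real tool would call new GroovyShell().evaluate(...).
    interface Evaluator {
        String evaluate(String script);
    }

    // Properties serves as a simple String-to-String cache of script to result.
    private final Properties cache = new Properties();
    private final Evaluator evaluator;

    MemoizingInterpreter(Evaluator evaluator) {
        this.evaluator = evaluator;
    }

    // In the real tool, this method would carry the @Tool annotation.
    public String executeGroovyScript(String script) {
        String cached = cache.getProperty(script);
        if (cached != null) {
            return cached; // already executed: return the memoized result
        }
        String result = evaluator.evaluate(script);
        cache.setProperty(script, result);
        return result;
    }
}
```

&lt;p&gt;With this in place, even if the model requests the same script several times in a row, the Groovy code only runs once.&lt;/p&gt;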

&lt;p&gt;So this solution is a fun experiment, but I’d call it just that, an experiment, as it’s not as rock-solid as I want it to be. But if I manage to make it more bullet-proof, maybe I could contribute it back as a dedicated execution engine for LangChain4j!&lt;/p&gt;

&lt;h2&gt;
  
  
  Full source code
&lt;/h2&gt;

&lt;p&gt;Here’s the full content of my experiment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;dev.langchain4j.agent.tool.P&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;dev.langchain4j.agent.tool.Tool&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;dev.langchain4j.memory.chat.MessageWindowChatMemory&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;dev.langchain4j.model.vertexai.VertexAiGeminiChatModel&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;dev.langchain4j.service.AiServices&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;dev.langchain4j.service.SystemMessage&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;groovy.lang.GroovyShell&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;java.util.Map&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;GroovyCodeInterpreterAssistant&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
 &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;[]&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
 &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;VertexAiGeminiChatModel&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;project&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"MY_GCP_PROJECT_ID"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;location&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"us-central1"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;modelName&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"gemini-1.5-flash-001"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;maxRetries&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

 &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;GroovyInterpreter&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
 &lt;span class="nd"&gt;@Tool&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Execute a Groovy script and return the result of its execution."&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;Map&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;executeGroovyScript&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
 &lt;span class="nd"&gt;@P&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"The groovy script source code to execute"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;groovyScript&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
 &lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;err&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;format&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"%n--&amp;gt; Raw Groovy script:%n%s%n"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;groovyScript&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
 &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;script&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;groovyScript&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;replace&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"\\n"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"\n"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
 &lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;err&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;format&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"%n--&amp;gt; Executing:%n%s%n"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;script&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
 &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
 &lt;span class="nc"&gt;Object&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;GroovyShell&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;evaluate&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;script&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
 &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;Map&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"result"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt; &lt;span class="s"&gt;"null"&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;toString&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
 &lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Throwable&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
 &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;Map&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"error"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getMessage&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
 &lt;span class="o"&gt;}&lt;/span&gt;
 &lt;span class="o"&gt;}&lt;/span&gt;
 &lt;span class="o"&gt;}&lt;/span&gt;

 &lt;span class="kd"&gt;interface&lt;/span&gt; &lt;span class="nc"&gt;GroovyAssistant&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
 &lt;span class="nd"&gt;@SystemMessage&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""
 You are a problem solver equipped with the capability of \
 executing Groovy scripts.
 When you need to or you're asked to evaluate some math \
 function, some algorithm, or some code, use the \
 `executeGroovyScript` function, passing a Groovy script \
 that implements the function, the algorithm, or the code \
 that needs to be run.
 In the Groovy script, return a value. Don't print the result \
 to the console.
 Don't use semicolons in your Groovy scripts, it's not necessary.
 When reporting the result of the execution of a script, \
 be sure to show the content of that script.
 Call the `executeGroovyScript` function only once, \
 don't call it in a loop.
 """&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
 &lt;span class="o"&gt;}&lt;/span&gt;

 &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;assistant&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AiServices&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;GroovyAssistant&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;class&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;chatLanguageModel&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;chatMemory&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;MessageWindowChatMemory&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;withMaxMessages&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;GroovyInterpreter&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

 &lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;out&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;println&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
 &lt;span class="n"&gt;assistant&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;chat&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
 &lt;span class="s"&gt;"Write a `fibonacci` function, and calculate `fibonacci(18)`"&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;
 &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
    </item>
    <item>
      <title>Grounding Gemini with Web Search results in LangChain4j</title>
      <dc:creator>Guillaume Laforge</dc:creator>
      <pubDate>Tue, 28 May 2024 05:42:43 +0000</pubDate>
      <link>https://forem.com/googlecloud/grounding-gemini-with-web-search-results-in-langchain4j-4f49</link>
      <guid>https://forem.com/googlecloud/grounding-gemini-with-web-search-results-in-langchain4j-4f49</guid>
      <description>&lt;p&gt;The latest &lt;a href="https://github.com/langchain4j/langchain4j/releases/tag/0.31.0" rel="noopener noreferrer"&gt;release of LangChain4j&lt;/a&gt; (version 0.31) added the capability of &lt;em&gt;grounding&lt;/em&gt; large language models with results from web searches. There’s an integration with &lt;a href="https://developers.google.com/custom-search/v1/overview" rel="noopener noreferrer"&gt;Google Custom Search Engine&lt;/a&gt;, and also with &lt;a href="https://tavily.com/" rel="noopener noreferrer"&gt;Tavily&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Grounding an LLM’s response with the results from a search engine lets the LLM find relevant information about the query on the web, which will likely include up-to-date information that the model won’t have seen during its training, published after its cut-off date.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Remark:&lt;/strong&gt; Gemini has a built-in &lt;a href="https://cloud.google.com/vertex-ai/generative-ai/docs/grounding/overview#ground-public" rel="noopener noreferrer"&gt;Google Web Search grounding&lt;/a&gt; capability; however, LangChain4j’s Gemini integration doesn’t yet surface this feature. I’m currently working on a pull request to support it.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Asking questions to your website
&lt;/h2&gt;

&lt;p&gt;An interesting use case for LLM web search grounding is searching a particular website. I was interested in asking questions about articles that I have posted on my personal website and blog. Let’s see, step by step, how you can implement this.&lt;/p&gt;

&lt;h3&gt;
  
  
  Creating a custom search engine
&lt;/h3&gt;

&lt;p&gt;First of all, as I decided to use Google Custom Search, I created a new custom search engine. I won’t detail the steps involved in this process, as it’s explained in the &lt;a href="https://developers.google.com/custom-search/docs/tutorial/creatingcse" rel="noopener noreferrer"&gt;documentation&lt;/a&gt;. I created a custom search engine that searches only the content of my website: &lt;a href="https://glaforge.dev" rel="noopener noreferrer"&gt;glaforge.dev&lt;/a&gt;. But you can potentially search the whole internet if you wish, or just your company website, etc.&lt;/p&gt;

&lt;p&gt;Google Custom Search gave me an API key, as well as a Custom Search ID (csi) for my newly created custom search engine. You can test the custom search engine with that ID at this URL: &lt;a href="https://programmablesearchengine.google.com/controlpanel/overview?cx=YOUR_CSI_HERE" rel="noopener noreferrer"&gt;https://programmablesearchengine.google.com/controlpanel/overview?cx=YOUR_CSI_HERE&lt;/a&gt;. It gives you a Google Search-like interface where you can enter your queries. There’s also a widget that you can integrate into your website if you wish.&lt;/p&gt;

&lt;h3&gt;
  
  
  Implementation
&lt;/h3&gt;

&lt;p&gt;To start the implementation, I configure the chat model I want to use. I’m using the latest and fastest Gemini model: &lt;a href="https://deepmind.google/technologies/gemini/flash/" rel="noopener noreferrer"&gt;Gemini 1.5 Flash&lt;/a&gt;. I’ve saved my Google Cloud project ID and location in environment variables.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;VertexAiGeminiChatModel&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;VertexAiGeminiChatModel&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;project&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getenv&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"PROJECT_ID"&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;location&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getenv&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"LOCATION"&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;modelName&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"gemini-1.5-flash-001"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, I configure my web search engine. Here, I’m using Google Search, but it could be Tavily as well. I also saved my API key and the ID of my custom web search in environment variables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;WebSearchEngine&lt;/span&gt; &lt;span class="n"&gt;webSearchEngine&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;GoogleCustomWebSearchEngine&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getenv&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"GOOGLE_CUSTOM_SEARCH_API_KEY"&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;csi&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getenv&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"GOOGLE_CUSTOM_SEARCH_CSI"&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
&lt;span class="c1"&gt;// .logRequests(true)&lt;/span&gt;
&lt;span class="c1"&gt;// .logResponses(true)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note that you can log the requests and responses for debugging purposes.&lt;/p&gt;

&lt;p&gt;Next, I define a &lt;em&gt;content retriever&lt;/em&gt;: a way to let LangChain4j know that &lt;em&gt;content&lt;/em&gt; can be &lt;em&gt;retrieved&lt;/em&gt; from a particular tool or location:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;ContentRetriever&lt;/span&gt; &lt;span class="n"&gt;contentRetriever&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;WebSearchContentRetriever&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;webSearchEngine&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;webSearchEngine&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;maxResults&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, I define the contract I want to use to interact with my Gemini model, by creating my own custom search &lt;code&gt;interface&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;interface&lt;/span&gt; &lt;span class="nc"&gt;SearchWebsite&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
 &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This interface will be implemented by LangChain4j’s &lt;code&gt;AiServices&lt;/code&gt; system, which binds several components together: the chat language model (here, Gemini) and the web search content retriever I created above:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;SearchWebsite&lt;/span&gt; &lt;span class="n"&gt;website&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AiServices&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;SearchWebsite&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;class&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;chatLanguageModel&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;contentRetriever&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;contentRetriever&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then I can ask my question to the LLM, which will find the relevant information in my blog:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;website&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;search&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
 &lt;span class="s"&gt;"How can I call the Gemma model from LangChain4j?"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;out&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;println&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"response = "&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If I comment out the line &lt;code&gt;contentRetriever(contentRetriever)&lt;/code&gt;, Gemini makes a best effort at answering my question, but since there’s nothing in its training data (before its cut-off date) about how to call the &lt;a href="https://blog.google/technology/developers/gemma-open-models/" rel="noopener noreferrer"&gt;Gemma&lt;/a&gt; model from LangChain4j, it is not able to provide a useful answer.&lt;/p&gt;

&lt;p&gt;But with the web search content retriever, Gemini is able to find the right material to ground its answer, as the custom search returns my article on &lt;a href="https://dev.to/glaforge/calling-gemma-with-ollama-testcontainers-and-langchain4j-3jk0-temp-slug-5463502"&gt;calling Gemma with Ollama, Testcontainers, and LangChain4j&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Based on the provided information, you can call the Gemma model from
LangChain4j using the following approach:
1. **Use Ollama:** The articles highlight Ollama as a tool for
interacting with Gemma. You would need to set up Ollama and ensure it
has access to the Gemma model.
2. **Integrate TestContainers:** TestContainers helps you manage
containerized environments for testing. You can use it to run Ollama
within a container alongside LangChain4j.
3. **Utilize LangChain4j:** LangChain4j provides the framework for
interacting with large language models. You would define your prompt,
send it to Ollama (which runs Gemma), and receive the response back
through LangChain4j.
**Example Steps:**
1. **Set up Ollama:** Install Ollama and configure it to use the
Gemma model.
2. **Create a Dockerfile:** Use a Dockerfile to define an image that
includes Ollama and any dependencies.
3. **Run Ollama in a container using TestContainers:** Start the
container using TestContainers and ensure it is accessible from your
LangChain4j code.
4. **Implement LangChain4j calls:** Use LangChain4j to construct your
prompt and send it to Ollama (which will pass it to Gemma).
5. **Receive and process the response:** Receive the generated response
from Gemma and process it as needed in your Java application.
**Note:** These steps provide a general approach. You will need to
refer to the documentation for Ollama, TestContainers, and LangChain4j
for specific implementation details.
This method leverages Ollama as an intermediary to access Gemma.
If you have access to Google's Gemini model directly, you might be
able to integrate it with LangChain4j without the Ollama step,
depending on the specific API or SDK offered by Google.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The LLM found that I have to use &lt;a href="https://ollama.com/" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt; and &lt;a href="https://testcontainers.com/" rel="noopener noreferrer"&gt;TestContainers&lt;/a&gt;, as explained in my article. This information wasn’t part of my query, which proves that the model really found the info in the article.&lt;/p&gt;

&lt;h2&gt;
  
  
  Discussion
&lt;/h2&gt;

&lt;p&gt;The LLM based its answer on the &lt;em&gt;excerpts&lt;/em&gt; contained in the search results, not the whole content of the article, so some aspects of this answer are not totally correct: For instance, you don’t have to &lt;em&gt;install&lt;/em&gt; Ollama or create your own &lt;em&gt;Dockerfile&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;To make the response perfect, I believe we would have to combine web search results with Retrieval Augmented Generation, or pass the whole context of the article to the model, so that it could provide a more thorough and factual answer.&lt;/p&gt;
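&lt;p&gt;As a rough illustration of the second option (passing the whole article to the model), the prompt could simply be assembled by hand before calling the model; the article text and question below are placeholders:&lt;/p&gt;

```java
public class GroundedPrompt {
    // Naive prompt assembly: prepend the full article text as context,
    // then ask the question. A real RAG pipeline would chunk and embed
    // the article instead of inlining it wholesale.
    static String buildPrompt(String articleText, String question) {
        return "Answer the question using only the following article.\n\n"
                + "Article:\n" + articleText + "\n\n"
                + "Question: " + question;
    }

    public static void main(String[] args) {
        System.out.println(buildPrompt(
                "Gemma can be called via Ollama and LangChain4j...",
                "How can I call the Gemma model from LangChain4j?"));
    }
}
```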

&lt;p&gt;For other queries that call for shorter answers, the response would probably be more to the point.&lt;/p&gt;

&lt;p&gt;Another approach is to annotate our &lt;code&gt;String search(String query)&lt;/code&gt; method with a &lt;code&gt;@SystemMessage&lt;/code&gt; containing instructions that encourage the LLM to provide a shorter answer. But it’s difficult to find the right balance between too long and too short, and of course without any hallucinations!&lt;/p&gt;

&lt;p&gt;For example, you can try with the following system instruction:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;interface&lt;/span&gt; &lt;span class="nc"&gt;SearchWebsite&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
 &lt;span class="nd"&gt;@SystemMessage&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""
 Provide a paragraph-long answer, not a long step by step explanation.
 Reply with "I don't know the answer" if the provided information isn't relevant.
 """&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I got the following response:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;The provided information mentions using Gemma with Ollama,
TestContainers, and LangChain4j. You can use Ollama, a local
LLM server, and TestContainers, which provides lightweight,
disposable containers, to set up a testing environment.
Then, with LangChain4j, a Java library for interacting with LLMs,
you can call Gemma through the Ollama server.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Which is shorter and more factual, without being too short either!&lt;/p&gt;

&lt;h2&gt;
  
  
  What’s next?
&lt;/h2&gt;

&lt;p&gt;In an upcoming article, I’ll show you how to use Gemini’s built-in Google Search grounding, but first, I have to finish my pull request for the LangChain4j project!&lt;/p&gt;

&lt;p&gt;Or I can explore how to reply more precisely to queries that lead to complex answers like the one above, maybe by combining a RAG approach to get the full context of the article found by the web search.&lt;/p&gt;

&lt;p&gt;Also, the Tavily API seems to be able to return the raw content of the article, so it could give the LLM the full context of the article to base its answers on. That makes it worth comparing those two web search integrations as well.&lt;/p&gt;

&lt;p&gt;Stay tuned!&lt;/p&gt;

&lt;h2&gt;
  
  
  Full sample code
&lt;/h2&gt;

&lt;p&gt;For reference, here is the full sample (with the system instruction approach):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;dev.langchain4j.model.vertexai.VertexAiGeminiChatModel&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;dev.langchain4j.rag.content.retriever.ContentRetriever&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;dev.langchain4j.rag.content.retriever.WebSearchContentRetriever&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;dev.langchain4j.service.AiServices&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;dev.langchain4j.service.SystemMessage&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;dev.langchain4j.web.search.WebSearchEngine&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;dev.langchain4j.web.search.google.customsearch.GoogleCustomWebSearchEngine&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;GroundingWithSearch&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
 &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;[]&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
 &lt;span class="nc"&gt;VertexAiGeminiChatModel&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;VertexAiGeminiChatModel&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;project&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getenv&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"PROJECT_ID"&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;location&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getenv&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"LOCATION"&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;modelName&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"gemini-1.5-flash-001"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

 &lt;span class="nc"&gt;WebSearchEngine&lt;/span&gt; &lt;span class="n"&gt;webSearchEngine&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;GoogleCustomWebSearchEngine&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getenv&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"GOOGLE_CUSTOM_SEARCH_API_KEY"&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;csi&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getenv&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"GOOGLE_CUSTOM_SEARCH_CSI"&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
&lt;span class="c1"&gt;// .logRequests(true)&lt;/span&gt;
&lt;span class="c1"&gt;// .logResponses(true)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

 &lt;span class="nc"&gt;ContentRetriever&lt;/span&gt; &lt;span class="n"&gt;contentRetriever&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;WebSearchContentRetriever&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;webSearchEngine&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;webSearchEngine&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;maxResults&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

 &lt;span class="kd"&gt;interface&lt;/span&gt; &lt;span class="nc"&gt;SearchWebsite&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
 &lt;span class="nd"&gt;@SystemMessage&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""
 Provide a paragraph-long answer, not a long step by step explanation.
 Reply with "I don't know the answer" if the provided information isn't relevant.
 """&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
 &lt;span class="o"&gt;}&lt;/span&gt;

 &lt;span class="nc"&gt;SearchWebsite&lt;/span&gt; &lt;span class="n"&gt;website&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AiServices&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;SearchWebsite&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;class&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;chatLanguageModel&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;contentRetriever&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;contentRetriever&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

 &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;website&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;search&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
 &lt;span class="s"&gt;"How can I call the Gemma model from LangChain4j?"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

 &lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;out&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;println&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"response = "&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
 &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
    </item>
    <item>
      <title>Calling Gemma with Ollama, TestContainers, and LangChain4j</title>
      <dc:creator>Guillaume Laforge</dc:creator>
      <pubDate>Wed, 03 Apr 2024 17:02:01 +0000</pubDate>
      <link>https://forem.com/googlecloud/calling-gemma-with-ollama-testcontainers-and-langchain4j-4ppk</link>
      <guid>https://forem.com/googlecloud/calling-gemma-with-ollama-testcontainers-and-langchain4j-4ppk</guid>
      <description>&lt;p&gt;Lately, for my Generative AI powered Java apps, I’ve used the &lt;a href="https://deepmind.google/technologies/gemini/#introduction" rel="noopener noreferrer"&gt;Gemini&lt;/a&gt;multimodal large language model from Google. But there’s also &lt;a href="https://blog.google/technology/developers/gemma-open-models/" rel="noopener noreferrer"&gt;Gemma&lt;/a&gt;, its little sister model.&lt;/p&gt;

&lt;p&gt;Gemma is a family of lightweight, state-of-the-art open models built from the same research and technology used to create the Gemini models. Gemma is available in two sizes: 2B and 7B. Its weights are freely available, and its small size means you can run it on your own, even on your laptop. So I was curious to give it a run with &lt;a href="https://docs.langchain4j.dev/" rel="noopener noreferrer"&gt;LangChain4j&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to run Gemma
&lt;/h2&gt;

&lt;p&gt;There are many ways to run Gemma: in the cloud, via &lt;a href="https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/335?project=glaforge" rel="noopener noreferrer"&gt;Vertex AI&lt;/a&gt; with a click of a button, or on &lt;a href="https://cloud.google.com/kubernetes-engine/docs/tutorials/serve-gemma-gpu-vllm" rel="noopener noreferrer"&gt;GKE&lt;/a&gt; with some GPUs, but you can also run it locally with &lt;a href="https://github.com/tjake/Jlama" rel="noopener noreferrer"&gt;Jlama&lt;/a&gt; or &lt;a href="https://github.com/google/gemma.cpp" rel="noopener noreferrer"&gt;Gemma.cpp&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Another good option is to run Gemma with &lt;a href="https://ollama.com/" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt;, a tool that you install on your machine, and which lets you run small models, like Llama 2, Mistral, and &lt;a href="https://ollama.com/library" rel="noopener noreferrer"&gt;many others&lt;/a&gt;. They quickly added support for &lt;a href="https://ollama.com/library/gemma" rel="noopener noreferrer"&gt;Gemma&lt;/a&gt; as well.&lt;/p&gt;

&lt;p&gt;Once installed locally, you can run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ollama run gemma:2b
ollama run gemma:7b

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Cherry on the cake, the LangChain4j library provides an &lt;a href="https://docs.langchain4j.dev/integrations/language-models/ollama" rel="noopener noreferrer"&gt;Ollama module&lt;/a&gt;, so you can easily plug Ollama-supported models into your Java applications.&lt;/p&gt;

&lt;h2&gt;
  
  
  Containerization
&lt;/h2&gt;

&lt;p&gt;After a great discussion with my colleague &lt;a href="https://twitter.com/ddobrin" rel="noopener noreferrer"&gt;Dan Dobrin&lt;/a&gt;, who had worked with Ollama and TestContainers (&lt;a href="https://github.com/GoogleCloudPlatform/serverless-production-readiness-java-gcp/blob/main/sessions/next24/books-genai-vertex-langchain4j/src/test/java/services/OllamaContainerTest.java" rel="noopener noreferrer"&gt;#1&lt;/a&gt; and &lt;a href="https://github.com/GoogleCloudPlatform/serverless-production-readiness-java-gcp/blob/main/sessions/next24/books-genai-vertex-langchain4j/src/test/java/services/OllamaChatModelTest.java#L37" rel="noopener noreferrer"&gt;#2&lt;/a&gt;) in his &lt;a href="https://github.com/GoogleCloudPlatform/serverless-production-readiness-java-gcp/tree/main" rel="noopener noreferrer"&gt;serverless production readiness workshop&lt;/a&gt;, I decided to try the approach below.&lt;/p&gt;

&lt;p&gt;Which brings us to the last piece of the puzzle: Instead of having to install and run Ollama on my computer, I decided to use Ollama within a container, handled by &lt;a href="https://testcontainers.com/" rel="noopener noreferrer"&gt;TestContainers&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;TestContainers is not only useful for testing, but you can also use it for driving containers. There’s even a specific &lt;a href="https://java.testcontainers.org/modules/ollama/" rel="noopener noreferrer"&gt;OllamaContainer&lt;/a&gt; you can take advantage of!&lt;/p&gt;
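&lt;p&gt;For reference, assuming a Maven build, the Ollama chat model and the Ollama container used below typically come from the following two artifacts (coordinates given as an indication only; pick the versions matching your project):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;dependency&amp;gt;
    &amp;lt;groupId&amp;gt;dev.langchain4j&amp;lt;/groupId&amp;gt;
    &amp;lt;artifactId&amp;gt;langchain4j-ollama&amp;lt;/artifactId&amp;gt;
    &amp;lt;version&amp;gt;${langchain4j.version}&amp;lt;/version&amp;gt;
&amp;lt;/dependency&amp;gt;
&amp;lt;dependency&amp;gt;
    &amp;lt;groupId&amp;gt;org.testcontainers&amp;lt;/groupId&amp;gt;
    &amp;lt;artifactId&amp;gt;ollama&amp;lt;/artifactId&amp;gt;
    &amp;lt;version&amp;gt;${testcontainers.version}&amp;lt;/version&amp;gt;
&amp;lt;/dependency&amp;gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;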

&lt;p&gt;So here’s the whole picture: &lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--U-JRUDVQ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://glaforge.dev/img/gemini/gemma-ollama-testcontainers-langchain4j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--U-JRUDVQ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://glaforge.dev/img/gemini/gemma-ollama-testcontainers-langchain4j.png" width="800" height="412"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Time to implement this approach!
&lt;/h2&gt;

&lt;p&gt;You’ll find the code in the GitHub &lt;a href="https://github.com/glaforge/gemini-workshop-for-java-developers/blob/main/app/src/main/java/gemini/workshop/CallGemma.java" rel="noopener noreferrer"&gt;repository&lt;/a&gt; accompanying my recent &lt;a href="https://codelabs.developers.google.com/codelabs/gemini-java-developers" rel="noopener noreferrer"&gt;Gemini workshop&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Let’s start with the easy part, interacting with an Ollama supported model with LangChain4j:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;OllamaContainer&lt;/span&gt; &lt;span class="n"&gt;ollama&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;createGemmaOllamaContainer&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="n"&gt;ollama&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;start&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

&lt;span class="nc"&gt;ChatLanguageModel&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OllamaChatModel&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;baseUrl&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;format&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"http://%s:%d"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ollama&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getHost&lt;/span&gt;&lt;span class="o"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;ollama&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getFirstMappedPort&lt;/span&gt;&lt;span class="o"&gt;()))&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;modelName&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"gemma:2b"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;generate&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Why is the sky blue?"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;out&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;println&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;You run an Ollama test container.&lt;/li&gt;
&lt;li&gt;You create an Ollama chat model, by pointing at the address and port of the container.&lt;/li&gt;
&lt;li&gt;You specify the model you want to use.&lt;/li&gt;
&lt;li&gt;Then, you just need to call &lt;code&gt;model.generate(yourPrompt)&lt;/code&gt; as usual.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Easy? Now let’s have a look at the trickier part, my local method that creates the Ollama container:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// check if the custom Gemma Ollama image exists already&lt;/span&gt;
&lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Image&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;listImagesCmd&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DockerClientFactory&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;lazyClient&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;listImagesCmd&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;withImageNameFilter&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="no"&gt;TC_OLLAMA_GEMMA_2_B&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;exec&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;listImagesCmd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;isEmpty&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
 &lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;out&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;println&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Creating a new Ollama container with Gemma 2B image..."&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
 &lt;span class="nc"&gt;OllamaContainer&lt;/span&gt; &lt;span class="n"&gt;ollama&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;OllamaContainer&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"ollama/ollama:0.1.26"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
 &lt;span class="n"&gt;ollama&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;start&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
 &lt;span class="n"&gt;ollama&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;execInContainer&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"ollama"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"pull"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"gemma:2b"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
 &lt;span class="n"&gt;ollama&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;commitToImage&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="no"&gt;TC_OLLAMA_GEMMA_2_B&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
 &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;ollama&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
 &lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;out&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;println&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Using existing Ollama container with Gemma 2B image..."&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
 &lt;span class="c1"&gt;// Substitute the default Ollama image with our Gemma variant&lt;/span&gt;
 &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;OllamaContainer&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
 &lt;span class="nc"&gt;DockerImageName&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;parse&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="no"&gt;TC_OLLAMA_GEMMA_2_B&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;asCompatibleSubstituteFor&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"ollama/ollama"&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You need to create a derived Ollama container that pulls in the Gemma model. Either this image was already created beforehand, or, if it doesn’t exist yet, you create it.&lt;/p&gt;

&lt;p&gt;Use the Docker Java client to check if the custom Gemma image exists. If it doesn’t, notice how TestContainers lets you create an image derived from the base Ollama image, pull the Gemma model, and then commit that image to your local Docker registry.&lt;/p&gt;

&lt;p&gt;Otherwise, if the image already exists (i.e. you created it in a previous run of the application), you’re just going to tell TestContainers that you want to substitute the default Ollama image with your Gemma-powered variant.&lt;/p&gt;

&lt;h2&gt;
  
  
  And voila!
&lt;/h2&gt;

&lt;p&gt;You can &lt;strong&gt;call Gemma locally on your laptop, in your Java apps, using LangChain4j&lt;/strong&gt;, without having to install and run Ollama locally (but of course, you need to have a Docker daemon running).&lt;/p&gt;

&lt;p&gt;Big thanks to &lt;a href="https://twitter.com/ddobrin" rel="noopener noreferrer"&gt;Dan Dobrin&lt;/a&gt; for the approach, and to &lt;a href="https://twitter.com/bsideup" rel="noopener noreferrer"&gt;Sergei&lt;/a&gt;, &lt;a href="https://twitter.com/EdduMelendez" rel="noopener noreferrer"&gt;Eddú&lt;/a&gt;, and &lt;a href="https://twitter.com/shelajev" rel="noopener noreferrer"&gt;Oleg&lt;/a&gt; from TestContainers for the help and useful pointers.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Gemini codelab for Java developers using LangChain4j</title>
      <dc:creator>Guillaume Laforge</dc:creator>
      <pubDate>Wed, 27 Mar 2024 18:11:58 +0000</pubDate>
      <link>https://forem.com/googlecloud/gemini-codelab-for-java-developers-using-langchain4j-9pb</link>
      <guid>https://forem.com/googlecloud/gemini-codelab-for-java-developers-using-langchain4j-9pb</guid>
      <description>&lt;p&gt;No need to be a Python developer to do Generative AI! If you’re a Java developer, you can take advantage of &lt;a href="https://docs.langchain4j.dev/" rel="noopener noreferrer"&gt;LangChain4j&lt;/a&gt;to implement some advanced LLM integrations in your Java applications. And if you’re interested in using&lt;a href="https://blog.google/technology/ai/google-gemini-next-generation-model-february-2024/" rel="noopener noreferrer"&gt;Gemini&lt;/a&gt;, one of the best models available, I invite you to have a look at the following “codelab” that I worked on:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://codelabs.developers.google.com/codelabs/gemini-java-developers" rel="noopener noreferrer"&gt;Codelab — Gemini for Java Developers using LangChain4j&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this workshop, you’ll find various examples covering the following use cases, in a &lt;em&gt;crescendo&lt;/em&gt; approach:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Making your first call to Gemini (streaming &amp;amp; non-streaming)&lt;/li&gt;
&lt;li&gt;Maintaining a conversation&lt;/li&gt;
&lt;li&gt;Taking advantage of multimodality by analysing images with your prompts&lt;/li&gt;
&lt;li&gt;Extracting structured information from unstructured text&lt;/li&gt;
&lt;li&gt;Using prompt templates&lt;/li&gt;
&lt;li&gt;Doing text classification with few-shot prompting&lt;/li&gt;
&lt;li&gt;Implementing Retrieval Augmented Generation to chat with your documentation&lt;/li&gt;
&lt;li&gt;Doing function calling to let the LLM interact with external APIs and services&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You’ll find all the &lt;a href="https://github.com/glaforge/gemini-workshop-for-java-developers" rel="noopener noreferrer"&gt;code samples on Github&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you’re attending Devoxx France, be sure to attend the &lt;a href="https://www.devoxx.fr/en/schedule/talk/?id=40285" rel="noopener noreferrer"&gt;Hands-on-Lab workshop&lt;/a&gt; with my colleagues &lt;a href="https://twitter.com/meteatamel" rel="noopener noreferrer"&gt;Mete Atamel&lt;/a&gt; and &lt;a href="https://twitter.com/val_deleplace" rel="noopener noreferrer"&gt;Valentin Deleplace&lt;/a&gt;, who will guide you through this codelab.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Visualize PaLM-based LLM tokens</title>
      <dc:creator>Guillaume Laforge</dc:creator>
      <pubDate>Mon, 05 Feb 2024 08:44:22 +0000</pubDate>
      <link>https://forem.com/googlecloud/visualize-palm-based-llm-tokens-1bc5</link>
      <guid>https://forem.com/googlecloud/visualize-palm-based-llm-tokens-1bc5</guid>
      <description>&lt;p&gt;As I was working on tweaking the Vertex AI text embedding model in &lt;a href="https://github.com/langchain4j"&gt;LangChain4j&lt;/a&gt;, I wanted to better understand how the &lt;code&gt;textembedding-gecko&lt;/code&gt;&lt;a href="https://cloud.google.com/vertex-ai/docs/generative-ai/model-reference/text-embeddings"&gt;model&lt;/a&gt;tokenizes the text, in particular when we implement the&lt;a href="https://arxiv.org/abs/2005.11401"&gt;Retrieval Augmented Generation&lt;/a&gt; approach.&lt;/p&gt;

&lt;p&gt;The various PaLM-based models offer a &lt;code&gt;computeTokens&lt;/code&gt; endpoint, which returns a list of tokens (encoded in Base 64) and their respective IDs.&lt;/p&gt;
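&lt;p&gt;To make that concrete, here is a minimal sketch (my own illustrative helper, not part of any SDK) showing how such Base64-encoded token values can be turned back into readable strings, assuming the token bytes are UTF-8:&lt;br&gt;
&lt;/p&gt;

```java
import java.util.Arrays;
import java.util.Base64;
import java.nio.charset.StandardCharsets;

public class TokenDecoder {
    // Decode the Base64-encoded token values returned by a computeTokens-style
    // endpoint into readable strings (assuming the token bytes are UTF-8).
    public static String[] decodeTokens(String[] base64Tokens) {
        return Arrays.stream(base64Tokens)
                .map(t -> new String(Base64.getDecoder().decode(t), StandardCharsets.UTF_8))
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        // Hypothetical Base64 values, for illustration only.
        String[] decoded = decodeTokens(new String[] { "SGVsbG8=", "IHdvcmxk" });
        System.out.println(String.join("|", decoded)); // prints: Hello| world
    }
}
```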

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; At the time of this writing, there’s no equivalent endpoint for Gemini models.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So I decided to create a &lt;a href="https://tokens-lpj6s2duga-ew.a.run.app/"&gt;small application&lt;/a&gt; that lets users:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;input some text,&lt;/li&gt;
&lt;li&gt;select a model,&lt;/li&gt;
&lt;li&gt;calculate the number of tokens,&lt;/li&gt;
&lt;li&gt;and visualize them with some nice pastel colors.&lt;/li&gt;
&lt;/ul&gt;
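&lt;p&gt;For the coloring part, one simple way to give each token a stable pastel color (an illustrative assumption on my side, not necessarily how the app implements it) is to spread the token id around the hue circle while keeping saturation moderate and lightness high:&lt;br&gt;
&lt;/p&gt;

```java
public class PastelColors {
    // Derive a stable pastel CSS color from a token id: spread ids around
    // the hue circle, keep saturation moderate and lightness high.
    public static String pastelFor(int tokenId) {
        int hue = Math.floorMod(tokenId * 47, 360); // 47 spreads consecutive ids apart
        return "hsl(" + hue + ", 70%, 85%)";
    }

    public static void main(String[] args) {
        System.out.println(pastelFor(42)); // prints: hsl(174, 70%, 85%)
    }
}
```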

&lt;p&gt;The available PaLM-based models are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;textembedding-gecko&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;textembedding-gecko-multilingual&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;text-bison&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;text-unicorn&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;chat-bison&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;code-gecko&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;code-bison&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;codechat-bison&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can &lt;a href="https://tokens-lpj6s2duga-ew.a.run.app/"&gt;try the application&lt;/a&gt; online.&lt;/p&gt;

&lt;p&gt;And also have a look at the &lt;a href="https://github.com/glaforge/llm-text-tokenization"&gt;source code&lt;/a&gt; on GitHub. It’s a &lt;a href="https://micronaut.io/"&gt;Micronaut&lt;/a&gt; application. I serve the static assets as explained in my recent &lt;a href="https://dev.to/glaforge/serving-static-assets-with-micronaut-k85"&gt;article&lt;/a&gt;. I deployed the application on &lt;a href="https://cloud.run/"&gt;Google Cloud Run&lt;/a&gt;, the easiest way to deploy a container, and let it auto-scale for you. I did a source-based deployment, as explained at the bottom of &lt;a href="https://dev.to/glaforge/build-and-deploy-java-17-apps-on-cloud-run-with-cloud-native-buildpacks-on-temurin-e5-temp-slug-9671273"&gt;this article&lt;/a&gt;.&lt;/p&gt;
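&lt;p&gt;For reference, a source-based deployment to Cloud Run boils down to a single command run from the project directory (the service name and region below are placeholders, not the actual values used for this app):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gcloud run deploy SERVICE_NAME --source . --region REGION --allow-unauthenticated

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;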

&lt;p&gt;And &lt;em&gt;voilà&lt;/em&gt;, I can visualize my LLM tokens!&lt;/p&gt;

</description>
      <category>java</category>
      <category>llm</category>
      <category>generativeai</category>
    </item>
    <item>
      <title>Image generation with Imagen and LangChain4j</title>
      <dc:creator>Guillaume Laforge</dc:creator>
      <pubDate>Thu, 01 Feb 2024 08:25:56 +0000</pubDate>
      <link>https://forem.com/googlecloud/image-generation-with-imagen-and-langchain4j-1f9d</link>
      <guid>https://forem.com/googlecloud/image-generation-with-imagen-and-langchain4j-1f9d</guid>
      <description>&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvk4f5abaj812kurqh0xc.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvk4f5abaj812kurqh0xc.jpg" alt="Image description" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This week &lt;a href="https://github.com/langchain4j"&gt;LangChain4j&lt;/a&gt;, the LLM orchestration framework for Java developers, released version &lt;a href="https://github.com/langchain4j/langchain4j/releases/tag/0.26.1"&gt;0.26.1&lt;/a&gt;, which contains my first significant contribution to the open source project: &lt;strong&gt;support for the Imagen image generation model&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Imagen&lt;/strong&gt; is a text-to-image diffusion model that was &lt;a href="https://imagen.research.google/"&gt;announced&lt;/a&gt; last year. It was recently upgraded to &lt;a href="https://deepmind.google/technologies/imagen-2/"&gt;Imagen v2&lt;/a&gt;, with even higher quality graphics generation. As I was curious to integrate it in some of my generative AI projects, I thought it would make a great first &lt;a href="https://github.com/langchain4j/langchain4j/pull/456"&gt;contribution&lt;/a&gt; to LangChain4j.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Caution:&lt;/strong&gt; At the time of this writing, image generation is still restricted to allow-listed accounts.&lt;/p&gt;

&lt;p&gt;Furthermore, to run the snippets covered below, you should have a Google Cloud Platform account, have created a project, configured a billing account, enabled the Vertex AI API, and authenticated with the gcloud SDK using the command: &lt;code&gt;gcloud auth application-default login&lt;/code&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Now let’s dive into how to use Imagen v1 and v2 with LangChain4j in Java!&lt;/p&gt;

&lt;h2&gt;
  
  
  Generate your first images
&lt;/h2&gt;

&lt;p&gt;In the following examples, I’m using these constants to point at my project details, the endpoint, the region, etc.:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;private static final String ENDPOINT = "us-central1-aiplatform.googleapis.com:443";
private static final String LOCATION = "us-central1";
private static final String PROJECT = "YOUR_PROJECT_ID";
private static final String PUBLISHER = "google";

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;First, we’re going to create an instance of the model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;VertexAiImageModel imagenModel = VertexAiImageModel.builder()
 .endpoint(ENDPOINT)
 .location(LOCATION)
 .project(PROJECT)
 .publisher(PUBLISHER)
 .modelName("imagegeneration@005")
 .maxRetries(2)
 .withPersisting()
 .build();

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There are two models you can use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;imagegeneration@005&lt;/code&gt; corresponds to Imagen 2&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;imagegeneration@002&lt;/code&gt; is the previous version (Imagen 1)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In this article, we’ll use both models. Why? Because currently Imagen 2 doesn’t support image editing, so we’ll have to use Imagen 1 for that purpose.&lt;/p&gt;

&lt;p&gt;The configuration above uses &lt;code&gt;withPersisting()&lt;/code&gt; to save the generated images in a temporary folder on your system. If you don’t persist the image files, the content of the image is available as Base64 encoded bytes in the returned &lt;code&gt;Image&lt;/code&gt; objects. You can also call &lt;code&gt;persistTo(somePath)&lt;/code&gt; to specify a particular directory where you want the generated files to be saved.&lt;/p&gt;
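If you skip persistence, decoding the Base64 payload and writing it to a file yourself takes just a couple of lines with the JDK's java.util.Base64. A minimal sketch (the base64Data string here is only a stand-in for what the model would actually return):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

public class SaveImage {
    public static void main(String[] args) throws Exception {
        // Stand-in for imageResponse.getContent().base64Data():
        // the first four bytes of a PNG file, Base64 encoded
        String base64Data = "iVBORw==";

        // Decode the payload and write it out as an image file
        byte[] imageBytes = Base64.getDecoder().decode(base64Data);
        Path out = Files.createTempFile("imagen-", ".png");
        Files.write(out, imageBytes);
        System.out.println("Wrote " + imageBytes.length + " bytes to " + out);
    }
}
```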

&lt;p&gt;Let’s create our first image:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Response&amp;lt;Image&amp;gt; imageResponse = imagenModel.generate(
 "watercolor of a colorful parrot drinking a cup of coffee");

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;Response&lt;/code&gt; object wraps the created &lt;code&gt;Image&lt;/code&gt;. You can get the &lt;code&gt;Image&lt;/code&gt; by calling &lt;code&gt;imageResponse.getContent()&lt;/code&gt;. And you can retrieve the URL of the image (if saved locally) with &lt;code&gt;imageResponse.getContent().url()&lt;/code&gt;. The Base64 encoded bytes can be retrieved with &lt;code&gt;imageResponse.getContent().base64Data()&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Some other tweaks to the model configuration:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Specify the &lt;strong&gt;language&lt;/strong&gt; of the prompt: &lt;code&gt;language("ja")&lt;/code&gt; (if the language is not officially supported, it’s usually translated back to English anyway).&lt;/li&gt;
&lt;li&gt;Define a &lt;strong&gt;negative prompt&lt;/strong&gt; with things you don’t want to see in the picture: &lt;code&gt;negativePrompt("black feathers")&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Use a particular &lt;strong&gt;seed&lt;/strong&gt; to always generate the same image with the same seed: &lt;code&gt;seed(1234L)&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So if you want to generate a picture of a pizza with a prompt in Japanese, but you don’t want to have pepperoni and pineapple, you could configure your model and generate as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;VertexAiImageModel imagenModel = VertexAiImageModel.builder()
 .endpoint(ENDPOINT)
 .location(LOCATION)
 .project(PROJECT)
 .publisher(PUBLISHER)
 .modelName("imagegeneration@005")
 .language("ja")
 .negativePrompt("pepperoni, pineapple")
 .maxRetries(2)
 .withPersisting()
 .build();

Response&amp;lt;Image&amp;gt; imageResponse = imagenModel.generate("ピザ"); // pizza

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Image editing with Imagen 1
&lt;/h2&gt;

&lt;p&gt;With Imagen 1, you can &lt;a href="https://cloud.google.com/vertex-ai/docs/generative-ai/image/edit-images?hl=en"&gt;edit&lt;/a&gt; existing images:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;mask-based editing:&lt;/strong&gt; you specify a mask, a black &amp;amp; white image where the white parts mark the areas of the original image that should be edited,&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;mask-free editing:&lt;/strong&gt; you just give a prompt and let the model figure out on its own what should be edited.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When generating and editing with Imagen 1, you can also configure the model to use a particular style (with Imagen 2, you just specify it in the prompt) with &lt;code&gt;sampleImageStyle(VertexAiImageModel.ImageStyle.photograph)&lt;/code&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;photograph&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;digital_art&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;landscape&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;sketch&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;watercolor&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;cyberpunk&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;pop_art&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When editing an image, you may wish to decide how strong the modification should be, with &lt;code&gt;.guidanceScale(100)&lt;/code&gt;. Roughly, between 0 and 20 the image is lightly edited, between 20 and 100 the edits become more impactful, and at 100 and above you get the strongest possible edits.&lt;/p&gt;

&lt;p&gt;Let’s say I generated an image of a lush forest (I’ll use that as my original image):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;VertexAiImageModel model = VertexAiImageModel.builder()
 .endpoint(ENDPOINT)
 .location(LOCATION)
 .project(PROJECT)
 .publisher(PUBLISHER)
 .modelName("imagegeneration@002")
 .seed(19707L)
 .sampleImageStyle(VertexAiImageModel.ImageStyle.photograph)
 .guidanceScale(100)
 .maxRetries(4)
 .withPersisting()
 .build();

Response&amp;lt;Image&amp;gt; forestResp = model.generate("lush forest");

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now I want to edit my forest to add a small red tree at the bottom of the image. I’m loading a black and white mask image with a white square at the bottom. Then I pass the original image, the mask image, and the modification prompt to the new &lt;code&gt;edit()&lt;/code&gt; method:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;URI maskFileUri = getClass().getClassLoader().getResource("mask.png").toURI();

Response&amp;lt;Image&amp;gt; compositeResp = model.edit(
 forestResp.content(), // original image to edit
 fromPath(Paths.get(maskFileUri)), // the mask image
 "red trees" // the new prompt
);

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
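The mask image itself needs no special tooling. Here is a sketch of how such a mask could be produced with the JDK's java.awt classes; the 1024x1024 size and the 256-pixel bottom band are arbitrary illustration choices, and the mask must match the dimensions of the image being edited:

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;

public class MaskMaker {
    public static void main(String[] args) throws Exception {
        int size = 1024; // must match the dimensions of the image to edit
        BufferedImage mask = new BufferedImage(size, size, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = mask.createGraphics();
        g.setColor(Color.BLACK);              // black: keep the original pixels
        g.fillRect(0, 0, size, size);
        g.setColor(Color.WHITE);              // white: region the model may repaint
        g.fillRect(0, size - 256, size, 256); // a band at the bottom of the image
        g.dispose();
        ImageIO.write(mask, "png", new File("mask.png"));
    }
}
```

Black pixels tell the model to keep the original content, while the white band marks the region it is allowed to repaint.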



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--eBKO5YDw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://glaforge.dev/img/gemini/lush-forrest-red-tree.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--eBKO5YDw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://glaforge.dev/img/gemini/lush-forrest-red-tree.jpg" alt="" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Another kind of editing you can do is to upscale an existing image. As far as I know, it’s only supported for Imagen v1 for now, so we’ll continue with that model.&lt;/p&gt;

&lt;p&gt;In this example, we’ll generate an image of 1024x1024 pixels, and we’ll scale it to 4096x4096:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;VertexAiImageModel imagenModel = VertexAiImageModel.builder()
 .endpoint(ENDPOINT)
 .location(LOCATION)
 .project(PROJECT)
 .publisher(PUBLISHER)
 .modelName("imagegeneration@002")
 .sampleImageSize(1024)
 .withPersisting()
 .persistTo(defaultTempDirPath)
 .maxRetries(3)
 .build();

Response&amp;lt;Image&amp;gt; imageResponse =
 imagenModel.generate("A black bird looking itself in an antique mirror");

VertexAiImageModel imagenModelForUpscaling = VertexAiImageModel.builder()
 .endpoint(ENDPOINT)
 .location(LOCATION)
 .project(PROJECT)
 .publisher(PUBLISHER)
 .modelName("imagegeneration@002")
 .sampleImageSize(4096)
 .withPersisting()
 .persistTo(defaultTempDirPath)
 .maxRetries(3)
 .build();

Response&amp;lt;Image&amp;gt; upscaledImageResponse =
 imagenModelForUpscaling.edit(imageResponse.content(), "");

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And now you have a much bigger image!&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;That’s about it for image generation and editing with Imagen in LangChain4j today! Be sure to use LangChain4j v0.26.1 which contains that new integration. And I’m looking forward to seeing the pictures you generate with it!&lt;/p&gt;

</description>
      <category>java</category>
      <category>imagen</category>
      <category>generativeai</category>
      <category>langchain4j</category>
    </item>
    <item>
      <title>Serving static assets with Micronaut</title>
      <dc:creator>Guillaume Laforge</dc:creator>
      <pubDate>Sun, 21 Jan 2024 16:23:25 +0000</pubDate>
      <link>https://forem.com/glaforge/serving-static-assets-with-micronaut-k85</link>
      <guid>https://forem.com/glaforge/serving-static-assets-with-micronaut-k85</guid>
<description>&lt;p&gt;My go-to framework when developing Java apps or microservices is &lt;a href="https://micronaut.io"&gt;Micronaut&lt;/a&gt;. For the apps that should have a web frontend, I rarely use &lt;a href="https://micronaut-projects.github.io/micronaut-views/latest/guide/"&gt;Micronaut Views&lt;/a&gt; and its templating support. Instead, I prefer to just &lt;strong&gt;serve static assets&lt;/strong&gt; from my resource folder, and have some JavaScript framework (usually &lt;a href="https://vuejs.org/"&gt;Vue.js&lt;/a&gt;) populate my HTML content (often using &lt;a href="https://shoelace.style/"&gt;Shoelace&lt;/a&gt; for its nice Web Components). However, the &lt;a href="https://docs.micronaut.io/latest/guide/#staticResources"&gt;static asset documentation&lt;/a&gt; is a bit light on explanations. So, since I always forget how to configure Micronaut to serve static assets, I thought it would be useful to document this here.&lt;/p&gt;

&lt;p&gt;In &lt;code&gt;/src/main/resources/application.properties&lt;/code&gt;, I’m adding the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;micronaut.router.static-resources.default.paths=classpath:public
micronaut.router.static-resources.default.mapping=/**
micronaut.router.static-resources.default.enabled=true

micronaut.server.cors.enabled=true

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;The first line says that my resources will live in &lt;code&gt;src/main/resources/public/&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;The second line means the pattern will match recursively for sub-directories as well.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;enabled&lt;/code&gt; flag activates static serving (not strictly needed, as it’s supposed to be enabled by default).&lt;/li&gt;
&lt;li&gt;I also enabled CORS (cross-origin resource sharing).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then in &lt;code&gt;src/main/resources/public/&lt;/code&gt;, I’ll have my &lt;code&gt;index.html&lt;/code&gt; file, my &lt;code&gt;css&lt;/code&gt; and &lt;code&gt;js&lt;/code&gt; folders.&lt;/p&gt;

</description>
      <category>micronaut</category>
      <category>java</category>
    </item>
    <item>
<title>Light Mode Bookmarklet</title>
      <dc:creator>Guillaume Laforge</dc:creator>
      <pubDate>Thu, 18 Jan 2024 08:49:01 +0000</pubDate>
      <link>https://forem.com/glaforge/light-mode-bookmarlet-44m2</link>
      <guid>https://forem.com/glaforge/light-mode-bookmarlet-44m2</guid>
<description>&lt;p&gt;A while ago, my friend Sylvain Wallez shared a little &lt;a href="https://twitter.com/bluxte/status/1729912211882094701"&gt;bookmarklet&lt;/a&gt; on Twitter/X that transforms a dark mode site into light mode. I know the trend is towards dark mode, but for a lot of people with certain vision issues, for example with astigmatism like me, certain dark modes can be very painful.&lt;/p&gt;

&lt;p&gt;This site about &lt;a href="https://www.allaboutvision.com/digital-eye-strain/is-dark-mode-better-for-eyes/"&gt;vision&lt;/a&gt; (and you’ll find other similar references) mentions that:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;People who have myopia or &lt;strong&gt;astigmatism&lt;/strong&gt; also may experience &lt;strong&gt;halation&lt;/strong&gt; (from the word “halo”). Halation occurs when light spreads past a certain boundary, creating a foggy or blurry appearance.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So for certain websites, often ones with too strong a contrast, I’m using the following bookmarklet trick.&lt;/p&gt;

&lt;p&gt;Go to your bookmark manager, and save the following bookmarklet (I called mine “light mode”):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;javascript:(function(){document.documentElement.style.filter=document.documentElement.style.filter?%27%27:%27invert(100%)%20hue-rotate(180deg)%27})();

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, let’s pretty-print the above code and remove the URL-encoded characters, to decipher what it does:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;(function () {
 document.documentElement.style.filter = document.documentElement.style.filter
 ? ""
 : "invert(100%) hue-rotate(180deg)";
})();

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two filters are going to be applied to your current web page:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;First, it will completely &lt;strong&gt;invert&lt;/strong&gt; all the colors, like a photographic negative&lt;/li&gt;
&lt;li&gt;Second, compared to Sylvain’s version, I also add a &lt;strong&gt;hue rotation&lt;/strong&gt; of 180 degrees&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why the hue rotation
&lt;/h2&gt;

&lt;p&gt;Because the color inversion also shifts the hues: a red becomes cyan, a yellow becomes blue, a green turns magenta, etc. With a hue rotation, we get the right colors back: a red is still red, a blue is still blue, etc. The difference, however, is in the lightness, as a light blue becomes dark, and a dark green becomes light. But at least, it’s a bit more faithful to the original images.&lt;/p&gt;
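The effect of chaining the two filters can be checked with a bit of java.awt.Color math. This sketch approximates CSS hue-rotate(180deg) as a half-turn on the HSB hue wheel, which is close to, but not exactly, the matrix-based filter browsers actually apply:

```java
import java.awt.Color;

public class InvertHueRotate {
    // invert(100%): every channel c becomes 255 - c
    static Color invert(Color c) {
        return new Color(255 - c.getRed(), 255 - c.getGreen(), 255 - c.getBlue());
    }

    // hue-rotate(180deg), approximated as +0.5 on the HSB hue wheel
    static Color hueRotate180(Color c) {
        float[] hsb = Color.RGBtoHSB(c.getRed(), c.getGreen(), c.getBlue(), null);
        return Color.getHSBColor((hsb[0] + 0.5f) % 1f, hsb[1], hsb[2]);
    }

    public static void main(String[] args) {
        Color red = new Color(255, 0, 0);
        // inversion alone turns red into cyan...
        System.out.println(invert(red));
        // ...but the hue rotation brings the hue back to red
        System.out.println(hueRotate180(invert(red)));

        // lightness still flips: a light blue comes out as a dark blue
        Color lightBlue = new Color(173, 216, 230);
        System.out.println(hueRotate180(invert(lightBlue)));
    }
}
```

Red survives the round trip, while the light blue comes out dark: hues are preserved, lightness is flipped.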

&lt;p&gt;Here’s a picture to highlight the differences. See how the rainbow picture is transformed:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--pMWBLJSg--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://glaforge.dev/img/misc/invert-hue-roate.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--pMWBLJSg--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://glaforge.dev/img/misc/invert-hue-roate.jpg" alt="" width="800" height="320"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Possible improvements
&lt;/h2&gt;

&lt;p&gt;Perhaps we could avoid applying the filter globally, or at least avoid applying it to the images, so that they are not affected by those filters. At least for now, that’s good enough for me!&lt;/p&gt;

</description>
      <category>browser</category>
      <category>chrome</category>
      <category>dark</category>
    </item>
  </channel>
</rss>
