<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Tamás Bereczki</title>
    <description>The latest articles on Forem by Tamás Bereczki (@bereczki).</description>
    <link>https://forem.com/bereczki</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F782799%2F4a7c35d6-6a48-4b43-9e82-97726b3f88a1.png</url>
      <title>Forem: Tamás Bereczki</title>
      <link>https://forem.com/bereczki</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/bereczki"/>
    <language>en</language>
    <item>
      <title>Spring AI: How to use Generative AI and apply RAG?</title>
      <dc:creator>Tamás Bereczki</dc:creator>
      <pubDate>Tue, 09 Sep 2025 13:20:51 +0000</pubDate>
      <link>https://forem.com/bereczki/spring-ai-how-to-use-generative-ai-and-applied-rag-2hje</link>
      <guid>https://forem.com/bereczki/spring-ai-how-to-use-generative-ai-and-applied-rag-2hje</guid>
      <description>&lt;h2&gt;
  
  
  Let’s dive into the world of AI: let’s investigate how Spring AI works, learn how to use an AI model programmatically, and generate some content with the RAG method.
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Generative AI models are powerful, but their knowledge is limited to the data they were trained on. So, how can we make them intelligent about our own specific documents or data? This is where the Retrieval-Augmented Generation (RAG) pattern comes in. In this article, I’ll guide you step-by-step through building a pet project that does exactly that, using a practical, code-first approach.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;If you first want to learn what Artificial Intelligence means and how it works under the hood, read this article, thank you:&lt;br&gt;
&lt;a href="https://dev.to/bereczki/beyond-the-buzzwords-how-generative-ai-really-works-bac"&gt;https://dev.to/bereczki/beyond-the-buzzwords-how-generative-ai-really-works-bac&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7k28k6fcziyrg1k62io0.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7k28k6fcziyrg1k62io0.webp" alt="RAG technique workflow diagram (source: https://docs.spring.io)" width="800" height="302"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Project idea
&lt;/h2&gt;

&lt;p&gt;A fairly common idea came to mind, a classic pet project at universities or at home: create a movie database service (like IMDb). In this case, however, I am going to focus on how I can enhance this movie database with AI services.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Project:&lt;/strong&gt; Customized media suggestion service&lt;br&gt;
&lt;strong&gt;Purpose:&lt;/strong&gt;&lt;br&gt;
Create a system that can give users personal suggestions for media content, based on the topics they are interested in and the content they have previously watched.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Requirements&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Data: collect content (movie data) and store it in a vector database&lt;/li&gt;
&lt;li&gt;User profile: let’s assume there are registered users in the system, and the system collects feedback from users about watched movies: which ones they liked and which ones they did not.&lt;/li&gt;
&lt;li&gt;Applying RAG: when a user logs in or requests new suggestions, the system queries the liked content and uses it to find other, similar movies. The vector database’s similarity search feature is used here.&lt;/li&gt;
&lt;li&gt;The generative AI receives these suggestions and summarizes them in a personalized result.&lt;/li&gt;
&lt;li&gt;Fine-tuning: the generative AI can explain why the suggested content is relevant for the user and provide a short description of why we think they will like the suggested movie&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Takeaways&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We can see how RAG can be used for personalization&lt;/li&gt;
&lt;li&gt;We get the chance to learn how to vectorize content, store it, and perform similarity search.&lt;/li&gt;
&lt;/ul&gt;
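&lt;p&gt;&lt;em&gt;The requirements above can be sketched as a plain-Java flow with every step stubbed out. The stub names (findLikedMovies, similaritySearch, generateSummary) and the hardcoded data are illustrative only; in the real project these steps are backed by the ratings data, the vector store, and the chat model.&lt;/em&gt;&lt;/p&gt;

```java
import java.util.List;
import java.util.Map;

// Sketch of the suggestion flow: liked movies -> similar movies -> personalized summary.
public class SuggestionFlowSketch {

    // Step 1: the system knows which movies the user liked (from their ratings).
    static List<String> findLikedMovies(String userId) {
        return List.of("The Godfather");
    }

    // Step 2: similarity search in the vector database (stubbed with a map).
    static List<String> similaritySearch(String movie) {
        Map<String, List<String>> similar = Map.of(
                "The Godfather", List.of("Goodfellas", "Casino"));
        return similar.getOrDefault(movie, List.of());
    }

    // Step 3: the generative model summarizes why the suggestions fit (stubbed with a template).
    static String generateSummary(String userId, List<String> suggestions) {
        return "Because you liked crime dramas, you may enjoy: "
                + String.join(", ", suggestions);
    }

    public static void main(String[] args) {
        List<String> liked = findLikedMovies("user-001");
        List<String> suggestions = similaritySearch(liked.get(0));
        System.out.println(generateSummary("user-001", suggestions));
    }
}
```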


&lt;h2&gt;
  
  
  Spring AI
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What is it?&lt;/strong&gt;&lt;br&gt;
The Spring Framework is a mature and well-known tool for Java developers to build web applications. Spring offers many different tools that developers can use to build solutions. A brand new one is Spring AI, which provides an abstraction layer that makes it easy to work with AI models.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;For more information, check Spring AI documentation: &lt;a href="https://docs.spring.io/spring-ai/reference/getting-started.html" rel="noopener noreferrer"&gt;https://docs.spring.io/spring-ai/reference/getting-started.html&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Why did I choose Spring AI?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I usually start a new project by selecting the Spring Boot framework&lt;/li&gt;
&lt;li&gt;I would like to become familiar with new AI-related technologies as soon as possible&lt;/li&gt;
&lt;li&gt;Spring AI already has a released version, so its architecture should be reasonably stable.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Setup
&lt;/h3&gt;

&lt;p&gt;Maven dependency:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.springframework.ai&lt;span class="nt"&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;spring-ai-tika-document-reader&lt;span class="nt"&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.springframework.ai&lt;span class="nt"&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;spring-ai-rag&lt;span class="nt"&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These dependencies are required for embedding and RAG support; the core AI dependencies are pulled in transitively by the Spring AI Ollama dependency!&lt;/p&gt;

&lt;h3&gt;
  
  
  Ollama server
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What is it?&lt;/strong&gt;&lt;br&gt;
Ollama is a user-friendly, open-source tool designed to simplify running large language models (LLMs) locally on your computer. It enables you to download, run, and interact with these models without relying on cloud-based services.&lt;/p&gt;

&lt;p&gt;Among the most &lt;strong&gt;popular models&lt;/strong&gt;, you can find:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;deepseek-r1&lt;/li&gt;
&lt;li&gt;mistral&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;How can it help me?&lt;/strong&gt;&lt;br&gt;
Ollama lets you use an LLM without subscribing to any cloud-based model (GPT from OpenAI, Gemini/Vertex AI by Google, etc.), because it downloads models from its central repository or from the Hugging Face repository to your local machine. Ollama then runs as a server that provides an API for operating these models.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Setup (on Linux)&lt;/strong&gt;&lt;br&gt;
(This Linux can be Ubuntu running inside WSL on Windows)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://ollama.com/install.sh | sh
&lt;span class="nv"&gt;$ &lt;/span&gt;ollama serve
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will download, install, and start the Ollama server.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;By default, Spring pulls missing models at startup; however, this sometimes fails due to a timeout. In that case, pull the model manually with the &lt;code&gt;ollama pull &amp;lt;model&amp;gt;&lt;/code&gt; command.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Maven dependency:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.springframework.ai&lt;span class="nt"&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;spring-ai-starter-model-ollama&lt;span class="nt"&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;

&lt;span class="nt"&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.springframework.ai&lt;span class="nt"&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;spring-ai-autoconfigure-model-ollama&lt;span class="nt"&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Vector Database
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What is a Vector Database?&lt;/strong&gt;&lt;br&gt;
A vector database is a specialized database designed to store, manage, and query data represented as numerical vectors. These &lt;strong&gt;vectors&lt;/strong&gt; are mathematical representations of data objects (like &lt;em&gt;text&lt;/em&gt;, &lt;em&gt;images&lt;/em&gt;, or &lt;em&gt;audio&lt;/em&gt;) that &lt;strong&gt;capture their semantic meaning or characteristics&lt;/strong&gt;. Essentially, they allow computers to understand and compare data based on similarity rather than exact matches.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why is it needed?&lt;/strong&gt;&lt;br&gt;
Because the vector represents the semantic meaning of a piece of data, in our case a movie.&lt;/p&gt;

&lt;p&gt;Imagine a movie described in JSON:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"The Godfather"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; 
  &lt;/span&gt;&lt;span class="nl"&gt;"genre"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Crime"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; 
  &lt;/span&gt;&lt;span class="nl"&gt;"actors"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"Marlon Brando"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Al Pacino"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"plot"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"The aging patriarch of an organized crime dynasty transfers control of his clandestine empire to his reluctant son."&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;During the embedding operation, a vector is created from this JSON data and stored in the vector database. Then, when you query with a plot like “&lt;em&gt;Mafia family&lt;/em&gt;”, the similarity search will most likely return the movie “&lt;em&gt;The Godfather&lt;/em&gt;”.&lt;/p&gt;
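&lt;p&gt;&lt;em&gt;To make this concrete, here is the math behind similarity search in plain Java: each document becomes an embedding vector, and “similar” means high cosine similarity. The three-dimensional vectors below are made up for illustration; real embeddings from mxbai-embed-large have 1024 dimensions.&lt;/em&gt;&lt;/p&gt;

```java
// Cosine similarity: 1.0 for identical directions, 0.0 for unrelated (orthogonal) ones.
public class CosineSimilarity {

    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        double[] godfather  = {0.9, 0.8, 0.1}; // hypothetical "crime drama" embedding
        double[] mafiaQuery = {0.8, 0.9, 0.2}; // hypothetical "Mafia family" query embedding
        double[] cartoon    = {0.1, 0.0, 0.9}; // hypothetical unrelated movie embedding

        // The query vector points in nearly the same direction as "The Godfather",
        // so similarity search ranks it far above the unrelated movie.
        System.out.printf("query vs Godfather: %.3f%n", cosine(mafiaQuery, godfather));
        System.out.printf("query vs cartoon:   %.3f%n", cosine(mafiaQuery, cartoon));
    }
}
```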

&lt;p&gt;&lt;strong&gt;Summary:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We need a vector database to store each movie together with its vector&lt;/li&gt;
&lt;li&gt;It must provide a similarity search function&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Which vector databases are available?&lt;/strong&gt;&lt;br&gt;
Several database vendors have added vector features to their database engines, and Spring AI supports many of them. Here is a short list, but check the &lt;a href="https://docs.spring.io/spring-ai/reference/api/vectordbs.html" rel="noopener noreferrer"&gt;Spring AI guide&lt;/a&gt; for the full list and related information:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Apache Cassandra&lt;/li&gt;
&lt;li&gt;Couchbase&lt;/li&gt;
&lt;li&gt;Elasticsearch&lt;/li&gt;
&lt;li&gt;MariaDB&lt;/li&gt;
&lt;li&gt;MongoDB Atlas&lt;/li&gt;
&lt;li&gt;OpenSearch&lt;/li&gt;
&lt;li&gt;Oracle Database&lt;/li&gt;
&lt;li&gt;Postgres&lt;/li&gt;
&lt;li&gt;Redis&lt;/li&gt;
&lt;li&gt;and so on…&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;I chose Elasticsearch&lt;/strong&gt; because I have worked with it a lot before, and I did not want to dive deep into an unfamiliar database engine right now.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Setup&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Start Elasticsearch server with Docker:
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;docker run &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;--name&lt;/span&gt; elasticsearch &lt;span class="nt"&gt;--net&lt;/span&gt; somenetwork &lt;span class="nt"&gt;-p&lt;/span&gt; 9200:9200 &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="nt"&gt;-p&lt;/span&gt; 9300:9300 &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"discovery.type=single-node"&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"xpack.security.enabled=false"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
docker.elastic.co/elasticsearch/elasticsearch:9.0.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;This starts the Elasticsearch server and exposes it on port 9200. If you run into any issues, check the &lt;a href="https://hub.docker.com/_/elasticsearch" rel="noopener noreferrer"&gt;Docker description&lt;/a&gt;.&lt;/p&gt;

&lt;ol start="2"&gt;
&lt;li&gt;Configure the Elasticsearch vector store in the Spring application.yaml properties file:
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;spring&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;elasticsearch&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;uris&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http://localhost:9200&lt;/span&gt;
  &lt;span class="na"&gt;ai&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;    
    &lt;span class="na"&gt;vectorstore&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;elasticsearch&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;initialize-schema&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
        &lt;span class="na"&gt;index-name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;movies&lt;/span&gt;
        &lt;span class="na"&gt;dimensions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1024&lt;/span&gt; &lt;span class="c1"&gt;# vector dimension which depends on selected embedding model, in case of 'mxbai-embed-large' is 1024&lt;/span&gt;
        &lt;span class="na"&gt;similarity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cosine&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;Take away:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We have a platform to build a solution for the defined project (Spring Framework✅)&lt;/li&gt;
&lt;li&gt;We have tool to manage AI (Spring AI ✅)&lt;/li&gt;
&lt;li&gt;We have LLM model to use for embedding and generating content (Ollama ✅)&lt;/li&gt;
&lt;li&gt;We have Vector Database to store movies and perform similarity search (Elasticsearch ✅)&lt;/li&gt;
&lt;/ul&gt;


&lt;h3&gt;
  
  
  Selecting AI model
&lt;/h3&gt;

&lt;p&gt;The Spring AI framework has default configuration options for the embedding and generative features, which you can set in the application.yaml properties file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;spring&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;ai&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;ollama&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;embedding&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;options&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mxbai-embed-large&lt;/span&gt;
      &lt;span class="na"&gt;chat&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mistral&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What is the difference between embedding and chat models?&lt;/strong&gt;&lt;br&gt;
This question goes back to the foundations of AI, so let’s first ask another question:&lt;br&gt;
&lt;strong&gt;What is the difference between embedding and content generation?&lt;/strong&gt;&lt;br&gt;
Embedding is the procedure of vectorizing a document; it is performed by an encoder type of AI model.&lt;br&gt;
Content generation is performed by another type of model, a decoder, which produces new output (text) from a prompt.&lt;br&gt;
(&lt;em&gt;This is an oversimplification; if it is not clear, please read the article referenced at the beginning of this post.&lt;/em&gt;)&lt;/p&gt;

&lt;p&gt;So that is the difference, and we can use the same or different models for input (vectorization) and for generation (the chat model).&lt;/p&gt;
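&lt;p&gt;&lt;em&gt;The contrast can be sketched in plain Java with two hypothetical stubs, making no assumptions about Spring AI’s real interfaces: an encoder maps text of any length to a fixed-length vector, while a decoder maps a prompt to newly generated text.&lt;/em&gt;&lt;/p&gt;

```java
// Toy contrast between the two model types used in this project.
public class ModelTypes {

    // Encoder (embedding model): text in, fixed-length vector out.
    // Stubbed with character statistics; real models output e.g. 1024 dimensions.
    static double[] embed(String text) {
        double[] vector = new double[4];
        for (char c : text.toCharArray()) {
            vector[c % vector.length] += 1.0;
        }
        return vector;
    }

    // Decoder (chat model): prompt in, generated text out. Stubbed with a template.
    static String chat(String prompt) {
        return "Generated answer for: " + prompt;
    }

    public static void main(String[] args) {
        // The vector length is the same regardless of the input text length.
        System.out.println(embed("The Godfather").length);
        System.out.println(chat("Suggest a crime drama"));
    }
}
```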

&lt;p&gt;&lt;strong&gt;What do we use the AI models for?&lt;/strong&gt;&lt;br&gt;
The AI model is used to vectorize movies, and to generate the text content (the response) for the user who requests movie suggestions. The latter works just like the well-known chat models (GPT, Gemini, etc.).&lt;/p&gt;


&lt;h2&gt;
  
  
  Solution
&lt;/h2&gt;

&lt;p&gt;Let’s see what every developer wants to see: the code itself, and how all of this is resolved.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; I know the code quality is not the best! This project is just a proof of concept for learning, so clean code was not a high priority. Thanks for understanding!&lt;/p&gt;
&lt;h3&gt;
  
  
  #1 Create Test Data
&lt;/h3&gt;

&lt;p&gt;The first things we need at the beginning of the solution:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Design a data layer that suits our solution&lt;/li&gt;
&lt;li&gt;Create Movie descriptions (id, title, year, genre, director, actors, plot)&lt;/li&gt;
&lt;li&gt;Create Users (id, username, name, email, age)&lt;/li&gt;
&lt;li&gt;Create movie ratings by users (id, userId, movieId, rating, comment, date)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Create Java models for them:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="n"&gt;record&lt;/span&gt; &lt;span class="nf"&gt;Ratings&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;RatedMovie&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;ratings&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{}&lt;/span&gt;
&lt;span class="nd"&gt;@JsonIgnoreProperties&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ignoreUnknown&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;record&lt;/span&gt; &lt;span class="nf"&gt;RatedMovie&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;movieId&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;userId&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;rating&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;dateRated&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{}&lt;/span&gt;
&lt;span class="n"&gt;record&lt;/span&gt; &lt;span class="nf"&gt;Movies&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Movie&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;movies&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{}&lt;/span&gt;
&lt;span class="nd"&gt;@JsonIgnoreProperties&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ignoreUnknown&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;record&lt;/span&gt; &lt;span class="nf"&gt;Movie&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;year&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;director&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
  &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;genre&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;actors&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To generate test data, I used another AI, the Junie agent provided in JetBrains IntelliJ IDEA. I asked it to create JSON files in the resources folder for movies, users, and ratings with the defined attributes. Junie created the test data successfully, step by step: it checked the defined model classes and used them to determine the required attributes, then asked for permission to write files into the resources folder and generated the test data:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"movies"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"movie-001"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"The Shawshank Redemption"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"year"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1994&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"genre"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"Drama"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"director"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Frank Darabont"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"actors"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"Tim Robbins"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s2"&gt;"Morgan Freeman"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s2"&gt;"Bob Gunton"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"plot"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Two imprisoned men bond over a number of years, finding solace and eventual redemption through acts of common decency."&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"movie-002"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"The Godfather"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"year"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1972&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"genre"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"Crime"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s2"&gt;"Drama"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"director"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Francis Ford Coppola"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"actors"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"Marlon Brando"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s2"&gt;"Al Pacino"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s2"&gt;"James Caan"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"plot"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"The aging patriarch of an organized crime dynasty transfers control of his clandestine empire to his reluctant son."&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"movie-003"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"The Dark Knight"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"year"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2008&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"genre"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"Action"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s2"&gt;"Crime"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s2"&gt;"Drama"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"director"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Christopher Nolan"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"actors"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"Christian Bale"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s2"&gt;"Heath Ledger"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s2"&gt;"Aaron Eckhart"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"plot"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"When the menace known as the Joker wreaks havoc and chaos on the people of Gotham, Batman must accept one of the greatest psychological and physical tests of his ability to fight injustice."&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"user-001"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"username"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"movie_buff_42"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"John Smith"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"email"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"john.smith@example.com"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"age"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;28&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"location"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"New York, USA"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"user-002"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"username"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"cinema_lover"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Emma Johnson"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"email"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"emma.j@example.com"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"age"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;34&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"location"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Los Angeles, USA"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"ratings"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"rating-001"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"userId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"user-001"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"movieId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"movie-001"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"rating"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"comment"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Absolutely brilliant film. The performances are outstanding and the story is deeply moving."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"dateRated"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2023-01-15"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"rating-002"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"userId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"user-001"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"movieId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"movie-003"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"rating"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"comment"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Heath Ledger's Joker is one of the greatest performances in cinema history."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"dateRated"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2023-02-03"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"rating-003"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"userId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"user-002"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"movieId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"movie-005"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"rating"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"comment"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Mind-bending plot with amazing visuals. Nolan at his best."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"dateRated"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2023-01-22"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  #2 Vectorize Movies
&lt;/h3&gt;

&lt;p&gt;‣ &lt;strong&gt;Define a TextSplitter bean&lt;/strong&gt; implementation, which will be used during vectorization to split documents into tokens.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Configuration&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;RatingAiConfiguration&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;

    &lt;span class="nd"&gt;@Bean&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;TextSplitter&lt;/span&gt; &lt;span class="nf"&gt;textSplitter&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;TokenTextSplitter&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;‣ &lt;strong&gt;Create a service&lt;/strong&gt; that adds Movie documents to the Vector Database (MovieSuggestionService.java)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Service&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MovieSuggestionService&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;VectorStore&lt;/span&gt; &lt;span class="n"&gt;vectorStore&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nf"&gt;MovieSuggestionService&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;VectorStore&lt;/span&gt; &lt;span class="n"&gt;vectorStore&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;vectorStore&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;vectorStore&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Document&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;vectorStore&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;add&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;VectorStore is a Spring AI interface; the implementation injected here is ElasticsearchVectorStore.&lt;/p&gt;
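
&lt;p&gt;A minimal configuration sketch for wiring the Elasticsearch vector store could look like the following. The property names are assumptions based on the Spring AI Elasticsearch starter and may differ between versions, so check the reference documentation for the version you use:&lt;/p&gt;

```properties
# application.properties (sketch; assumed property names)
spring.elasticsearch.uris=http://localhost:9200
# let Spring AI create the index and mapping on startup
spring.ai.vectorstore.elasticsearch.initialize-schema=true
spring.ai.vectorstore.elasticsearch.index-name=movies
# must match the embedding model's output dimension
spring.ai.vectorstore.elasticsearch.dimensions=1536
```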

&lt;p&gt;‣ &lt;strong&gt;Initialize the Vector Database with data&lt;/strong&gt; from resource files:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;read the test movies,&lt;/li&gt;
&lt;li&gt;parse them into a Java model (Movies.class),&lt;/li&gt;
&lt;li&gt;use a JSON reader to create Documents (Document is the unit managed by Vector Databases),&lt;/li&gt;
&lt;li&gt;extend each document with the movie identifier and a randomized popularity index; during similarity search, this metadata can be used to filter documents,&lt;/li&gt;
&lt;li&gt;split each document into a sequence of tokens,&lt;/li&gt;
&lt;li&gt;store the test movies with their vector representations&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Autowired&lt;/span&gt;
&lt;span class="nc"&gt;MovieSuggestionService&lt;/span&gt; &lt;span class="n"&gt;movieSuggestionService&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

&lt;span class="nd"&gt;@Autowired&lt;/span&gt;
&lt;span class="nc"&gt;TextSplitter&lt;/span&gt; &lt;span class="n"&gt;textSplitter&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;indexTestMovies&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nc"&gt;ObjectMapper&lt;/span&gt; &lt;span class="n"&gt;objectMapper&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ObjectMapper&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;movies&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;objectMapper&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;readValue&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
            &lt;span class="nc"&gt;AiRagApplication&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;class&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getClassLoader&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;getResourceAsStream&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"movies.json"&lt;/span&gt;&lt;span class="o"&gt;),&lt;/span&gt;
            &lt;span class="nc"&gt;Movies&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;class&lt;/span&gt;
    &lt;span class="o"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Movie&lt;/span&gt; &lt;span class="n"&gt;movie&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;movies&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;movies&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="nc"&gt;JsonReader&lt;/span&gt; &lt;span class="n"&gt;reader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;JsonReader&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ByteArrayResource&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;objectMapper&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;writeValueAsBytes&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;movie&lt;/span&gt;&lt;span class="o"&gt;)));&lt;/span&gt;
        &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Document&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;documents&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;reader&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;read&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
        &lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;forEach&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;document&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;document&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getMetadata&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;put&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"popularity"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;RandomUtils&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;insecure&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;randomInt&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;
            &lt;span class="n"&gt;document&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getMetadata&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;put&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"movieId"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;movie&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
            &lt;span class="n"&gt;movieSuggestionService&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;add&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;textSplitter&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;split&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;
        &lt;span class="o"&gt;});&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;info&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"done"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;See the movies index in Elasticsearch with content and embeddings:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpuuo0tlhl7k0vay4v0ho.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpuuo0tlhl7k0vay4v0ho.webp" alt="Elasticsearch data after indexing document and vector" width="800" height="360"&gt;&lt;/a&gt;&lt;/p&gt;
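
&lt;p&gt;For reference, a stored entry looks roughly like this. This is a simplified sketch; the exact field names and the embedding dimension depend on the vector store implementation and the embedding model:&lt;/p&gt;

```json
{
  "content": "{\"id\":\"movie-003\",\"title\":\"The Dark Knight\", ...}",
  "metadata": {
    "movieId": "movie-003",
    "popularity": 4
  },
  "embedding": [0.0123, -0.0456, 0.0789, ...]
}
```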




&lt;h3&gt;
  
  
  #3 Implement similarity search for RAG
&lt;/h3&gt;

&lt;p&gt;Extend the MovieSuggestionService with a search function, which takes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the prompt from the ‘user’, which is the content to find similar documents for,&lt;/li&gt;
&lt;li&gt;a filter expression, if additional filtering on document metadata is needed,&lt;/li&gt;
&lt;li&gt;a SearchRequestOption, if a custom search configuration is needed, such as the similarity threshold for documents or the topK parameter, which limits the results to at most K documents&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Service&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MovieSuggestionService&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt; &lt;span class="nf"&gt;SearchRequestOption&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Double&lt;/span&gt; &lt;span class="n"&gt;similarityThreshold&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Integer&lt;/span&gt; &lt;span class="n"&gt;topK&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;SearchRequestOption&lt;/span&gt; &lt;span class="n"&gt;searchRequestOption&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;SearchRequestOption&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.6&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="no"&gt;DEFAULT_TOP_K&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Document&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;userPromptText&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Filter&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;Expression&lt;/span&gt; &lt;span class="n"&gt;filterExpression&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;userPromptText&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;filterExpression&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;searchRequestOption&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Document&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;userPromptText&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Filter&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;Expression&lt;/span&gt; &lt;span class="n"&gt;filterExpression&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;SearchRequestOption&lt;/span&gt; &lt;span class="n"&gt;searchRequestOption&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="nc"&gt;SearchRequest&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;Builder&lt;/span&gt; &lt;span class="n"&gt;searchRequestBuilder&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SearchRequest&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
                &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;similarityThreshold&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;searchRequestOption&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;similarityThreshold&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt;
                &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;topK&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;searchRequestOption&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;topK&lt;/span&gt;&lt;span class="o"&gt;()).&lt;/span&gt;&lt;span class="na"&gt;similarityThresholdAll&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Objects&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;nonNull&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;userPromptText&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;userPromptText&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;isBlank&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;searchRequestBuilder&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;query&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;userPromptText&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Objects&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;nonNull&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filterExpression&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;searchRequestBuilder&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;filterExpression&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filterExpression&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;searchRequestBuilder&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Document&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;SearchRequest&lt;/span&gt; &lt;span class="n"&gt;searchRequest&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;info&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Search request: {}"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;searchRequest&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;vectorStore&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;similaritySearch&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;searchRequest&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
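
&lt;p&gt;Under the hood, the similarityThreshold compares embedding vectors; for most embedding models the score is (a variant of) cosine similarity. The following is a self-contained illustration, independent of Spring AI, with made-up toy vectors:&lt;/p&gt;

```java
// Standalone illustration of the score behind similarity search:
// cosine similarity between two embedding vectors (toy, hand-made vectors).
public class CosineSimilarity {

    // cos(a, b) = (a . b) / (|a| * |b|), in [-1, 1]; higher means more similar
    public static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        double[] darkKnight = {0.9, 0.1, 0.3}; // made-up "crime drama" embedding
        double[] inception  = {0.8, 0.2, 0.4}; // close to it in embedding space
        double[] romCom     = {0.1, 0.9, 0.2}; // far from it in embedding space
        System.out.printf("dark knight vs inception: %.3f%n", cosine(darkKnight, inception));
        System.out.printf("dark knight vs rom-com:   %.3f%n", cosine(darkKnight, romCom));
    }
}
```

&lt;p&gt;With the 0.6 similarityThreshold used above, only the first pair would pass the filter in this toy example.&lt;/p&gt;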



&lt;p&gt;Once this is done, create SuggestionRestController.java, which will contain the endpoint definitions; for now, it only implements this similarity search call.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@RestController&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;SuggestionRestController&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nd"&gt;@Autowired&lt;/span&gt;
    &lt;span class="nc"&gt;MovieSuggestionService&lt;/span&gt; &lt;span class="n"&gt;movieSuggestionService&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Document&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;findSimilarMovies&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;byte&lt;/span&gt;&lt;span class="o"&gt;[]&lt;/span&gt; &lt;span class="n"&gt;referenceMovie&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;movieSuggestionService&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;search&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
                &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;JsonReader&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ByteArrayResource&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;referenceMovie&lt;/span&gt;&lt;span class="o"&gt;)).&lt;/span&gt;&lt;span class="na"&gt;read&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;get&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="na"&gt;getText&lt;/span&gt;&lt;span class="o"&gt;(),&lt;/span&gt;
                &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;Expression&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ExpressionType&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;GTE&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Key&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"popularity"&lt;/span&gt;&lt;span class="o"&gt;),&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Value&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
        &lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  #4 Create Movie Suggestion endpoint and configure Chat Client (with RAG)
&lt;/h3&gt;

&lt;p&gt;Extend RatingAiConfiguration with a ChatClient bean. This client has a system prompt that defines what the ChatClient is for, so that it generates content accordingly.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Configuration&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;RatingAiConfiguration&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nd"&gt;@Bean&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;ChatClient&lt;/span&gt; &lt;span class="nf"&gt;movieSuggestAi&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ChatClient&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;Builder&lt;/span&gt; &lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;defaultSystem&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
                        &lt;span class="s"&gt;"You are a chat bot for movie suggestions. Use the provided movies suggest another "&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
                                &lt;span class="s"&gt;"ones to watch and write a interesting summary of the movie. You can append the provided "&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
                                &lt;span class="s"&gt;"movies with another ones which similar to them. Maximum 3 another movies you can suggest."&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
                &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Go to SuggestionRestController, define the suggest endpoint, and implement it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@RestController&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;SuggestionRestController&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;

  &lt;span class="nd"&gt;@Qualifier&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"movieSuggestAi"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
  &lt;span class="nd"&gt;@Autowired&lt;/span&gt;
  &lt;span class="nc"&gt;ChatClient&lt;/span&gt; &lt;span class="n"&gt;movieSuggestionGenAi&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

  &lt;span class="nd"&gt;@GetMapping&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"/suggest"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
  &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="nf"&gt;suggestMovies&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nd"&gt;@RequestParam&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;userId&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="kd"&gt;throws&lt;/span&gt; &lt;span class="nc"&gt;IOException&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
      &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;byte&lt;/span&gt;&lt;span class="o"&gt;[]&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;ratedMovie&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;queryUserRatedMovies&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;userId&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// finds rated movies by user&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;movieSuggestionGenAi&lt;/span&gt;
                &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Prompt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Give some movie suggestions to watch."&lt;/span&gt;&lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt;
                &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;advisors&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;advisorSpec&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;advisorSpec&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;advisors&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;movieSuggestionRag&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ratedMovie&lt;/span&gt;&lt;span class="o"&gt;)))&lt;/span&gt;
                &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;stream&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
                &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;chatResponse&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
                &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getResult&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
                &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getOutput&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
                &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getText&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
  &lt;span class="o"&gt;}&lt;/span&gt;

  &lt;span class="cm"&gt;/**
  * Get movie documents which were rated by user (1-5). 
  * Do a similarity search by them to find movies similar to user liked.
  */&lt;/span&gt;
  &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="nc"&gt;RetrievalAugmentationAdvisor&lt;/span&gt; &lt;span class="nf"&gt;movieSuggestionRag&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;byte&lt;/span&gt;&lt;span class="o"&gt;[]&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;ratedMovie&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;RetrievalAugmentationAdvisor&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
              &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;documentRetriever&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;
                      &lt;span class="n"&gt;ratedMovie&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;stream&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
                              &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;flatMap&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;movieBytes&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;
                                      &lt;span class="n"&gt;findSimilarMovies&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;movieBytes&lt;/span&gt;&lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="na"&gt;stream&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
                              &lt;span class="o"&gt;)&lt;/span&gt;
                              &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;limit&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
                              &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;toList&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
              &lt;span class="o"&gt;)&lt;/span&gt;
              &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
  &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And that’s it: we have an AI-based movie suggestion solution ready! ✨🎉&lt;/p&gt;




&lt;h3&gt;
  
  
  Testing the suggestion endpoint
&lt;/h3&gt;

&lt;p&gt;After starting the Spring application on localhost (the default port is 8080), you will be able to send requests to the movie suggestion endpoint we defined.&lt;/p&gt;

&lt;p&gt;Let’s take an existing user from the test data: user-001&lt;/p&gt;

&lt;p&gt;Send request (in Postman or cURL) to:&lt;br&gt;
&lt;em&gt;&lt;a href="http://localhost:8080/suggest?userId=user-001" rel="noopener noreferrer"&gt;http://localhost:8080/suggest?userId=user-001&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;After a short wait, the chat model returns a response with suggestions based on the movies ‘user-001’ already liked.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6o4pdett0nnltuyyu569.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6o4pdett0nnltuyyu569.webp" alt="Postman request/response of movie suggestion endpoint" width="800" height="282"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Thank you for your attention, I hope I managed to share something useful from my experience!&lt;/strong&gt;&lt;br&gt;
Give it a try yourself, and have a nice day. 😊👋&lt;/p&gt;

</description>
      <category>ai</category>
      <category>springboot</category>
      <category>rag</category>
      <category>java</category>
    </item>
    <item>
      <title>Beyond the Buzzwords: How Generative AI Really Works</title>
      <dc:creator>Tamás Bereczki</dc:creator>
      <pubDate>Tue, 09 Sep 2025 10:02:47 +0000</pubDate>
      <link>https://forem.com/bereczki/beyond-the-buzzwords-how-generative-ai-really-works-bac</link>
      <guid>https://forem.com/bereczki/beyond-the-buzzwords-how-generative-ai-really-works-bac</guid>
      <description>&lt;h2&gt;
  
  
  A deep dive into the core mechanics of modern LLMs, explaining the essential concepts that separate a casual user from a true practitioner
&lt;/h2&gt;

&lt;h3&gt;
  
  
  LLM Architectures
&lt;/h3&gt;

&lt;p&gt;The Transformer’s core innovation is the attention mechanism, which allows the model to weigh the importance of different words in the input text when processing and generating language. This architecture is the backbone of most modern LLMs, including models like GPT and BERT. The Transformer is composed of two primary building blocks: Encoders and Decoders. Different models use these blocks in different combinations to achieve their specific capabilities.&lt;/p&gt;

&lt;p&gt;Before moving forward, let’s make sure the terms are clear:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Text/Document: The full sequence of words you are working with.&lt;/li&gt;
&lt;li&gt;Token: The smallest unit the model processes. After tokenization, a sentence is broken down into these pieces. A token is often a word (like “They”) or a sub-word (like “ing” in “running”).&lt;/li&gt;
&lt;li&gt;Embedding: A numerical vector that represents the semantic meaning of a token or a sequence of tokens.&lt;/li&gt;
&lt;/ul&gt;
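&lt;p&gt;To make the pipeline concrete, here is a toy sketch of tokenization (splitting on whitespace only; real tokenizers such as BPE split into sub-words):&lt;/p&gt;

```java
import java.util.Arrays;
import java.util.List;

// Toy illustration of the text -> token step. Real tokenizers (e.g. BPE)
// split into sub-words; here we simply split on whitespace for clarity.
public class ToyTokenizer {
    public static List<String> tokenize(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    public static void main(String[] args) {
        System.out.println(tokenize("They sent me a pet"));
        // prints [They, sent, me, a, pet]
    }
}
```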

&lt;h4&gt;
  
  
  Encoders and Decoders
&lt;/h4&gt;

&lt;p&gt;AI model types have different capabilities, e.g. embedding, text generation, text-to-image generation, etc.&lt;br&gt;
The models of each type come in a variety of sizes (number of parameters).&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Typically, Encoders are used for embedding, and Decoders are used for generation.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h4&gt;
  
  
  Encoders
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;What is Embedding?&lt;/strong&gt;&lt;br&gt;
The model converts a sequence of words into an embedding (a vector representation of the words).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqhkmnvvgzd0zprro92nj.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqhkmnvvgzd0zprro92nj.webp" alt="Encoder model representation" width="649" height="213"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The sequence of words is “They sent me a”. This sentence is tokenized into chunks (tokenization means a character sequence, typically a sentence, is broken into small pieces, mostly simple words).&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Each token and the whole sentence will be embedded, a vector representation is created from them.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;What are Vector Embeddings and why are they useful?&lt;/strong&gt;&lt;br&gt;
A vector embedding is a powerful concept where a word, sentence, or even an entire document is converted into a numerical representation — a list of numbers called a vector. This vector is designed to capture the rich semantic meaning and context of the original text.&lt;/p&gt;

&lt;p&gt;Imagine a vast, multi-dimensional space (often with hundreds of dimensions, such as 300 or more). In this space, every concept has a specific location, represented by its vector. The key principle is that &lt;strong&gt;semantically similar concepts will have vectors that are close to each other.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It’s not that a single dimension represents a simple, human-readable trait like ‘kindness’. Instead, meaning is encoded in the vector’s &lt;em&gt;overall position and its relationships with other vectors&lt;/em&gt;. For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The vectors for “&lt;em&gt;polite&lt;/em&gt;” and “&lt;em&gt;courteous&lt;/em&gt;” would be located very close together.&lt;/li&gt;
&lt;li&gt;The vectors for “&lt;em&gt;king&lt;/em&gt;” and “&lt;em&gt;queen&lt;/em&gt;” would also be near each other.&lt;/li&gt;
&lt;li&gt;Furthermore, the model learns complex relationships. The vector relationship between “&lt;em&gt;king&lt;/em&gt;” and “&lt;em&gt;queen&lt;/em&gt;” is very similar to the relationship between “&lt;em&gt;man&lt;/em&gt;” and “&lt;em&gt;woman&lt;/em&gt;”.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The primary application of this is enabling &lt;strong&gt;similarity search&lt;/strong&gt;. By storing these embeddings in a specialized &lt;strong&gt;vector database&lt;/strong&gt;, we can find documents or pieces of text that are semantically similar to a user’s query. Instead of just matching keywords, a similarity search finds content that matches the &lt;em&gt;meaning&lt;/em&gt; and &lt;em&gt;intent&lt;/em&gt; behind the query, leading to much more relevant and intelligent search results.&lt;/p&gt;
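&lt;p&gt;A minimal sketch of the idea behind similarity search is cosine similarity between two vectors (the 3-dimensional vectors below are made up for illustration; real embeddings have hundreds of dimensions):&lt;/p&gt;

```java
// Minimal sketch of similarity search over vector embeddings.
// The vectors are hypothetical 3-dimensional stand-ins for real embeddings.
public class SimilaritySearch {
    // Cosine similarity: close to 1.0 means "same direction" (similar meaning).
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na  += a[i] * a[i];
            nb  += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    public static void main(String[] args) {
        double[] polite    = {0.9, 0.1, 0.3};   // made-up embedding
        double[] courteous = {0.85, 0.15, 0.35};
        double[] alligator = {0.1, 0.9, 0.7};
        // "polite" is far closer to "courteous" than to "alligator".
        System.out.printf("%.3f%n", cosine(polite, courteous));
        System.out.printf("%.3f%n", cosine(polite, alligator));
    }
}
```

A vector database does essentially this comparison, but at scale and with indexes that avoid scanning every stored vector.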
&lt;h4&gt;
  
  
  Decoders
&lt;/h4&gt;

&lt;p&gt;These kinds of models take a sequence of words and output the next word, based on a probability distribution over the vocabulary that the model computes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It is important to understand that the Decoder produces only a single token at a time!&lt;/strong&gt; We can invoke a decoder to generate as many new tokens as we want.&lt;br&gt;
In other words, to generate a sequence of new tokens, we first feed the decoder model an initial sequence of tokens (the prompt) and invoke it to produce the next token.&lt;/p&gt;
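&lt;p&gt;The token-by-token loop described above can be sketched like this (the &lt;code&gt;nextToken&lt;/code&gt; method is a hypothetical stand-in for a real decoder model):&lt;/p&gt;

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch of the autoregressive loop: the decoder emits ONE token per call,
// and that token is appended to the context before the next call.
public class AutoregressiveLoop {
    static final String EOS = "<eos>";

    // Hypothetical stand-in for a real decoder model: a canned lookup table.
    static String nextToken(List<String> context) {
        Map<String, String> table = Map.of("a", "dog", "dog", EOS);
        return table.getOrDefault(context.get(context.size() - 1), EOS);
    }

    static List<String> generate(List<String> prompt, int maxNewTokens) {
        List<String> context = new ArrayList<>(prompt);
        for (int i = 0; i < maxNewTokens; i++) {
            String token = nextToken(context);
            if (token.equals(EOS)) break;   // stop at end-of-sequence
            context.add(token);             // feed the new token back in
        }
        return context;
    }

    public static void main(String[] args) {
        System.out.println(generate(List.of("They", "sent", "me", "a"), 10));
        // prints [They, sent, me, a, dog]
    }
}
```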
&lt;h4&gt;
  
  
  Encoders — Decoders
&lt;/h4&gt;

&lt;p&gt;These kinds of models encode a sequence of words and use the encoding to output the next word.&lt;/p&gt;

&lt;p&gt;Encoder-decoder models are typically used for sequence-to-sequence tasks, like translation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv359hvhcl37kv6tdm6ac.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv359hvhcl37kv6tdm6ac.webp" alt="Translation encoder-decoder model representation" width="720" height="179"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The translation workflow: we send the English tokens to the model; the encoder receives them and embeds the tokens and the whole sentence. The embeddings are then passed to the decoder. Notice the self-referential loop on the decoder: after generating a token, that token is passed back into the decoder.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Architectures at a glance&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;Encoders&lt;/th&gt;
&lt;th&gt;Decoders&lt;/th&gt;
&lt;th&gt;Encoder-decoder&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Embedding text&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Abstractive QA&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Extractive QA&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Maybe&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Translation&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Maybe&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Creative writing&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Abstractive Summarization&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Extractive Summarization&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Maybe&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Chat&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Forecasting&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Code&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;


&lt;h3&gt;
  
  
  Prompting and Prompt Engineering
&lt;/h3&gt;
&lt;h4&gt;
  
  
  In-context Learning and Few-shot Prompting
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;In-context learning — conditioning (prompting) an LLM with instructions and/or demonstrations of the task it is meant to complete&lt;/li&gt;
&lt;li&gt;k-shot prompting — explicitly providing k examples of the intended task in the prompt&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Freew9524k9nw9yjc33ox.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Freew9524k9nw9yjc33ox.webp" alt="three-shot promting (example of k-shot-prompting)" width="608" height="247"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here you can see a k-shot prompting example where we tell the model to translate by providing some examples (in this case, three-shot).&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Takeaway:&lt;/strong&gt;&lt;br&gt;
Few-shot prompting is widely believed to improve results over 0-shot prompting.&lt;/p&gt;
&lt;/blockquote&gt;
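&lt;p&gt;Assembling a k-shot prompt is plain string concatenation; here is a minimal sketch (the instruction, examples, and arrow format are made up for illustration):&lt;/p&gt;

```java
import java.util.List;

// Sketch of building a k-shot prompt: the demonstrations are simply
// concatenated into the prompt text before the actual query (three-shot here).
public class FewShotPrompt {
    record Example(String input, String output) {}

    static String build(String instruction, List<Example> shots, String query) {
        StringBuilder sb = new StringBuilder(instruction).append("\n\n");
        for (Example e : shots) {
            sb.append(e.input()).append(" -> ").append(e.output()).append("\n");
        }
        // Leave the answer slot open for the model to complete.
        return sb.append(query).append(" -> ").toString();
    }

    public static void main(String[] args) {
        System.out.println(build(
            "Translate English to French:",
            List.of(new Example("cheese", "fromage"),
                    new Example("bread", "pain"),
                    new Example("milk", "lait")),
            "water"));
    }
}
```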


&lt;h4&gt;
  
  
  Advanced Prompting Strategies
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Chain-of-Thought&lt;/strong&gt; (CoT) — Prompt the LLM to emit intermediate reasoning steps&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How can this work at all? Remember, when generating text the model works word by word, one word at a time. It doesn’t have a high-level plan for how to solve the problem.&lt;/strong&gt;&lt;br&gt;
This is precisely why Chain-of-Thought (CoT) is so effective. By explicitly instructing the model to “think step-by-step,” we force it to generate its reasoning process as part of the output. Each new word it generates is conditioned on the reasoning steps it has already written down. This creates a logical sequence that guides the model toward a more accurate conclusion. Instead of trying to jump straight to the answer — which is difficult for complex problems — the model externalizes its thought process, allowing it to break the problem down and build upon its own intermediate conclusions. It’s like a student showing their work on a math problem; writing down the steps helps avoid errors.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi4hx6ha1fcjoobwn29mm.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi4hx6ha1fcjoobwn29mm.webp" alt="Example for Chain of Thought" width="720" height="134"&gt;&lt;/a&gt;&lt;/p&gt;



&lt;p&gt;&lt;strong&gt;Least-to-most&lt;/strong&gt; — Prompt the LLM to decompose the problem and solve, easy-first&lt;/p&gt;

&lt;p&gt;This strategy builds on Chain-of-Thought by prompting the model to first break a complex problem into a series of simpler subproblems and then solve them in sequence. This is particularly useful for tasks where one step logically depends on the answer to a previous one. The key is to guide the model to tackle the easiest parts first, creating a foundation for solving the more difficult parts. This reduces the cognitive load and improves the chances of arriving at a correct final answer.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Query:&lt;/strong&gt; “&lt;em&gt;If a car travels at 60 mph for 30 minutes and then gets stuck in traffic for 15 minutes before traveling at 40 mph for another 15 minutes, what is its average speed for the entire trip?&lt;/em&gt;”&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Least-to-Most Prompting:&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Prompt 1 (Decomposition):&lt;/strong&gt; “&lt;em&gt;Break down the problem of calculating the car’s average speed into smaller steps.&lt;/em&gt;”&lt;br&gt;
&lt;strong&gt;Model’s Response:&lt;/strong&gt; “&lt;em&gt;1. Calculate the distance traveled in the first leg. 2. Calculate the distance traveled in the second leg. 3. Calculate the total distance. 4. Calculate the total time. 5. Divide total distance by total time.&lt;/em&gt;”&lt;br&gt;
&lt;strong&gt;Prompt 2 (Solving):&lt;/strong&gt; “&lt;em&gt;Great. Now solve each step.&lt;/em&gt;”&lt;br&gt;
&lt;strong&gt;Model solves sequentially, leading to the correct average speed.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;



&lt;p&gt;&lt;strong&gt;Step-back&lt;/strong&gt; — Prompt the LLM to identify high-level concepts pertinent to a specific task&lt;/p&gt;

&lt;p&gt;Step-Back prompting encourages the model to generalize and abstract away from the specific details of a question to consider the broader principles or concepts at play. Instead of getting bogged down by the specifics, the model is asked to “take a step back” and think about the fundamental knowledge required to answer the question. It then uses this high-level understanding to formulate a more robust and accurate answer. This is especially effective for complex reasoning tasks and for questions where the details might be misleading.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh6qyosfwtes6zav8c68j.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh6qyosfwtes6zav8c68j.webp" alt="Example for Step-back prompting" width="720" height="165"&gt;&lt;/a&gt;&lt;/p&gt;


&lt;h3&gt;
  
  
  Issues with Prompting
&lt;/h3&gt;

&lt;p&gt;Prompting models, while a powerful tool, carries several inherent risks that are important to understand and manage. These risks range from security vulnerabilities to ethical dilemmas.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prompt Injection&lt;/strong&gt;&lt;br&gt;
Prompt injection is one of the most significant security risks. It is a form of attack where a malicious actor intentionally crafts a prompt to trick an AI model into ignoring its original instructions and developer guidelines. The model often cannot distinguish between developer instructions and user input, allowing an attacker to take control with a cleverly crafted command. For example, it can be manipulated to leak confidential data or generate misinformation and harmful content. This type of attack does not require advanced technical skills; it is carried out simply by deceiving the model using natural language.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Inherent Biases and Prejudices&lt;/strong&gt;&lt;br&gt;
Generative models operate based on the patterns present in their training data. If this data contains social, cultural, or other prejudices, the model will not only reproduce but can also amplify these biases. This can manifest, for instance, in the model consistently associating certain professions with a specific gender or reinforcing racial and cultural stereotypes. Such biased outputs can perpetuate discriminatory practices in areas like hiring or lending.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Misinformation and “Hallucinations”&lt;/strong&gt;&lt;br&gt;
Generative AI models are prone to confidently stating falsehoods. This phenomenon is known as “hallucination.” Since models fundamentally predict the next most likely word based on statistical patterns, their responses do not necessarily have a factual basis. This can be particularly dangerous when users rely on AI-generated content for critical decisions, such as for financial or medical advice. Malicious actors can also intentionally use these models to create convincing fake news and disinformation campaigns.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data Security and Privacy Risks&lt;/strong&gt;&lt;br&gt;
When users write prompts, they might inadvertently include confidential or personally identifiable information (PII). There is a risk that the model could memorize this information and later reveal it in a response to another user. The risk is especially high with cloud-based or third-party AI services, where input data may not be handled properly, leading to privacy breaches and legal violations (such as GDPR).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Generating Malicious Content&lt;/strong&gt;&lt;br&gt;
Without proper safety restrictions, models can be used to create offensive, inappropriate, or illegal content. Attackers can generate malicious code or sophisticated phishing emails to target other users or systems, using the AI as a tool to scale their malicious activities.&lt;/p&gt;


&lt;h3&gt;
  
  
  Training
&lt;/h3&gt;

&lt;p&gt;While prompting is powerful, it can be insufficient when you need an LLM to become a true expert in a specific domain or perform a highly specialized task. This is where &lt;strong&gt;training&lt;/strong&gt; comes in.&lt;/p&gt;

&lt;p&gt;As opposed to prompting (which gives the model context), training &lt;strong&gt;permanently changes the model’s internal parameters&lt;/strong&gt;. Think of it as teaching the model a new skill, not just giving it notes for a single test. At a high level, the process involves:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Giving the model an input.&lt;/li&gt;
&lt;li&gt;Letting it guess the corresponding output (e.g., a sentence completion or an answer).&lt;/li&gt;
&lt;li&gt;Comparing its guess to the “correct” answer and slightly adjusting its parameters so it does better next time.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;These adjustments change how the model “thinks” and hopefully improve its performance on your specific task. There are several ways to do this, ranging from massive undertakings to highly efficient tweaks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Continual Pre-training: Expanding the Knowledge Base&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;What it is:&lt;/strong&gt; This technique continues the initial, broad training of an LLM, but using a large corpus of text from a new, specialized domain (e.g., legal documents, medical research, or your company’s internal wiki). You are still just asking the model to predict the next word, but on this new, focused data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Parameters Modified:&lt;/strong&gt; All of them.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Required:&lt;/strong&gt; A large amount of &lt;strong&gt;unlabeled&lt;/strong&gt; domain-specific text.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Best for:&lt;/strong&gt; When the model lacks fundamental knowledge about a specific field. You aren’t teaching it a task, you are teaching it a subject.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Full Fine-Tuning (FT): Teaching a Specific Skill&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;What it is:&lt;/strong&gt; This is the classic way to train a model for a specific task. You take a pre-trained model and train it further on a dataset of examples that show exactly what you want it to do (e.g., thousands of question-answer pairs for a customer service bot).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Parameters Modified:&lt;/strong&gt; All of them.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Required:&lt;/strong&gt; A high-quality, &lt;strong&gt;labeled&lt;/strong&gt;, task-specific dataset.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Best for:&lt;/strong&gt; Achieving the highest possible performance on a specific task when you have a large budget and a good dataset. However, it is computationally expensive and risks “catastrophic forgetting,” where the model loses some of its general capabilities.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Parameter-Efficient Fine-Tuning (PEFT): The Smart Middle Ground&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To get the benefits of fine-tuning without the enormous cost, several PEFT methods have emerged. The core idea is to freeze the original LLM’s billions of parameters and only train a small number of new or specific ones.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Method A: LoRA (Low-Rank Adaptation)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;What it is:&lt;/strong&gt; LoRA is a popular PEFT method where small, trainable “adapter” layers are inserted into the model. The original model remains frozen, and only these tiny new layers are trained. It’s like adding specialized tuning knobs to a complex engine instead of rebuilding it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Parameters Modified:&lt;/strong&gt; Only a tiny fraction of new, added parameters.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Required:&lt;/strong&gt; Labeled, task-specific data (often less than full FT).&lt;/li&gt;
&lt;/ul&gt;
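&lt;p&gt;The core idea of LoRA can be sketched on tiny matrices: the frozen weight W is left untouched, and the effective weight becomes W + B·A, where only the small matrices A and B are trained (all values below are made up for illustration):&lt;/p&gt;

```java
import java.util.Arrays;

// Sketch of the LoRA idea on tiny matrices: W stays frozen; only the
// low-rank factors B (m x r) and A (r x n) are trained. With r = 1 they
// hold far fewer numbers than the m x n entries of W.
public class LoraSketch {
    static double[][] matmul(double[][] x, double[][] y) {
        int m = x.length, n = y[0].length, k = y.length;
        double[][] out = new double[m][n];
        for (int i = 0; i < m; i++)
            for (int j = 0; j < n; j++)
                for (int t = 0; t < k; t++)
                    out[i][j] += x[i][t] * y[t][j];
        return out;
    }

    static double[][] effectiveWeight(double[][] w, double[][] b, double[][] a) {
        double[][] delta = matmul(b, a);            // low-rank update B·A
        double[][] out = new double[w.length][w[0].length];
        for (int i = 0; i < w.length; i++)
            for (int j = 0; j < w[0].length; j++)
                out[i][j] = w[i][j] + delta[i][j];  // W itself is never changed
        return out;
    }

    public static void main(String[] args) {
        double[][] w = {{1, 0}, {0, 1}};   // frozen 2x2 weight
        double[][] b = {{0.5}, {0.5}};     // 2x1, trained
        double[][] a = {{0.1, 0.2}};       // 1x2, trained
        System.out.println(Arrays.deepToString(effectiveWeight(w, b, a)));
    }
}
```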

&lt;p&gt;&lt;strong&gt;Method B: Soft Prompting (or Prompt Tuning)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;What it is:&lt;/strong&gt; This technique focuses on the input. It freezes the entire model and instead learns a special “soft prompt” — a sequence of numerical values that are prepended to your actual prompt. You can think of these as perfect, computer-generated keywords that are learned during training to steer the model toward the correct output for your task.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Parameters Modified:&lt;/strong&gt; A small number of new parameters that represent the soft prompt.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Required:&lt;/strong&gt; Labeled, task-specific data.&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Training Style&lt;/th&gt;
&lt;th&gt;Parameters Modified&lt;/th&gt;
&lt;th&gt;Data&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cont. Pre-training&lt;/td&gt;
&lt;td&gt;All&lt;/td&gt;
&lt;td&gt;Unlabeled&lt;/td&gt;
&lt;td&gt;Very High&lt;/td&gt;
&lt;td&gt;Adapting to a new knowledge domain.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Full Fine-Tuning&lt;/td&gt;
&lt;td&gt;All&lt;/td&gt;
&lt;td&gt;Labeled&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Max performance on a specific task&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PEFT (e.g., LoRA)&lt;/td&gt;
&lt;td&gt;Few (new)&lt;/td&gt;
&lt;td&gt;Labeled&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Cost-effective task specialization&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Soft Prompting&lt;/td&gt;
&lt;td&gt;Few (new)&lt;/td&gt;
&lt;td&gt;Labeled&lt;/td&gt;
&lt;td&gt;Very Low&lt;/td&gt;
&lt;td&gt;Efficiently tuning for many tasks&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;


&lt;h3&gt;
  
  
  Decoding
&lt;/h3&gt;

&lt;p&gt;Decoding is the process an LLM uses to select words from a probability distribution to generate text. After the model processes an input, it doesn’t know the “right” word; it only knows the probability of every word in its vocabulary being the next one.&lt;/p&gt;

&lt;p&gt;Let’s take an example:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“I wrote to the zoo to send me a pet. They sent me a _______”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The model produces a probability distribution over its entire vocabulary, which might look something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;“lion = 0.03”; “elephant = 0.02”; “dog = 0.45”; “cat = 0.4”; “panther = 0.05”; “aligator = 0.01”; …
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key question is: “How do we pick a word from this list?” This choice happens iteratively:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The model computes the probability distribution.&lt;/li&gt;
&lt;li&gt;A word is selected using a decoding strategy.&lt;/li&gt;
&lt;li&gt;The chosen word is appended to the input text.&lt;/li&gt;
&lt;li&gt;The process repeats until the model generates an end-of-sequence (EOS) token or reaches its maximum length.&lt;/li&gt;
&lt;/ol&gt;
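&lt;p&gt;The four steps above can be sketched as a simple loop. This is a conceptual sketch only: &lt;code&gt;next_token_distribution&lt;/code&gt; is a hypothetical stand-in for a real model, not an actual API.&lt;/p&gt;

```python
import random

# Hypothetical stand-in for a real model: given the text so far,
# return a probability distribution over the next word.
def next_token_distribution(text):
    if text.endswith("a"):
        return {"dog": 0.45, "cat": 0.40, "panther": 0.05, "lion": 0.03, "EOS": 0.07}
    return {"EOS": 0.99, "dog": 0.005, "cat": 0.005}

def generate(prompt, max_tokens=20):
    text = prompt
    for _ in range(max_tokens):
        dist = next_token_distribution(text)        # 1. compute the distribution
        words = list(dist)
        weights = list(dist.values())
        word = random.choices(words, weights=weights)[0]  # 2. decoding strategy (here: sampling)
        if word == "EOS":                           # 4. stop at the end-of-sequence token
            break
        text = text + " " + word                    # 3. append the chosen word and repeat
    return text

print(generate("They sent me a"))
```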

&lt;p&gt;There are two main families of decoding strategies:&lt;/p&gt;

&lt;h4&gt;
  
  
  1.) Greedy Decoding: The Direct Path
&lt;/h4&gt;

&lt;p&gt;This is the simplest and most direct strategy. At each step, we simply pick the word with the &lt;strong&gt;highest probability&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In the example above, it selects “dog” (probability 0.45).&lt;/p&gt;

&lt;p&gt;The next input becomes:&lt;br&gt;
“I wrote to the zoo to send me a pet. They sent me a dog ______”&lt;/p&gt;

&lt;p&gt;The model then generates a new distribution. Let’s say the highest probability is now for the End-of-Sequence token (EOS = 0.99). Greedy decoding selects it, and the generation stops.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Distribution: 
“EOS = 0.99”; “elephant = 0.001”; “dog = 0.001”; “cat = 0.001”; “panther = 0.005”; “aligator = 0.01”; …
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt; Fast, predictable, and produces the most “likely” output.&lt;br&gt;
&lt;strong&gt;Cons:&lt;/strong&gt; Can be repetitive and boring. It might miss a more creative or coherent sentence by always choosing the locally optimal word, without considering the global sentence structure.&lt;/p&gt;
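&lt;p&gt;In code, greedy decoding over the toy distribution is just an argmax. A minimal sketch, using the example numbers from this article:&lt;/p&gt;

```python
distribution = {"lion": 0.03, "elephant": 0.02, "dog": 0.45,
                "cat": 0.40, "panther": 0.05, "alligator": 0.01}

# Greedy decoding: at every step, take the single most probable word.
greedy_pick = max(distribution, key=distribution.get)
print(greedy_pick)  # dog
```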

&lt;h4&gt;
  
  
  2.) Sampling: Introducing Controlled Randomness
&lt;/h4&gt;

&lt;p&gt;To produce more creative and human-like text, we can introduce randomness. Instead of always picking the top word, we sample from the probability distribution. Several parameters control this process.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Temperature&lt;/strong&gt;&lt;br&gt;
Temperature is the most important parameter for controlling randomness. It is a value typically between &lt;strong&gt;0.0 and 2.0&lt;/strong&gt;. It “re-shapes” the probability distribution before sampling.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Low Temperature (e.g., 0.2):&lt;/strong&gt; This makes the distribution “peakier.” The probability of high-probability words (like “dog” and “cat”) gets boosted, while low-probability words are suppressed even further. As temperature approaches 0, it becomes identical to greedy decoding.&lt;br&gt;
&lt;strong&gt;Use when:&lt;/strong&gt; You want factual, grounded, and predictable answers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;High Temperature (e.g., &amp;gt; 1.0):&lt;/strong&gt; This “flattens” the distribution, making the probabilities of words more uniform. Rare words have a higher chance of being selected.&lt;br&gt;
&lt;strong&gt;Use when:&lt;/strong&gt; You want creative, diverse, and sometimes surprising output, like for writing a story or brainstorming ideas.&lt;/li&gt;
&lt;/ul&gt;
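&lt;p&gt;A minimal sketch of how temperature re-shapes the toy distribution. This is an illustration, not a real model API: real models divide the raw logits by the temperature before the softmax, and here the logits are simply recovered as log-probabilities.&lt;/p&gt;

```python
import math

def apply_temperature(probs, temperature):
    # Rescale the (log-)probabilities by the temperature, then re-normalize.
    scaled = {w: math.log(p) / temperature for w, p in probs.items()}
    total = sum(math.exp(s) for s in scaled.values())
    return {w: math.exp(s) / total for w, s in scaled.items()}

dist = {"dog": 0.45, "cat": 0.40, "panther": 0.05, "lion": 0.03,
        "elephant": 0.02, "alligator": 0.01}

cold = apply_temperature(dist, 0.2)  # peakier: "dog" gets boosted
hot = apply_temperature(dist, 2.0)   # flatter: rare words gain probability
print(round(cold["dog"], 2), round(hot["dog"], 2))
```

&lt;p&gt;Running it shows “dog” climbing well above its original 0.45 at low temperature and dropping below it at high temperature.&lt;/p&gt;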

&lt;p&gt;&lt;strong&gt;Top-K and Top-P (Nucleus) Sampling&lt;/strong&gt;&lt;br&gt;
These methods are often used with temperature to further refine the word selection. They prevent the model from picking truly nonsensical words by first filtering the vocabulary list.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Top-K Sampling:&lt;/strong&gt; Consider only the K most likely words. For example, if K=3, you would only sample from “dog”, “cat”, and “panther”, ignoring all others.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Top-P (Nucleus) Sampling:&lt;/strong&gt; A more dynamic approach. You choose the smallest set of top words whose cumulative probability is at least P. For example, if P=0.90, you would take “dog” (0.45) and “cat” (0.40), whose sum is 0.85. Since that’s less than 0.90, you’d add the next word, “panther” (0.05), bringing the total to 0.90. You then sample only from that small group.&lt;/li&gt;
&lt;/ul&gt;
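&lt;p&gt;Both filters can be sketched in a few lines. A toy illustration reproducing the “panther” example with this article’s numbers:&lt;/p&gt;

```python
def top_k(probs, k):
    # Keep only the k most probable words.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:k])

def top_p(probs, p):
    # Keep the smallest set of top words whose cumulative probability reaches p.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, total = {}, 0.0
    for word, prob in ranked:
        kept[word] = prob
        total += prob
        if total >= p:
            break
    return kept

dist = {"dog": 0.45, "cat": 0.40, "panther": 0.05, "lion": 0.03,
        "elephant": 0.02, "alligator": 0.01}

print(sorted(top_k(dist, 3)))     # ['cat', 'dog', 'panther']
print(sorted(top_p(dist, 0.90)))  # ['cat', 'dog', 'panther']
```

&lt;p&gt;After filtering, sampling (usually with temperature) proceeds over the reduced set only.&lt;/p&gt;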

&lt;p&gt;&lt;strong&gt;Takeaway:&lt;/strong&gt;&lt;br&gt;
Decoding is a balance between coherence and creativity. For factual applications, use &lt;strong&gt;greedy decoding&lt;/strong&gt; or &lt;strong&gt;sampling with a low temperature&lt;/strong&gt;. For creative tasks, &lt;strong&gt;increase the temperature&lt;/strong&gt; and use &lt;strong&gt;Top-K or Top-P&lt;/strong&gt; to guide the randomness and prevent nonsensical outputs.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Thanks for reading this article. If you have any questions, feel free to reach out to me!&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;More articles on LLM applications are coming soon; follow me to get notified.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>promptengineering</category>
    </item>
  </channel>
</rss>
