Forem: Piotr Borys

Android Daily Overview

Piotr Borys — Tue, 19 May 2026 20:47:46 +0000

This is a submission for the Gemma 4 Challenge: Build with Gemma 4

What I Built

Idea and first try

I get a lot of notifications on my Android phone. And... I don't like browsing through them ;)
So I thought: if we have a local (and thus completely private) LLM, let's use it :)

At first, I built a background service that collected incoming notifications and categorized them using Gemma-4. And it worked. The issue was, it was a huge battery drainer...

A new approach

So, after a few hours at the drawing board, I came up with a new idea: a lightweight background service collects notifications and SMS messages, vectorizes them with a small MobileBERT model, and stores them in an ObjectBox database with some basic categorization. Then, only on demand, from the dashboard of the main application, the Gemma-4 E4B model processes all the stored info. This approach is much easier on my battery - and it works.

Data retention policy

Of course, using an intermediate database means I had to deal with a data retention policy. I settled on a policy with four categories:

Volatile - TTL 1 hour, or until the next report is generated. Examples: 2FA codes, temporary tokens, OTPs, verification codes, etc.
Context Rolling Window - TTL 24 hours. Examples: weather info, news, commute info, stocks, etc.
Action-locked - kept until the action is completed or dismissed. Examples: calendar events, todos, meetings, etc.
Long-term knowledge - TTL 7 days. Examples: "my daughter's new phone number", "mom will come to visit next Friday", etc.

For long-term knowledge, I've added a setting so the user can decide how long to keep this kind of information.

Action-locked items are presented to the user with the option to dismiss them, add them to the calendar, or create an alarm. Once the action is taken, the item is marked for removal from the database. These actions fire the appropriate intents for the underlying system apps, prefilled with the available info such as time, date, and title.
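
To make the policy above a bit more concrete, here is a minimal Python sketch of the expiry check. It's only an illustration: the app itself runs on Android and its real code is not shown here, so StoredItem, should_delete and their parameters are hypothetical names. Only the four categories and their TTLs come from the list above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class Retention(Enum):
    VOLATILE = "VOLATILE"
    ROLLING = "ROLLING"
    ACTION_LOCKED = "ACTION_LOCKED"
    LONG_TERM = "LONG_TERM"


# Default TTLs from the policy above; ACTION_LOCKED has no time limit.
DEFAULT_TTL = {
    Retention.VOLATILE: timedelta(hours=1),
    Retention.ROLLING: timedelta(hours=24),
    Retention.ACTION_LOCKED: None,
    Retention.LONG_TERM: timedelta(days=7),
}


@dataclass
class StoredItem:  # hypothetical record, not the app's ObjectBox entity
    text: str
    retention: Retention
    created_at: datetime
    action_done: bool = False  # set once the user completes or dismisses the action


def should_delete(
    item: StoredItem,
    now: datetime,
    long_term_ttl: Optional[timedelta] = None,
    report_generated_at: Optional[datetime] = None,
) -> bool:
    """Return True when the item may be removed from the database."""
    if item.retention is Retention.ACTION_LOCKED:
        return item.action_done
    if (
        item.retention is Retention.VOLATILE
        and report_generated_at is not None
        and report_generated_at >= item.created_at
    ):
        return True  # volatile items live only until the next report
    ttl = DEFAULT_TTL[item.retention]
    if item.retention is Retention.LONG_TERM and long_term_ttl is not None:
        ttl = long_term_ttl  # the user-configurable setting
    return now - item.created_at >= ttl
```

A periodic cleanup job could call should_delete for every stored item and drop those for which it returns True.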

Architecture

Demo

Short video demo:

A few screenshots:

Code

The code can be found on my GitHub. It's far from finished and polished - but it works. Parts of it were created using Antigravity with Gemini.

How I Used Gemma 4

All incoming information, after being embedded with MediaPipe MobileBERT and stored in the ObjectBox database, is processed by Gemma-4 E4B. The model runs locally on the device, using the LiteRT-LM engine.

Gemma-4 E4B is a great fit for this case - it's fast enough to provide a summary and action extraction in seconds on my Galaxy S24 Ultra (including model loading time), while still being small enough to run locally on high-end devices. For less powerful devices, E2B might be a better choice. On first run, the app offers to download the model from a given link.

Prompts for LLM

General main prompt:

You are a personal assistant. Below is a list of notifications and messages.
Provide a concise summary and suggested actions.

Context: [CONTEXT]

Prompt for information extraction from SMSes:

Extract action items (invoices, appointments, tasks) from this SMS.
Today is $dateString.
SMS: "$content"

Categorization Rules:
- If it's an invoice, appointment, or has a specific deadline, use "CALENDAR".
- If it's a general todo without a date, use "TASK".
- If it's an immediate wake-up/reminder, use "ALARM".

Retention Rules:
- "VOLATILE": Temporary tokens, OTPs, short-lived verification codes.
- "ROLLING": Time-sensitive context (weather, generic traffic, stock updates).
- "ACTION_LOCKED": Invoices, appointments, tasks (anything requiring user action).
- "LONG_TERM": Facts, persistent knowledge, information worth keeping for weeks.

Respond ONLY with a JSON object:
{
    "hasAction": boolean,
    "type": "TASK" | "CALENDAR" | "ALARM",
    "title": "Short title",
    "dueDate": timestamp_ms,
    "isExpired": boolean,
    "retention": "VOLATILE" | "ROLLING" | "ACTION_LOCKED" | "LONG_TERM"
}
If no action found, set hasAction to false, but ALWAYS provide the "retention" field.
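
Since the prompt above asks for a strict JSON object, it makes sense to validate the reply before acting on it. Below is a minimal Python sketch of such a check. It's an illustration only, not the app's actual code: parse_extraction is a hypothetical helper, and the only things taken from the prompt are the field names and allowed values.

```python
import json

ACTION_TYPES = {"TASK", "CALENDAR", "ALARM"}
RETENTIONS = {"VOLATILE", "ROLLING", "ACTION_LOCKED", "LONG_TERM"}


def parse_extraction(reply: str) -> dict:
    """Parse the model's reply and check it against the schema from the prompt."""
    # Keep only the outermost {...}; models sometimes wrap JSON in extra text.
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object in reply")
    data = json.loads(reply[start:end + 1])  # JSONDecodeError is a ValueError

    # The prompt demands "retention" even when no action was found.
    if data.get("retention") not in RETENTIONS:
        raise ValueError(f"bad retention: {data.get('retention')!r}")
    if not isinstance(data.get("hasAction"), bool):
        raise ValueError("hasAction must be a boolean")
    if data["hasAction"]:
        if data.get("type") not in ACTION_TYPES:
            raise ValueError(f"bad type: {data.get('type')!r}")
        if not isinstance(data.get("title"), str):
            raise ValueError("title must be a string")
    return data
```

Any reply that fails the check could simply be retried or stored without an action.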

Sound analysis for visualization

Piotr Borys — Mon, 27 Apr 2026 11:48:50 +0000

Last time I worked on sound visualization, after testing with real-life data (yes, music 😉) and trying out various visualization shaders, I came to the conclusion that I had approached it from too scientific a point of view.
The result was a perfectly proper spectrogram - but not a very useful one for visualization purposes.
So now I've returned to it - but this time focusing on more visually appealing results that are easier for a human to read. I wanted to make it similar to what Inigo Quilez does on ShaderToy, but I couldn't find out exactly how he treats the data, so I had to come up with my own approach.

One thing still applies: the best way is frequency analysis through an FFT. The waveform itself can be useful, too (and that's why I still include it as a second row of my OpenGL texture), but here we will focus on the spectrogram. There's not much to say about the waveform, as it's simple data.

So, let's start by taking a portion of an audio file:

import numpy as np
import librosa
import matplotlib.pyplot as plt  # used by the plots further down

def get_audio_part(audio, time_start=0.0, sample_rate=44100, num_samples=512):
    sample_start = int(time_start * sample_rate)
    sample_end = sample_start + num_samples

    # Handle padding if we reach the end of the audio
    if sample_end > len(audio):
        audio_part = audio[sample_start:]
        audio_part = np.pad(audio_part, (0, num_samples - len(audio_part)), 'constant')
    else:
        audio_part = audio[sample_start:sample_end]

    return audio_part

audio_, sample_rate_ = librosa.load("test_sound_01.mp3", mono=True, sr=None)
position = 0.0 # position in the audio file
signal = get_audio_part(audio_, position, sample_rate_, 2048)

Now we can perform a regular FFT frequency analysis. We'll apply a Hann window to the signal first.

window = np.hanning(len(signal))
windowed_signal = signal * window
freqresp = np.fft.rfft(windowed_signal)

freqs = np.fft.rfftfreq(len(signal), 1/sample_rate_)

plt.figure(figsize=(12, 5))
plt.plot(freqs, np.abs(freqresp), color='#00aaff', linewidth=1.5)
plt.title("Frequency Spectrum (FFT Analysis)", fontsize=14, fontweight='bold')
plt.xlabel("Frequency (Hz)", fontsize=12)
plt.ylabel("Magnitude", fontsize=12)
plt.grid(True, linestyle='--', alpha=0.6)

plt.xlim(0, sample_rate_ / 2)
plt.tight_layout()
plt.show()

To better understand what's happening there, let's move it to dB scale:

magnitude_db = 20 * np.log10(np.abs(freqresp) + 1e-9)
plt.figure(figsize=(12, 5))
plt.plot(freqs, magnitude_db, color='#00aaff')
plt.title("Frequency Spectrum (dB Scale)")
plt.xlabel("Frequency (Hz)")
plt.ylabel("Magnitude (dB)")
plt.grid(True, alpha=0.3)
plt.show()

As we can see, the range of values is huge. Keeping in mind that we will map them to an image, encoding the magnitude as pixel brightness, we would get a few bright frequencies and most of the rest just pitch black.

So, we need to make it less scientific - and more visually pleasing. We'll rescale the values - flatten them, to make it more image-friendly.

We'll start with logarithmic scaling (adding 1.0 keeps the logarithm at or above zero instead of heading toward negative infinity) and then remap the values to the 0..1 range:

magnitude = np.abs(freqresp)
magnitude = np.log10(magnitude + 1.0)
magrange = np.max(magnitude) - np.min(magnitude)
magnitude -= np.min(magnitude)
if magrange > 0:  # a silent window has a flat spectrum; avoid dividing by zero
    magnitude /= magrange

plt.figure(figsize=(12, 5))
plt.plot(freqs, magnitude, color='#ff5500')
plt.title("Magnitudes rescaled for better visibility, with flattened range.")
plt.xlabel("Frequency (Hz)")
plt.ylabel("Magnitude (rescaled)")
plt.grid(True, alpha=0.3)
plt.show()

Also, we will take only the first 512 values of our FFT response. With a 2048-sample window, np.fft.rfft returns 1025 values (N/2 + 1), covering 0 Hz up to half the sample rate. At a 44.1 kHz sample rate, the first 512 values therefore represent roughly 0..11 kHz.
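
The bin arithmetic is easy to check with a few lines of plain Python. This is just a small sketch: the helper names are made up for illustration, and the 44.1 kHz sample rate is an assumption (the file is loaded with its native rate, which may differ).

```python
def rfft_bin_count(num_samples: int) -> int:
    """Number of values a real-input FFT (np.fft.rfft) returns for num_samples inputs."""
    return num_samples // 2 + 1


def bin_frequency(bin_index: int, sample_rate: float, num_samples: int) -> float:
    """Frequency in Hz represented by a given FFT bin."""
    return bin_index * sample_rate / num_samples


SAMPLE_RATE = 44100  # assumption: the audio file is sampled at 44.1 kHz
NUM_SAMPLES = 2048   # window length used above

print(rfft_bin_count(NUM_SAMPLES))                   # 1025
print(bin_frequency(1, SAMPLE_RATE, NUM_SAMPLES))    # width of a single bin in Hz
print(bin_frequency(511, SAMPLE_RATE, NUM_SAMPLES))  # last of the first 512 bins
```

With these numbers, one bin is about 21.5 Hz wide, so 512 bins reach just over 11 kHz.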

Let's build our final texture. It will be 512 pixels wide. In fact, it should be 1 pixel high, but here we will use 100 px to make it easier to see. It will use only one channel, RED.

ℹ️ Note:
When creating an OpenGL texture, keep in mind that the texture should be created only once (for instance, when the music is loaded); on each video frame only its data gets replaced. It also shouldn't use any mip-mapping.

from PIL import Image

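array = magnitude[:512]
# Scale the 0..1 magnitudes to 0..255 and convert them to 8-bit pixel values.
pixels = (255 * array.astype(np.float64)).astype(np.uint8)
# A 1-D array becomes an image 1 px wide and 512 px high; it is rotated into a row below.
img = Image.fromarray(pixels, mode='L')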
zero = np.zeros(array.shape, dtype=np.uint8)
img_zero = Image.fromarray(zero, mode='L')
img = Image.merge(mode='RGB', bands=(img, img_zero, img_zero))
img = img.rotate(90, expand=True)
img = img.resize((512, 100))
display(img)

You can see an example of the working texture below, in animated form:

In the final visualizer, we will also add a second row of data representing the waveform of the audio part, but that is pretty straightforward.

Creating a simple local RAG system

Piotr Borys — Sun, 29 Mar 2026 12:35:26 +0000

We'll build a simple RAG system using only local models. We will not use LangChain, which pulls in many bloated dependencies, is much slower than using Transformers directly, is not error-free, and has mostly misleading documentation. We'll use only bare Transformers functions instead. As the vector database for storing the document embeddings, we'll use Faiss, which is really efficient at similarity search. Note that it lives in RAM, not on disk, which makes it very fast.

What is a RAG?

Retrieval-Augmented Generation (RAG) is an AI framework that improves Large Language Model (LLM) accuracy by retrieving data from external, trusted sources (documents, databases) rather than relying solely on training data. It enables up-to-date, specialized answers, reduces hallucinations, and avoids costly model retraining.

In simple words: it gives an LLM specialized knowledge without retraining it. We'll build a simple version here that loads a single PDF file and then lets you chat about it. We will use only local models, without any cloud services. This means a few things:

it's completely free
it's completely private (no data is exposed to the internet)
it's weaker than cloud models.

Models of choice

It's up to you - and it depends mostly on your hardware (GPU and its VRAM). Here I've used google/gemma-2-9b-it as the LLM and BAAI/bge-large-en-v1.5 for creating embeddings. This setup works without any issues on a GPU with 12 GB of VRAM - and it works with different languages. You can, for instance, have a source in Polish and ask questions in English (or vice versa). Keep in mind that to use models from Hugging Face, you need an account there and you have to accept the model's usage policy.

Some important parameters

When creating a RAG system, there are a few parameters that can have a big impact on how it works. These include:

LLM's temperature: it controls the randomness of the LLM's output. The lower the temperature, the more deterministic the answers will be.
Whether the LLM can sample: if sampling is off, the LLM uses greedy decoding, so it always selects the token with the highest probability. If sampling is on, it picks one of the possible tokens at random. The choice is weighted by probability, so the most likely token is not necessarily the one chosen.
How many similar chunks to retrieve: when searching the vector database for similar chunks, how many of them are passed on for the answer? In this example, values of 3-5 work well.
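
To get a feeling for the first two parameters, here is a tiny pure-Python sketch of how temperature reshapes token probabilities and how greedy selection differs from sampling. It's a toy illustration with made-up logits and hypothetical helper names, not how Transformers implements generation internally.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def pick_token(logits, temperature=1.0, do_sample=False, rng=random):
    """Return the index of the chosen token."""
    if not do_sample:
        # Greedy: always the highest score; temperature plays no role here.
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax_with_temperature(logits, temperature)
    # Weighted random choice: likely tokens win more often, but not always.
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
print(softmax_with_temperature(logits, 1.0))
print(softmax_with_temperature(logits, 0.1))
print(pick_token(logits))
```

At temperature 0.1 nearly all of the probability lands on the first token, so sampling behaves almost like greedy selection.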

All in all, how you set them depends mostly on the type of documents you want to work with. For theory, reports, technical documents, instructions, guides, etc., set them as I did below. For looser texts, you may want to increase the temperature (let's say, up to 0.4) and turn sampling on. With even more informal texts you will also want to increase the number of similar chunks to retrieve (even above 10), but be careful: it will greatly increase RAM usage.

Let's put it together:

EMBEDDING_MODEL = "BAAI/bge-large-en-v1.5"
LLM_MODEL = "google/gemma-2-9b-it"
LLM_TEMPERATURE = 0.1
LLM_DO_SAMPLE = False  # greedy decoding; the temperature above only matters when this is True
SIMILAR_CHUNKS_COUNT = 3

Building a vector database

Let's start by creating our vector database, which holds our specialized knowledge built from the given PDF file. We will read the file and split the text into smaller chunks (remembering the page number for each chunk, so we can give exact citations in our answers). The chunks will be converted into embeddings using SentenceTransformer and our embedding model.

import pypdf

reader = pypdf.PdfReader(pdf_path)  # pdf_path: path to the source PDF file
full_text = ""
pages_meta = []  # A list to track which page each character comes from.
for i, page in enumerate(reader.pages):
    page_text = page.extract_text()
    if page_text:
        full_text += page_text
        # For each character on the page, store its page number and source file. This is a bit memory-intensive
        # but allows for accurate source tracking later.
        pages_meta.extend([{'page': i + 1, 'source': pdf_path}] * len(page_text))

Having the text extracted, we'll split the full text into smaller chunks:

chunks = simple_text_splitter(full_text, chunk_size=800, chunk_overlap=150)
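
The simple_text_splitter function is not shown in this post. A minimal character-based version could look like the sketch below. This is my assumption of what such a splitter does, not necessarily the original implementation; it steps through the text by chunk_size - chunk_overlap characters, which is what the metadata code in the next step relies on.

```python
def simple_text_splitter(text, chunk_size=800, chunk_overlap=150):
    """Split text into fixed-size character chunks that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks
```

Splitting on characters is crude (it can cut words and sentences in half), but the overlap makes it likely that a sentence cut in one chunk appears whole in its neighbour.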

We'll create metadata for each chunk by finding the page number corresponding to the middle of the chunk:

chunk_metadatas = []
char_count = 0
for chunk in chunks:
    mid_point = char_count + len(chunk) // 2
    if mid_point < len(pages_meta):
        chunk_metadatas.append(pages_meta[mid_point])
    else: # A fallback for the very last chunk.
        chunk_metadatas.append(pages_meta[-1])
    char_count += 800 - 150 # Move character counter forward by (chunk_size - chunk_overlap).

Now we can create embeddings for each text chunk:

import faiss
from sentence_transformers import SentenceTransformer

embedding_model = SentenceTransformer(EMBEDDING_MODEL)
embeddings = embedding_model.encode(chunks, convert_to_tensor=True, show_progress_bar=True)
embeddings = embeddings.cpu().numpy().astype('float32') # FAISS requires float32 numpy arrays.

Normalize the embeddings to unit length. This is necessary so that the Inner Product (IP) of two vectors equals their cosine similarity:

faiss.normalize_L2(embeddings)
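
Why does normalization matter? Cosine similarity is the inner product divided by both vector lengths, so once every vector has length 1, the plain inner product gives the same number. A quick pure-Python check (toy 2-D vectors, helper names made up for this illustration):

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    return inner_product(a, b) / (
        math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    )


a, b = [3.0, 4.0], [4.0, 3.0]
print(cosine_similarity(a, b))                          # cosine on the raw vectors
print(inner_product(l2_normalize(a), l2_normalize(b)))  # inner product after normalization
```

Both lines print the same value (up to floating-point rounding), which is why an inner-product index can be used for cosine search once the vectors are normalized.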

Let's populate the FAISS vector store:

index = faiss.IndexFlatIP(embeddings.shape[1])
index.add(embeddings)  # Add the chunk embeddings to the index.

and pack all of this into a handy structure for later use:

return {
    "index": index,
    "chunks": chunks,
    "metadatas": chunk_metadatas,
    "embedding_model": embedding_model
}

Feeding this structure into the function below (as the db parameter), we get a very simple and straightforward search mechanism:

def search_vector_db(db, query, k):
    query_embedding = db["embedding_model"].encode([query], convert_to_tensor=True)
    query_embedding = query_embedding.cpu().numpy().astype('float32')
    faiss.normalize_L2(query_embedding)

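    # With IndexFlatIP on normalized vectors, the returned values are cosine similarities.
    distances, indices = db["index"].search(query_embedding, k)
    # FAISS pads the result with -1 when the index holds fewer than k vectors.
    hits = [i for i in indices[0] if i != -1]
    retrieved_chunks = [db["chunks"][i] for i in hits]
    retrieved_metadatas = [db["metadatas"][i] for i in hits]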

    return retrieved_chunks, retrieved_metadatas

query is the user's question; k is the number of chunks to retrieve.

Before we can use this for searching, we need a query first, so let's set up a simple chat with the LLM.

Chat with LLM

Loading the LLM is pretty straightforward if we remember the notes from the beginning (the temperature etc.).
We will load quite a big model, but quantize it to 4-bit precision, which significantly reduces its memory footprint:

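import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, pipeline

bnb_config = BitsAndBytesConfig(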
    load_in_4bit=True, 
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(LLM_MODEL)
model = AutoModelForCausalLM.from_pretrained(
    LLM_MODEL,
    quantization_config=bnb_config,
    device_map="auto",
    low_cpu_mem_usage=True
)
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    max_length=None,
    temperature=LLM_TEMPERATURE,
    repetition_penalty=1.1,
    do_sample=LLM_DO_SAMPLE,
    return_full_text=False
)
return pipe

Preparing the interactive loop to chat

Pretty much everything below will live inside this loop, so we can have a chat:

while True:
    query = input("\nAsk a question (type 'exit' or 'quit' to quit): ")
    if query.lower() in ['exit', 'quit']:
        break

Searching the vector database

The first thing to do after getting a query from the user is to find the relevant chunks of text, along with their page numbers, which we will use for source citations.

context_chunks, context_metadatas = search_vector_db(db, query, k=SIMILAR_CHUNKS_COUNT)
context = "\n\n".join(context_chunks)  # keep chunks separated so words at chunk boundaries don't merge

Having the relevant parts of the text, we have to prepare the LLM's part of the job.

Preparing the prompt template

This part will differ slightly depending on the LLM used, since models expect different prompt templates for the user -> assistant loop. The one below follows Gemma's turn format:

template = f"""<start_of_turn>user
You're a helpful assistant. Answer the question based only on the context below.
Answer using the same language the question was asked in.\n
Context:\n
{context}\n
Question: {query}<end_of_turn>
<start_of_turn>model
"""

If we don't format this properly, the model can start hallucinating the dialog and talking to itself. Note that we pass not the whole document text, but only the chunks found for the user's question.

Finally, we can pass it to the LLM and get the answer.

result = llm_pipeline(template)
answer = result[0]['generated_text'].strip()
print(answer)

Sources citation

We can also quote the pages containing the relevant material in the PDF:

seen_pages = set() # Use a set to avoid printing duplicate page numbers.
for meta in context_metadatas:
    page_num = meta.get('page', 'N/A')
    if page_num not in seen_pages:
        print(f"  - Page: {page_num} (Source File: {meta.get('source')})")
        seen_pages.add(page_num)

Final thoughts

As you can see above, it's very simple, much simpler than one might think: it's not the LLM that looks for the answer.
It's FAISS that looks through all our chunks of text and finds the best-matching ones. The LLM is given only those chunks, and all it does is recap that small portion of text and formulate a nice answer. Simple as that.

Now your task is to put it into some UI, for example a simple Streamlit app, which suits this type of job perfectly.
Just add a button to load the PDF, an input field and a text output - and voilà :)

Diagramify - automatic diagram creation for Notion

Piotr Borys — Mon, 16 Mar 2026 13:35:09 +0000

I often write descriptions and ideas for projects in Notion. And I thought: it'd be nice to have such system descriptions summarized in a diagram of the proper kind, be it a flow diagram, a decision diagram, etc. And it'd be nice to have it done automagically.

So, I've built a simple Notion integration. It's made of 2 parts: a server, acting like a Notion MCP server and connecting to api.notion.com, and a client using this server. That way I could avoid the full-blown official Notion MCP server, which requires full OAuth authorization, juggling tokens, etc. It's just a simple tool running on my own local machine, connecting directly to the API.

How does it work?

The server part mimics what the official MCP server does, but it's adapted to my needs: for example, if it gets a long page, it takes care of fetching all the parts. That way the client itself is much easier to write and maintain.
Both server and client are written in Python. All you need to do is create an internal integration on the Notion Integrations page. From there, take your Notion token and put it in a .env file:

NOTION_TOKEN = "YOUR_NOTION_TOKEN"

Server

The server uses Starlette and the MCP module as its core, with SSE transport. And yes, it should be updated to use Streamable HTTP transport instead, but I've left that for another day.
The server exposes a number of tools the client can use. My list of tools differs from the official Notion MCP server's, as it's tailored to my task. For this simple integration they are:

list_accessible_pages: gets all the pages that were connected to our integration in Notion UI;
get_notion_object: gets the object and its info. If it's a big object, it gets all the blocks;
upsert_mermaid_block: inserts or updates the Mermaid block.

Client

Now, how does it work?
As the core functionality is automagic diagram insertion, I'm simply using the Gemini 2.5 Flash model to generate the diagram, feeding it the whole text of the page.
The inserted diagram is timestamped, so the client can tell whether the page was updated afterwards and the diagram needs refreshing.

The exact flow of the client is:

get the list of pages it has access to;
download the whole page;
search for a signature with a timestamp;
if the signature doesn't exist, or the timestamp is older than the page's last update: insert or update the Mermaid block, with the code itself coming from Gemini.
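
The decision in the last step can be sketched in a few lines of Python. Keep in mind this is a simplified illustration, not the code from the repository: the signature format (a Mermaid comment line starting with "%% diagramify:") and the function names are hypothetical. The page's update time is assumed to be an ISO 8601 string, as in Notion's last_edited_time.

```python
import re
from datetime import datetime
from typing import Optional

# Hypothetical signature format: a Mermaid comment line such as
#   %% diagramify: 2026-03-16T13:35:09+00:00
SIGNATURE_RE = re.compile(r"%%\s*diagramify:\s*(\S+)")


def parse_iso(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def find_signature_time(diagram_code: str) -> Optional[datetime]:
    """Return the timestamp stored in the diagram's signature, if there is one."""
    match = SIGNATURE_RE.search(diagram_code)
    return parse_iso(match.group(1)) if match else None


def needs_refresh(diagram_code: Optional[str], page_last_edited: str) -> bool:
    """True when there is no signed diagram yet or the page changed after it was made."""
    if not diagram_code:
        return True
    stamped = find_signature_time(diagram_code)
    if stamped is None:
        return True
    return stamped < parse_iso(page_last_edited)
```

One caveat: writing the diagram block probably changes the page's own edit time as well, so a real client would likely need some tolerance here (for example, stamping the diagram slightly after the write).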

Simple as that.

There's also one thing to consider when creating Mermaid diagrams with Gemini: while it's really good at it, it sometimes makes syntax errors, especially in complicated diagrams. That's why my prompt lists a few rules.

Another thing is the hard limit on the length of a single block in Notion: only 2000 characters. It sounds like enough, but for large, complicated diagrams it can be too small. Hence my prompt states exactly that and adds "if it's too long, simplify the diagram and omit less important elements".

Automating the run

I'm using Windows. For such tasks I like using PM2. All it needs is a simple config file:

module.exports = {
    apps : [
    {
        name: "notion-server",
        script: "notion_server.py",
        interpreter: "./PythonEnv/python.exe",
        autorestart: true,
        watch: false,
        env: {
            NODE_ENV: "production",
            PYTHONIOENCODING: "utf-8",
            PYTHONUTF8: "1"
        }
    },
    {
        name: "notion-worker",
        script: "notion_client_local.py",
        interpreter: "./PythonEnv/python.exe",
        autorestart: true,
        restart_delay: 300000,
        watch: false,
        env: {
            NODE_ENV: "production",
            PYTHONIOENCODING: "utf-8",
            PYTHONUTF8: "1"
        }
    }
  ]
}

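With this config, PM2 takes care of keeping the server running and of running the client every 5 minutes: once the client finishes and exits, restart_delay (300000 ms) makes PM2 start it again 5 minutes later.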

The sources

All the sources are available on GitHub. Just keep in mind that they're in Polish (both the comments and Gemini's prompt), but I think you can easily get a grasp of what the code is doing. It's really simple.

A Deep Dive into Grid Removal for OCR

Piotr Borys — Mon, 09 Mar 2026 00:20:53 +0000

If you’ve ever tried to run OCR on handwritten notes, you know the struggle. Standard algorithms excel at clean, black-on-white typed text. But throw in a background grid or low-contrast pencil marks, and the accuracy plummets.

While some kinds of paper are quite friendly to text recognition, others may be... let's say - hard.

The grid on this paper is of almost the same intensity as the writing.

The following Python function uses OpenCV to perform "surgery" on an image: it identifies the grid, removes it without destroying the text, and then uses adaptive equalization to make the handwriting pop. Let’s break down how it works step-by-step.

1. Thresholding: Creating a Binary World

The process starts by converting the image to grayscale and applying an Adaptive Threshold.

import cv2
import numpy as np

img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
thresh = cv2.adaptiveThreshold(img, 255, 
    cv2.ADAPTIVE_THRESH_MEAN_C, 
    cv2.THRESH_BINARY_INV, 
    41, 5)

Unlike global thresholding (which uses one value for the whole image), adaptive thresholding calculates different thresholds for small pixel neighborhoods.

Why? Scanned documents often have uneven lighting (shadows in the corners).
The Result: We get a "binary" image (black and white) where the grid and text are white and the background is black (THRESH_BINARY_INV). This makes it easier for the math in the next step to identify shapes.

2. Morphological Operations: Isolating the Grid

Now we need to tell the computer what a "grid line" looks like. We use Structuring Elements (kernels).

scale = 40
hor_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (img.shape[1] // scale, 1))
ver_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, img.shape[0] // scale))
mask_h = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, hor_kernel, iterations=1)
mask_v = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, ver_kernel, iterations=1)
grid_mask = cv2.add(mask_h, mask_v)

We create two long, thin rectangles: one horizontal and one vertical. By applying a Morphological Open operation, we effectively say: "Keep only the shapes that this rectangle can fit into."

mask_h: Only keeps horizontal lines.
mask_v: Only keeps vertical lines.
grid_mask: By adding them together and dilating (thickening) the result, we create a map of exactly where the grid sits.
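
To see why opening with a long, thin kernel keeps grid lines and drops handwriting, here is a one-dimensional toy version in plain Python. It's only a sketch of the idea (erosion followed by dilation on a single row of 0/1 pixels, ignoring the border handling OpenCV does), and the helper names are made up for this example.

```python
def erode_row(row, width):
    """Mark every start position where `width` consecutive pixels are all set."""
    return [int(all(row[s:s + width])) for s in range(len(row) - width + 1)]


def dilate_starts(starts, width, length):
    """Paint `width` pixels from every start position that survived erosion."""
    out = [0] * length
    for s, hit in enumerate(starts):
        if hit:
            out[s:s + width] = [1] * width
    return out


def open_row(row, width):
    """Morphological opening: keep only runs the kernel fits into entirely."""
    return dilate_starts(erode_row(row, width), width, len(row))


# A pen stroke (2 px), a piece of a grid line (6 px) and a speck (1 px).
row = [0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0]
print(open_row(row, 5))  # [0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0]
```

Only the 6-pixel run is long enough for the 5-pixel kernel, so it is the only thing left in the mask. The kernels above work the same way, just with a much longer rectangle (1/40 of the image width or height).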

We can dilate it by a few pixels, to be sure it won't leave any artifacts after removal.

grid_mask = cv2.dilate(grid_mask, np.ones((5,5), np.uint8), iterations=1)

3. Inpainting: The "Content-Aware Fill"

Simply "deleting" the grid would leave white scars through your letters. Instead, we use Inpainting.

result_inpainted = cv2.inpaint(img, grid_mask, 3, cv2.INPAINT_TELEA)

Inpainting looks at the grid_mask (the areas we want to fix) and fills those pixels by interpolating data from the surrounding non-grid pixels. It’s like a smart "heal" tool. It removes the grid while attempting to preserve the continuity of the pen strokes that crossed over it.

Looks like magic :)

4. CLAHE: Bringing Back the Contrast

Finally, we deal with legibility. Handwritten text is often faint. We use CLAHE (Contrast Limited Adaptive Histogram Equalization).

clahe = cv2.createCLAHE(clipLimit=2.5, tileGridSize=(8, 8))
enhanced = clahe.apply(result_inpainted)

Standard Histogram Equalization spreads out the most frequent intensity values, but it often over-amplifies noise. CLAHE instead equalizes the image tile by tile (here the image is divided into an 8x8 grid of tiles) and clips the contrast to prevent the background noise from becoming overwhelming.

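After this step, the strokes could also be thickened by eroding the image with a small kernel (erosion takes the local minimum, so on dark-on-light writing it makes the dark strokes thicker), but with small, tight writing it could destroy the legibility of individual letters.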

kernel = np.ones((2, 2), np.uint8)
thickened = cv2.erode(enhanced, kernel, iterations=1)

The Result

By the end of this pipeline, the image has undergone a massive transformation:

Gridlines are intelligently "healed" out of the image.
Shadows from the scan are neutralized.
Faint handwriting is darkened and sharpened.

This pre-processed image provides a much higher "signal-to-noise" ratio, giving your OCR engine a clear path to accurate character recognition.

Now, the recognition itself is another story. For this type of writing, as in this example (yes, it's mine...), there's no self-hosted solution; every one of them fails miserably. For such bad writing only big cloud vision models can help. Why? Not only because it looks bad - but also because it's so tight that every self-hosted algorithm for separating it into individual lines fails, just like that. It doesn't matter whether we use some clever engineering or let some vision AI do it. Only the biggest models can do that ;) Of course, if you have some better pages, written in a more... let's say, human way ;) then maybe there's a solution for self-hosted recognition. I've spent several nights on this and in the end I went for Gemini ;)

Use of AI in a job search

Piotr Borys — Wed, 25 Feb 2026 22:17:45 +0000

AI is widely used in recruitment, and everybody knows that. That led me to think: what if we did it the other way around and gave the user the power of AI in their job search? Why, and what for? For two reasons. The first and obvious one: to help them find a better-suited job. But there's another thing: by studying the results, we could see what in the user's profile made a given job offer suitable or not. Maybe something in the profile could be changed? Because... that's probably the same thing the recruiter's AI system is looking at. So, in this way, the user could improve their profile.

Now, the whole system I've made consists of 4 stages: web crawling, offer analysis, profile analysis and finally job matching. The final output is a proposed decision (apply, reject, maybe), a summary of pros and cons, and some thoughts and reasoning.

Web crawling is a huge topic and I won't discuss it here, as it's way too big for this article. I'll just mention that it requires analysing the pages and deciding whether a page is even relevant, and whether it's a single job offer or a list of job offers. It should analyze the links inside and follow them to further subpages. The output should be pairs of: the URL of an offer and its full text, without any HTML tags etc., just pure text. All the other elements of the system we'll discuss below.

Technical stuff

Before we dive into the details of each stage, let's talk for a moment about the technology used here. I've decided to keep it as simple as possible, easy to understand, and easy to run on the user's home machine. That's why I chose Ollama as the LLM provider - a very handy system for self-hosted LLMs. The coding is done in Python (I'm using Python 3.12) using only basic additional packages, like json, PdfReader, tqdm, requests and ollama. For communicating with LLMs through Ollama I've made a simple class with two types of clients: one using the ollama package and one communicating through the HTTP JSON API. You can see this little library on GitHub. It also contains a few examples of how to use it in various scenarios. In this project we will use the structured JSON way.

Some general thoughts on how to utilize AI in similar jobs

The key is a very precise system prompt, defining how the model should act and what it should produce as output. Something like "You are a job offer analyzer ... Return the output as JSON with the following fields ... do not add any explanations. Output only JSON ...". Setting the model's temperature to 0 also helps to get highly consistent, deterministic, and focused output. Another thing is to break the overall job down into steps that are as small as possible. This is what we can see in the diagram above: do not try to do several steps in one go; give the model a single, precise job to do.

Offer analysis

The input to this stage is a pair: the URL of an offer and its full text, stripped of any HTML tags etc. - just text. As output we want structured info containing all the relevant details, like title, salary, type of job, requirements, nice-to-have etc. Let's see an example:

{
    "url": "https://great.company.com/job-offer",
    "company": "Great Company",
    "title": "Tech Lead",
    "location": "100% remote",
    "remote": true,
    "seniority": "lead",
    "salary": "",
    "description": "Provide technical leadership to the delivery team, be accountable for delivering defined feature sets, design and develop components within the data and analytics layer of an investment research platform, co-create system architecture and technology standards, ensure solution quality through code reviews, mentoring, and oversight of engineering practices, support the Product Owner and work closely with the team in backlog planning and execution, actively contribute to the development of analytical tools for investment analysts, participate in R&D work related to future iterations of the platform.",
    "responsibilities": [],
    "requirements": [
        "proven experience in providing technical leadership and acting as a Tech Lead in enterprise scale projects",
        "expert knowledge of agile software delivery and DevOps across the SDLC",
        "strong mentoring and coaching skills, including implementing engineering, architecture, and testing best practices",
        "experience in initiating and driving continuous improvement initiatives",
        "ability to work closely with Product Owners, stakeholders, and business users",
        "English proficiency at a minimum B2+ level"
    ],
    "nice_to_have": [],
    "technologies": [
        "agile software delivery",
        "DevOps",
        "SDLC"
    ],
    "offer": [],
    "language": "en"
}

The most important thing (as in this whole system) is the system prompt, which makes sure the AI returns precise output, formatted the way we need:

system_prompt = """
You are a job offer analyzer. 
Extract structured information from the job description text. 
Return the output as JSON with the following fields:

- url: URL of the job offer
- company: infer company name if possible from the text or URL
- title: job title
- location: city / country / "remote"
- remote: true / false / "unknown"
- employment_type: full-time / contract / internship / unknown
- seniority: junior / mid / senior / lead / manager / unknown
- salary: salary range if specified
- description: short summary in your own words
- responsibilities: list of main responsibilities
- requirements: list of key requirements
- nice_to_have: list of additional "nice to have" skills
- technologies: programming languages, frameworks, tools
- offer: list of benefits, if available
- language: language of the job offer (en/pl)

Do not add any explanations. Output only JSON.
Never invent jobs that are not clearly present.
Never hallucinate technologies.
If a field is missing, use empty string, empty array, or "unknown".
"""

Then, the function itself is really simple if we use the ollama library I've quoted above:

def extract_job_offer(url, text):
    llm_client = LLMClientOllama()
    llm_client.set_model("qwen3:14b")
    llm_client.set_temperature(0)
    llm_client.set_json_format(True)

    data, role = llm_client.call_llm(f"URL: {url}\n\nText:\n{text}", system_prompt)
    return data

The call_llm method of the LLMClientOllama class takes care of sending the proper JSON payload, receiving the response, and extracting the JSON from it.
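To give an idea of what the "extracting" part can involve, here is a minimal sketch. It is not the actual code of LLMClientOllama: the helper name parse_json_reply is hypothetical, and I'm assuming the raw reply may contain a reasoning block in a "think" tag and markdown fences around the JSON, which some models add.

```python
import json
import re

FENCE = "`" * 3          # markdown code fence marker
REASONING_TAG = "think"  # assumed name of the reasoning tag some models emit

def parse_json_reply(raw: str) -> dict:
    """Extract the JSON object from a raw model reply (hypothetical helper)."""
    # Remove reasoning blocks wrapped in the REASONING_TAG tag.
    text = re.sub(rf"<{REASONING_TAG}>.*?</{REASONING_TAG}>", "", raw, flags=re.DOTALL)
    # Remove markdown fences, with or without a "json" language tag.
    text = re.sub(re.escape(FENCE) + r"(?:json)?", "", text)
    # Keep everything between the first "{" and the last "}".
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model reply")
    return json.loads(text[start:end + 1])
```

Raising an error instead of returning an empty dict makes a broken reply visible early, so the offer can be retried or skipped.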

💡
For all the structuring/extraction jobs I've chosen the Qwen3 model, but you can check other models. Specifically, Qwen3 14B runs smoothly on an NVIDIA GPU with 12 GB of VRAM.
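Even with a strict prompt, the model may leave a field out (the sample output earlier has no employment_type key, for example). A small post-processing step can fill the gaps using the same rule as the system prompt: empty string, empty array, or "unknown". This is only a sketch; OFFER_DEFAULTS and normalize_offer are names I made up for illustration.

```python
# Defaults follow the rule from the system prompt:
# empty string, empty list, or "unknown".
OFFER_DEFAULTS = {
    "url": "", "company": "", "title": "", "location": "",
    "remote": "unknown", "employment_type": "unknown", "seniority": "unknown",
    "salary": "", "description": "",
    "responsibilities": [], "requirements": [], "nice_to_have": [],
    "technologies": [], "offer": [], "language": "unknown",
}

def normalize_offer(data: dict) -> dict:
    """Return a copy of the offer with every expected field present."""
    offer = {}
    for key, default in OFFER_DEFAULTS.items():
        value = data.get(key)
        if isinstance(default, list):
            # Accept only lists; anything else falls back to an empty list.
            offer[key] = list(value) if isinstance(value, list) else []
        elif value is None:
            offer[key] = default
        else:
            offer[key] = value
    return offer
```

With this in place, the later steps can read offer["requirements"] and similar fields without checking for missing keys each time.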

User's profile creation from CV

Before we can match the offer, we need the other side: the user's profile. Again, it will be structured JSON. We could write it manually, but we'll build a PDF extractor instead, as headhunting systems do. That can also give us feedback on how well our CV is composed.

The first step is extracting all the text from the CV. This already gives a hint: NEVER send PDFs made of graphics (scans or output of other composition tools). The CV has to contain real text, not rendered images of text. For this task we'll use the PdfReader class from the pypdf package.

from pypdf import PdfReader

def extract_text_from_pdf(pdf_path: str) -> str:
    reader = PdfReader(pdf_path)
    pages = []

    for page in reader.pages:
        text = page.extract_text()
        if text:
            pages.append(text)

    return "\n".join(pages)
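To catch the "PDF made of graphics" case early, a simple heuristic on the per-page text can help. This is only a sketch: the function name and the 50-character threshold are my assumptions, and the input is meant to be the list of strings returned by page.extract_text() for each page.

```python
def looks_like_image_only_pdf(page_texts: list[str], min_chars_per_page: int = 50) -> bool:
    """Heuristic: True when most pages yield almost no extractable text."""
    if not page_texts:
        # No pages or nothing extracted at all.
        return True
    # Count pages whose stripped text is shorter than the threshold.
    sparse = sum(
        1 for text in page_texts
        if len((text or "").strip()) < min_chars_per_page
    )
    return sparse > len(page_texts) / 2
```

If it returns True, it's probably better to stop and ask for a text-based PDF than to send an almost empty CV to the model.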

Then we prepare the system prompt:

CANDIDATE_SYSTEM_PROMPT = """
You are an expert technical recruiter and career analyst.

Your task is to analyze a CV (resume) and extract a structured candidate profile.

Rules:
- Output ONLY valid JSON
- Do NOT include explanations, markdown, comments or prose
- If some information is missing, infer conservatively or use null
- Normalize names (e.g. "C plus plus" → "C++")
- Seniority must be one of:
  ["junior", "mid", "senior", "lead", "staff", "principal", "staff / principal / lead", "manager", "director", "cto", "ceo", "unknown"]

The JSON schema MUST match exactly:

{
  "seniority": string,
  "years_of_experience": number,
  "primary_roles": string[],
  "core_languages": string[],
  "secondary_languages": string[],
  "domains": string[],
  "leadership": {
    "people_management": boolean,
    "tech_lead": boolean,
    "scrum_master": boolean
  },
  "cloud": string[],
  "devops": string[],
  "frontend_level": string,
  "remote_preference": boolean,
  "languages_spoken": { "pl": string, "en": string },
  "job_preferences": {
    "roles_to_avoid": string[],
    "preferred_roles": string[]
  }
}

Think carefully. This profile will be used for automated job matching.
"""

and a small function preparing a user prompt:

def build_candidate_prompt(cv_text: str) -> str:
    return f"""
Analyze the following CV and extract the candidate profile.

CV TEXT:
----------------
{cv_text}
----------------
"""

Now we're ready to build a profile from the CV:

def build_candidate_profile_from_cv(pdf_path: str) -> dict:
    cv_text = extract_text_from_pdf(pdf_path)

    llm_client = LLMClientOllama()
    llm_client.set_model("qwen3:14b")
    llm_client.set_temperature(0.2)
    llm_client.set_json_format(True)

    profile, llm_role = llm_client.call_llm(build_candidate_prompt(cv_text), CANDIDATE_SYSTEM_PROMPT)
    return profile

Of course, the resulting JSON can be tweaked by hand, but it can also be used to check whether something should be added to the original CV instead.
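To make hand-tweaking practical, the profile can be stored in a file once and loaded on later runs instead of calling the model again. A minimal sketch, with save_profile and load_profile being hypothetical helper names:

```python
import json
from pathlib import Path

def save_profile(profile: dict, path: str) -> None:
    """Write the profile as readable JSON so it can be edited by hand."""
    # ensure_ascii=False keeps non-ASCII characters (e.g. Polish) readable.
    Path(path).write_text(
        json.dumps(profile, indent=2, ensure_ascii=False),
        encoding="utf-8",
    )

def load_profile(path: str) -> dict:
    """Load a previously saved (and possibly hand-edited) profile."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

This also saves one model call per run, since the CV changes far less often than the list of offers.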

Algorithmic match

It would be tempting to do all the matching with AI, but there are at least two arguments against it:

every AI call takes time - and if we can avoid it with some obvious rejects, it's a plus;
collecting some info, categorizing it etc. will be done better in simple code, as AI may sometimes hallucinate things.

That's why, as the first step in job matching, we'll perform some algorithmic data collection and make a first decision.

What you can do in this step depends, of course, on the kind of job, but for software developers you could score things like tech stack, seniority, domains, leadership duties, and general logistics. Let's see an example:

def normalize_token(s: str) -> str:
    return s.lower().replace(" ", "").replace(".", "")

def extract_technologies_from_offer(offer) -> tuple[set[str], set[str]]:
    """Return (required, optional) technologies mentioned in the offer."""
    required_text = " ".join(offer.get("requirements", [])).lower()
    optional_text = " ".join(offer.get("nice_to_have", [])).lower()

    # All entries must be lowercase, because the offer text is lowercased above.
    known_tech = {
        "c++", "c#", "python", "java", "javascript", "typescript",
        "golang", "rust", "swift", "kotlin", "scala",
        "node.js", "react", "angular", "docker", "kubernetes",
        "aws", "azure", "gcp", "qt", ".net", "opengl", "vulkan", "webgl", "webassembly",
        "postgresql", "mysql", "mongodb", "redis", "elasticsearch", "postgres", "mongo",
        "gitlab", "git", "gitlab ci", "ci/cd", "github actions", "github"
    }

    found_required = set()
    found_optional = set()
    for tech in known_tech:
        # Plain substring check, so "java" also matches inside "javascript".
        if tech in required_text:
            found_required.add(tech)
        if tech in optional_text:
            found_optional.add(tech)

    return found_required, found_optional

def score_tech_stack(profile, offer):
    offer_required_tech, offer_optional_tech = extract_technologies_from_offer(offer)

    profile_tech = {
        normalize_token(t)
        for t in profile["core_languages"] + profile["secondary_languages"]
    }

    strengths = []
    gaps = []
    score = 0

    for tech in offer_required_tech:
        if normalize_token(tech) in profile_tech:
            strengths.append(tech)
            score += 10
        else:
            gaps.append(tech)
            score -= 5

    for tech in offer_optional_tech:
        if normalize_token(tech) in profile_tech:
            strengths.append(tech)
            score += 5

    return max(min(score, 30), 0), strengths, gaps
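One caveat of the plain "tech in text" check used above is that short names match inside longer ones: "java" is found in "javascript", and "git" in "github". If that skews your scores, a token-aware check is one possible replacement. This is a sketch under my own assumptions: contains_tech is a hypothetical helper, and I treat letters, digits, "+" and "#" as characters that can belong to a technology name.

```python
import re

def contains_tech(tech: str, text: str) -> bool:
    """True if `tech` occurs in `text` as a whole token, not inside a longer name."""
    pattern = (
        r"(?<![a-z0-9+#])"        # not preceded by a name character
        + re.escape(tech.lower())  # the technology itself, taken literally
        + r"(?![a-z+#])"           # not followed by a letter, "+" or "#"
    )
    return re.search(pattern, text.lower()) is not None
```

Digits are allowed right after the name on purpose, so that "c++17" still counts as "c++". In extract_technologies_from_offer you could then call contains_tech(tech, required_text) in place of the substring test.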

Whatever you do, at the end of this step you should have something like this as a result:

{
    "decision": "reject",
    "score": 50,
    "strengths": [
        "java",
        "python",
        "c++"
    ],
    "gaps": [
        "javascript"
    ]
}

Choose the overall score thresholds for each decision, depending on how you've set up the scoring. Use three decision levels (apply / maybe / reject) and filter out the rejected offers before passing the rest to the next step, which is AI matching.
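The thresholding and filtering could look more or less like this. The threshold values are placeholders I picked for illustration (chosen so that a score of 50 ends up as "reject", like in the example result); tune them to your own scoring.

```python
# Hypothetical thresholds; adjust them to your scoring scale.
APPLY_THRESHOLD = 80
MAYBE_THRESHOLD = 60

def decide(score: int) -> str:
    """Map an overall score to one of the three decision levels."""
    if score >= APPLY_THRESHOLD:
        return "apply"
    if score >= MAYBE_THRESHOLD:
        return "maybe"
    return "reject"

def filter_for_ai_review(matches: list[dict]) -> list[dict]:
    """Keep only the offers worth passing to the AI matching step."""
    return [m for m in matches if m.get("decision") != "reject"]
```

Keeping the thresholds as named constants makes it easy to adjust them after looking at a few real results.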

AI match

Now it's time for the last step, the key one. Its input is the offer, the user's profile, and the algorithmic matching results, so the model can take them into account.

Let's start with a system prompt:

REVIEW_SYSTEM_PROMPT = """
You are a senior technical recruiter and staff-level software engineer.

Your task is to evaluate whether this job offer is worth applying to
for an experienced software engineer with the attached profile.

You MUST be critical and skeptical.
Reject roles that are:
- execution-only
- lacking ownership or technical impact

Return ONLY valid JSON.
Do NOT include markdown.
Do NOT include explanations outside JSON.

JSON schema:

{
  "final_verdict": "apply" | "maybe" | "reject",
  "confidence": 0-100,
  "key_reasons": [string],
  "risks": [string],
  "positive_signals": [string],
  "summary": string
}
"""

Of course it should be altered to suit your needs.

Next, let's prepare a user prompt:

import json

def prepare_llm_input(profile: dict, offer: dict, match: dict) -> dict:
    """
    Builds a clean, stable input structure for LLM evaluation.
    No formatting, no text generation.
    """
    # Role preferences are nested under "job_preferences" in the profile schema.
    job_preferences = profile.get("job_preferences") or {}

    return {
        "candidate": {
            "seniority": profile.get("seniority"),
            "years_of_experience": profile.get("years_of_experience"),
            "core_stack": profile.get("core_languages"),
            "secondary_stack": profile.get("secondary_languages"),
            "domains": profile.get("domains"),
            "leadership": profile.get("leadership"),
            "preferences": {
                "remote": profile.get("remote_preference"),
                "preferred_roles": job_preferences.get("preferred_roles"),
                "roles_to_avoid": job_preferences.get("roles_to_avoid"),
            }
        },

        "job": {
            "title": offer.get("title"),
            "location": offer.get("location"),
            "responsibilities": offer.get("responsibilities", [])[:10],
            "requirements": offer.get("requirements", [])[:10],
            "nice_to_have": offer.get("nice_to_have", [])[:5]
        },

        "algorithmic_assessment": {
            "score": match.get("score"),
            "decision": match.get("decision"),
            "strengths": match.get("strengths", []),
            "gaps": match.get("gaps", []),
            "red_flags": match.get("red_flags", [])
        }
    }

def build_llm_prompt(llm_input: dict) -> str:
    """
    Converts structured LLM input into a readable prompt.
    """

    return f"""
Candidate profile:
{json.dumps(llm_input["candidate"], indent=2, ensure_ascii=False)}

Job offer:
{json.dumps(llm_input["job"], indent=2, ensure_ascii=False)}

Algorithmic assessment:
{json.dumps(llm_input["algorithmic_assessment"], indent=2, ensure_ascii=False)}

Evaluate realistically whether applying makes sense.
"""

Now we can call it:

def review_offer_llm(profile: dict, offer: dict, match: dict) -> dict:

    llm_data = prepare_llm_input(profile, offer, match)
    llm_input = build_llm_prompt(llm_data)

    llm_client = LLMClientOllama()
    llm_client.set_model("qwen3:14b")
    llm_client.set_temperature(0)
    llm_client.set_json_format(True)

    data, role = llm_client.call_llm(llm_input, REVIEW_SYSTEM_PROMPT)

    return data

For this last step you could experiment with various models, as they can give slightly different reasoning. While the overall results will probably be similar, the reasoning can be very helpful: the user can react to it and either take action or update their CV accordingly.

Let's see an example output of this step:

{
    "final_verdict": "reject",
    "confidence": 85,
    "key_reasons": [
        "Role lacks technical ownership and leadership responsibilities",
        "Candidate's seniority (manager) far exceeds job requirements",
        "Focus on code evaluation rather than system design/architecture",
        "Part-time hourly contract misaligned with candidate's experience level"
    ],
    "risks": [
        "Underutilization of candidate's leadership and technical expertise",
        "Potential for role to be perceived as junior-level despite candidate's seniority",
        "Mismatch between compensation structure (hourly) and candidate's career stage"
    ],
    "positive_signals": [
        "Remote work flexibility",
        "Opportunity to work with AI systems",
        "Python/C++ stack alignment"
    ],
    "summary": "While the technical stack aligns, the role's responsibilities and compensation structure are fundamentally misaligned with a senior manager's experience and career expectations. The position offers limited technical impact and leadership opportunities, making it unsuitable for someone with 25 years of experience in complex domains like medical devices and embedded systems."
}
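Since this output ends up in front of the user, it may be worth checking it against the schema from the system prompt before using it. A small sketch, with check_review being a hypothetical helper that returns a list of problems (an empty list means the review looks fine):

```python
ALLOWED_VERDICTS = {"apply", "maybe", "reject"}

def check_review(review: dict) -> list[str]:
    """Return a list of schema problems found in the AI review."""
    problems = []
    if review.get("final_verdict") not in ALLOWED_VERDICTS:
        problems.append("final_verdict must be apply, maybe or reject")
    confidence = review.get("confidence")
    # bool is a subclass of int in Python, so exclude it explicitly.
    is_number = isinstance(confidence, (int, float)) and not isinstance(confidence, bool)
    if not is_number or not 0 <= confidence <= 100:
        problems.append("confidence must be a number in the range 0-100")
    for key in ("key_reasons", "risks", "positive_signals"):
        if not isinstance(review.get(key), list):
            problems.append(f"{key} must be a list")
    if not isinstance(review.get("summary"), str):
        problems.append("summary must be a string")
    return problems
```

If the list is not empty, you can retry the call or mark the offer for manual review.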

Now it would be handy to render the results into an HTML report page. Just prepare a template and replace the placeholders with fields from our JSON data.
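A minimal sketch of such rendering, using only the standard library (string.Template and html.escape). The template layout and the render_report name are my own choices for illustration; a real report would probably show more fields.

```python
import html
from string import Template

REPORT_TEMPLATE = Template("""<article>
  <h2>$title</h2>
  <p><strong>Verdict:</strong> $verdict ($confidence%)</p>
  <p>$summary</p>
  <ul>
$reasons
  </ul>
</article>""")

def render_report(offer: dict, review: dict) -> str:
    """Render one offer with its AI review as an HTML fragment."""
    # Escape every value, as the text comes from scraped pages and a model.
    reasons = "\n".join(
        f"    <li>{html.escape(str(reason))}</li>"
        for reason in review.get("key_reasons", [])
    )
    return REPORT_TEMPLATE.substitute(
        title=html.escape(str(offer.get("title", ""))),
        verdict=html.escape(str(review.get("final_verdict", ""))),
        confidence=html.escape(str(review.get("confidence", ""))),
        summary=html.escape(str(review.get("summary", ""))),
        reasons=reasons,
    )
```

Joining such fragments for all offers that were not rejected and wrapping them in a simple page skeleton gives a complete report.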