<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Michał</title>
    <description>The latest articles on Forem by Michał (@rebuss).</description>
    <link>https://forem.com/rebuss</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2572969%2F323f2a5f-d606-45c2-9881-1e5e3a0b25b8.png</url>
      <title>Forem: Michał</title>
      <link>https://forem.com/rebuss</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/rebuss"/>
    <language>en</language>
    <item>
      <title>From Writing Code to Writing Prompts</title>
      <dc:creator>Michał</dc:creator>
      <pubDate>Tue, 10 Mar 2026 13:32:24 +0000</pubDate>
      <link>https://forem.com/rebuss/from-writing-code-to-writing-prompts-2d4k</link>
      <guid>https://forem.com/rebuss/from-writing-code-to-writing-prompts-2d4k</guid>
      <description>&lt;p&gt;Not long ago, most of my working day was spent writing code.&lt;/p&gt;

&lt;p&gt;Today, a large part of it is spent writing… prompts.&lt;/p&gt;

&lt;p&gt;Ever since AI appeared in programming, I’ve been a big supporter of it. At first, it was mostly simple features like autocomplete, suggesting more or less accurately what I needed. Then came generating unit tests, writing simple functions through an agent, and similar things.&lt;/p&gt;

&lt;p&gt;On a daily basis I work with a solution that contains hundreds of projects, each with hundreds of source files. Some of those files are hardcore legacy code, often thousands of lines long. Because of that, I started using AI for a new kind of task — describing what a given class or project actually does, or locating places in the code that might interest me so I can inject my own implementation.&lt;/p&gt;

&lt;p&gt;Today I catch myself mainly writing prompts rather than code. I review the output produced by AI, and if I modify it, it’s usually just a few lines. The actual work mostly comes down to writing the prompt correctly; from a good prompt, an agent can produce the implementation much faster than I could myself.&lt;/p&gt;

&lt;p&gt;My role is to review the results, test the changes manually, and sometimes write one or two additional prompts that slightly adjust the solution. I rarely need to write code if I have good instructions and I know what I want.&lt;/p&gt;

&lt;p&gt;When it comes to code review, it often means reading a report generated by AI, approving a few remarks that are actually relevant to the project, and rejecting the rest. Then I jump on a call with a colleague and we discuss some of the details.&lt;/p&gt;

&lt;p&gt;AI has had a significant impact on the way I work. I now spend much more time thinking about how something should work before implementing it. More time on conceptual work. Of course, part of it might simply be professional maturity that comes with experience. In the past I would just start writing code and modify it along the way. Today it’s quite satisfying to delegate the implementation itself to artificial intelligence and get a result in anywhere from a few minutes to a dozen or so.&lt;/p&gt;

&lt;p&gt;The transition from one way of working to another was completely smooth. It wasn’t an “aha” moment or an overnight switch. Over time I simply started using it more and more, building new tools that automate parts of my workflow. At some point I realized that I had moved from being a Senior .NET Developer to something closer to a Senior Prompt Engineer :D&lt;/p&gt;

&lt;p&gt;A few things definitely contributed to that shift. MCP certainly played a big role. But beyond the different factors that make AI genuinely useful and capable of doing real work, I see a big difference in the way prompts themselves are written.&lt;/p&gt;

&lt;p&gt;The foundation of the AI world is the prompt.&lt;/p&gt;

&lt;p&gt;At the beginning I talked to AI almost like I would talk to a colleague at work, giving it plenty of room to guess and interpret what I meant. Today, every prompt that instructs AI to generate code is created with the help of a dedicated agent designed specifically for that purpose. In its own way, it keeps the more “creative” tendencies or hallucinations in check.&lt;/p&gt;

&lt;p&gt;With this approach, GitHub Copilot has a much harder time drifting away from the path I want it to follow. It stopped adding code in classes I have no intention of modifying, it gets lost less often, and it’s much easier for me to land on a good solution almost immediately.&lt;/p&gt;

&lt;p&gt;What still feels a bit annoying is the manual flow. It usually means talking to one agent that generates a prompt, then copy-pasting it into another one, and so on. Often the first agent needs a fairly extensive description of the task’s context — something that theoretically AI could just read on its own.&lt;/p&gt;

&lt;p&gt;My goal is to connect everything into a single flow: writing everything in the Copilot window inside Visual Studio, where it would be instructed to transform my input into a well-structured, professional prompt that I would only need to approve.&lt;/p&gt;

&lt;p&gt;Maybe after working like this for some time I’ll eventually reach a point where I can trust the implementation plan proposed by AI from the very beginning.&lt;/p&gt;

&lt;p&gt;The difference between prompts I write myself and those created by an agent is quite significant. The agent often adds various warnings and instructions: what the coding agent should not do and what it should do. It lays everything out clearly, step by step.&lt;/p&gt;

&lt;p&gt;For me, most of these things usually seemed obvious. I simply don’t have the habit of writing instructions that precisely. Because of that, I often ended up going down the wrong path — AI would follow a direction different from the one I intended.&lt;/p&gt;

&lt;p&gt;Today, what I bring into this workflow is direction. Decision-making. Connecting solutions within a broader context. I need to know what I want. And in the end, validation of results and evaluation of what actually makes sense.&lt;/p&gt;

&lt;p&gt;Statistically, I usually close an implementation within a few iterations. Sometimes fewer, sometimes more. The more precisely I describe the task, the better the results. If I approach the planning carefully, or if the task itself is relatively simple, the implementation often finishes in a single iteration and I get exactly what I wanted.&lt;/p&gt;

&lt;p&gt;Now I’m going to say something that many people might disagree with — especially those who are strongly attached to the role of the expert.&lt;/p&gt;

&lt;p&gt;I have the impression that remembering various technical nuances is becoming less important. Copilot already has that knowledge built in.&lt;/p&gt;

&lt;p&gt;More and more I find myself thinking at a higher level — how to connect things, which algorithm or design pattern will solve a problem — rather than how exactly to implement something or which method to call.&lt;/p&gt;

&lt;p&gt;Even though this trend clearly moves toward the total automation of a developer’s work, for now I’m not particularly worried about AI replacing me.&lt;/p&gt;

&lt;p&gt;I use it a lot, and I see how much effort still goes into guiding it properly and how precisely I need to explain the expected outcome.&lt;/p&gt;

&lt;p&gt;Without that — at least for now — AI struggles when it comes to working in large organizations and massive codebases.&lt;/p&gt;

&lt;p&gt;Maybe at some point in the future I’ll revise that opinion.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>githubcopilot</category>
      <category>softwaredevelopment</category>
    </item>
    <item>
      <title>GitHub and SeaGOAT: A Quick Guide to Code Search Solutions</title>
      <dc:creator>Michał</dc:creator>
      <pubDate>Thu, 26 Dec 2024 12:02:57 +0000</pubDate>
      <link>https://forem.com/rebuss/github-and-seagoat-a-quick-guide-to-code-search-solutions-2820</link>
      <guid>https://forem.com/rebuss/github-and-seagoat-a-quick-guide-to-code-search-solutions-2820</guid>
      <description>&lt;p&gt;As a developer fascinated by artificial intelligence, I have embarked on a new journey to deepen my knowledge of NLP and the problem of code search. After a preliminary analysis of CodeBERT solutions, I remain skeptical as to whether a single AI model can generate embeddings that work equally well for both programming languages (PL) and natural languages (NL). These two domains differ vastly in semantics, with each following its own rules, deeply tied to their unique use cases.&lt;/p&gt;

&lt;p&gt;In this post, I want to share some of my thoughts and findings, particularly regarding open-source tools like &lt;a href="https://github.com/kantord/SeaGOAT" rel="noopener noreferrer"&gt;SeaGOAT&lt;/a&gt; and GitHub’s approach to code search as described on &lt;a href="https://github.blog/ai-and-ml/machine-learning/towards-natural-language-semantic-code-search" rel="noopener noreferrer"&gt;their blog&lt;/a&gt; and in the &lt;a href="https://wandb.ai/github/codesearchnet/benchmark" rel="noopener noreferrer"&gt;CodeSearchNet benchmark&lt;/a&gt;. &lt;/p&gt;




&lt;h3&gt;
  
  
  A Look at SeaGOAT: Combining Simplicity with Functionality
&lt;/h3&gt;

&lt;p&gt;SeaGOAT is an open-source tool written in Python that employs two “engines” for code search. The first is &lt;code&gt;ripgrep&lt;/code&gt;, a traditional text-searching tool. In essence, it works by breaking a user’s query into individual words and then retrieving every line from the repository containing at least one of those words. The simplicity here is notable: it relies on the assumption that, for example, a function handling map rendering will likely include the word "map," which a user might also use in their search.&lt;/p&gt;
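
&lt;p&gt;As a toy illustration (my own sketch, not SeaGOAT’s actual code), the ripgrep stage boils down to substring matching of the query’s words against each line:&lt;/p&gt;

```python
def keyword_search(query, lines):
    """Toy version of the ripgrep stage: return every line that
    contains at least one word from the query (case-insensitive)."""
    words = [w.lower() for w in query.split()]
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(w in line.lower() for w in words):
            hits.append((number, line))
    return hits

repo = [
    "def render_map(tiles):",
    "    draw(tiles)",
    "def save_user(user):",
]
print(keyword_search("map rendering", repo))  # only line 1 mentions "map"
```

&lt;p&gt;There is no semantics here at all, just lexical overlap, which is exactly why the approach depends on the query and the code happening to share words like "map".&lt;/p&gt;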

&lt;p&gt;The second mechanism is &lt;code&gt;chromadb&lt;/code&gt;, a database designed to store embeddings. SeaGOAT uses the &lt;code&gt;all-MiniLM-L6-v2&lt;/code&gt; model to generate these embeddings, which is the default model used by &lt;code&gt;chromadb&lt;/code&gt;. While this model performs well in generating vector representations, it wasn’t trained on code, and therein lies the problem. It faces the same semantic challenges I mentioned earlier: trying to generate consistent vector embeddings for both natural and programming languages. Because of this, I chose to skip further tests with SeaGOAT and instead turn my attention to GitHub’s approach.&lt;/p&gt;




&lt;h3&gt;
  
  
  GitHub’s Solution: A Two-Model System
&lt;/h3&gt;

&lt;p&gt;GitHub operates at enormous commercial scale, and code search is one of the most important problems it has had to solve. Initially, GitHub’s search relied on keyword matching, a straightforward approach similar to ripgrep. But their latest innovations present a much more nuanced solution.&lt;/p&gt;

&lt;p&gt;Their system uses two AI models in tandem:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The Documentation Model&lt;/strong&gt;: This model is trained on the task of generating documentation for code. It takes programming language (PL) as input and maps it into an embedding space tied to natural language (NL). &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The Search Query Model&lt;/strong&gt;: This model is tuned to the same embedding space but works in the opposite direction. It takes natural language (NL) queries as input and generates embeddings in the same vector space as the documentation model.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The brilliance of this system lies in its duality. Both models process entirely different types of input, yet their outputs exist within the same semantic vector space. This allows for meaningful matches between user queries and code fragments, despite the inherent differences in the languages being processed.&lt;/p&gt;
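
&lt;p&gt;The contract can be sketched with stub “models” (entirely my own invention; the real models are trained neural encoders, but the key property is the same: both sides emit vectors in one shared space):&lt;/p&gt;

```python
VOCAB = ["download", "save", "image", "url", "file"]

def embed_words(words):
    """Map a bag of words into a shared vector space, one axis per
    vocabulary term. Both stub 'models' below target this same space."""
    return [sum(1 for w in words if w == term) for term in VOCAB]

def documentation_model(code):
    # Stub for the PL-side model: pretend the identifiers in the code
    # are its "generated documentation".
    tokens = code.replace("_", " ").replace("(", " ").replace(")", " ").split()
    return embed_words([t.lower() for t in tokens])

def query_model(query):
    # Stub for the NL-side model: embed the user's query directly.
    return embed_words(query.lower().split())

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

code_vec = documentation_model("def download_and_save_image(image_url): ...")
query_vec = query_model("download an image and save it")
print(dot(code_vec, query_vec))  # higher score = better match in the shared space
```

&lt;p&gt;The point of the sketch is the shape of the system, not the embeddings themselves: two different encoders, one vector space, one comparison operation.&lt;/p&gt;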

&lt;p&gt;This approach, in my opinion, feels far more intuitive and semantically accurate than other solutions I’ve encountered. By allowing each model to specialize in its domain while sharing a unified embedding space, GitHub has created a system that respects the nuances of both natural and programming languages.&lt;/p&gt;




&lt;h3&gt;
  
  
  Closing Thoughts
&lt;/h3&gt;

&lt;p&gt;The more I explore, the more I realize the depth of the code search problem. While tools like SeaGOAT offer valuable insights, the sophistication of GitHub’s solution sets a high bar for others. Their two-model approach, bridging the gap between PL and NL, seems like a step in the right direction.&lt;/p&gt;

&lt;p&gt;As I continue my exploration, I’m eager to delve deeper into these dual-model architectures and understand how they might be adapted or extended for even more effective code search solutions.&lt;/p&gt;

&lt;p&gt;For now, this journey remains ongoing, and I’m grateful for the learning opportunities it provides. If you’ve worked on similar problems or have insights to share, I’d love to hear your thoughts in the comments below.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>codesearch</category>
      <category>github</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Exploring GraphCodeBERT for Code Search: Insights and Limitations</title>
      <dc:creator>Michał</dc:creator>
      <pubDate>Sat, 21 Dec 2024 13:47:38 +0000</pubDate>
      <link>https://forem.com/rebuss/exploring-graphcodebert-for-code-search-insights-and-limitations-4jm</link>
      <guid>https://forem.com/rebuss/exploring-graphcodebert-for-code-search-insights-and-limitations-4jm</guid>
      <description>&lt;p&gt;As a professional developer working daily with a massive codebase containing millions of lines of code and over 1,000 C# projects, finding the right pieces of code to modify can often be a time-consuming task. Recently, my interest has revolved around solving the problem of code search, and I was particularly intrigued by the potential of GraphCodeBERT, as outlined in the research paper &lt;a href="https://arxiv.org/abs/2009.08366" rel="noopener noreferrer"&gt;GraphCodeBERT: Pre-training Code Representations with Data Flow&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Encouraged by the promising results described in the paper, I decided to evaluate its capabilities. The pretrained model is available &lt;a href="https://drive.google.com/file/d/1ZO-xVIzGcNE6Gz9DEg2z5mIbBv4Ft1cK/view" rel="noopener noreferrer"&gt;here&lt;/a&gt;, with a corresponding demo project hosted in the GitHub repository: &lt;a href="https://github.com/microsoft/CodeBERT/blob/master/GraphCodeBERT/codesearch/README.md" rel="noopener noreferrer"&gt;GraphCodeBERT Demo&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Diving Into Code Search
&lt;/h2&gt;

&lt;p&gt;Initially, I went all in and vectorized the SeaGOAT repository, resulting in 193 Python function records stored in my Elasticsearch database. Using natural language queries, I attempted to find relevant functions by comparing their embeddings via cosine similarity. Unfortunately, I noticed that similar results were returned across multiple distinct queries.&lt;/p&gt;

&lt;p&gt;This led me to believe that the model likely requires fine-tuning for better performance. To test this hypothesis, I decided to take a simpler approach and use the demo project provided with the pretrained model.&lt;/p&gt;
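
&lt;p&gt;For context, the retrieval step looks roughly like this (plain Python standing in for the Elasticsearch query; the function names and vectors are invented for illustration):&lt;/p&gt;

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den

# Stand-ins for the embeddings stored in Elasticsearch (values invented).
functions = {
    "download_and_save_image": [0.9, 0.1, 0.3],
    "parse_config":            [0.1, 0.8, 0.2],
    "fetch_image":             [0.5, 0.5, 0.5],
}

query_vec = [0.85, 0.15, 0.35]  # embedding of the NL query (invented)
ranked = sorted(functions,
                key=lambda name: cosine(functions[name], query_vec),
                reverse=True)
print(ranked)  # most similar function first
```

&lt;p&gt;When the model works, distinct queries should reorder this ranking; what I saw instead was near-identical rankings regardless of the query.&lt;/p&gt;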

&lt;h2&gt;
  
  
  Testing with a Controlled Dataset
&lt;/h2&gt;

&lt;p&gt;The demo focuses on three Python functions:&lt;/p&gt;

&lt;p&gt;1) download_and_save_image&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;f&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;output_dir&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
    &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output_dir&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;wb&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;2) save_image_to_file&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;f&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;output_dir&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output_dir&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;wb&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;3) fetch_image&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;f&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;output_dir&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
    &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Modified Query Results
&lt;/h2&gt;

&lt;p&gt;Below is the table reflecting my findings when testing slightly modified queries against the three functions. It represents the similarity between the user query vectors and the function vectors.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;User Query&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;Function 1&lt;/th&gt;
&lt;th&gt;Function 2&lt;/th&gt;
&lt;th&gt;Function 3&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Download an image and save the content in output_dir&lt;/td&gt;
&lt;td&gt;0.97&lt;/td&gt;
&lt;td&gt;9.7e-05&lt;/td&gt;
&lt;td&gt;0.03&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Download and save an image&lt;/td&gt;
&lt;td&gt;0.56&lt;/td&gt;
&lt;td&gt;0.0002&lt;/td&gt;
&lt;td&gt;0.44&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Retrieve and store an image&lt;/td&gt;
&lt;td&gt;0.004&lt;/td&gt;
&lt;td&gt;7e-06&lt;/td&gt;
&lt;td&gt;0.996&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Get a photo and save it&lt;/td&gt;
&lt;td&gt;0.0001&lt;/td&gt;
&lt;td&gt;4e-08&lt;/td&gt;
&lt;td&gt;0.999&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Save a file from URL&lt;/td&gt;
&lt;td&gt;0.975&lt;/td&gt;
&lt;td&gt;6e-07&lt;/td&gt;
&lt;td&gt;0.025&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Process downloaded data and reshape it&lt;/td&gt;
&lt;td&gt;0.025&lt;/td&gt;
&lt;td&gt;0.0002&lt;/td&gt;
&lt;td&gt;0.975&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Go to the moon and back as soon as possible&lt;/td&gt;
&lt;td&gt;0.642&lt;/td&gt;
&lt;td&gt;0.006&lt;/td&gt;
&lt;td&gt;0.353&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Observations
&lt;/h2&gt;

&lt;p&gt;From the table, it’s evident that the model correctly identifies the function only when the query is very specific and closely matches the original wording. When queries are slightly modified or synonyms are used, the results seem almost random. The same issue occurs with abstract queries or those unrelated to any function in the database.&lt;br&gt;
It’s also striking that function 2 (save_image_to_file) receives a near-zero score for every query, including queries that describe it well. This raises questions about whether the model is properly capturing meaningful distinctions for these cases or whether there’s an issue with the embeddings or similarity calculations.&lt;/p&gt;
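
&lt;p&gt;One possible explanation, and this is my assumption rather than something verified against the demo’s source: each row of the table sums to roughly 1, which is what you get if the raw query-to-code scores are softmax-normalized across the candidate set. Softmax lets one dominant candidate push every other score toward zero even when the underlying similarities are not far apart:&lt;/p&gt;

```python
import math

def softmax(scores):
    """Normalize raw scores into values that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Invented raw query-vs-function scores for three candidate functions.
raw = [9.0, 1.5, 6.0]
probs = softmax(raw)
print([round(p, 4) for p in probs])  # one candidate dominates
print(round(sum(probs), 6))          # rows summing to 1 fall out of softmax
```

&lt;p&gt;If that is what the demo does, the tiny values for function 2 would say less about its embedding and more about the normalization squeezing out every non-winning candidate.&lt;/p&gt;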

&lt;h2&gt;
  
  
  Concluding Thoughts
&lt;/h2&gt;

&lt;p&gt;After experimenting with the demo version, I concluded that further exploration of this model for code search in larger repositories may not be worthwhile—at least not in its current form. It appears that code search based on natural language queries cannot yet be solved by a single AI model. Instead, a hybrid solution might be more effective, grouping classes or functions based on logical and business-related criteria and then searching these groups for code that addresses the specified problem.&lt;/p&gt;

&lt;p&gt;I plan to continue exploring this area further. If you have any insights, suggestions, or experiences with code search models or techniques, please don’t hesitate to share them in the comments. Let’s discuss and learn together!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>nlp</category>
      <category>microsoft</category>
      <category>codesearch</category>
    </item>
    <item>
      <title>Exploring Code Search with CodeBERT – First Impressions</title>
      <dc:creator>Michał</dc:creator>
      <pubDate>Wed, 18 Dec 2024 22:38:49 +0000</pubDate>
      <link>https://forem.com/rebuss/exploring-code-search-with-codebert-first-impressions-418b</link>
      <guid>https://forem.com/rebuss/exploring-code-search-with-codebert-first-impressions-418b</guid>
      <description>&lt;p&gt;Recently, I’ve been exploring AI models that aim to solve the code search problem, and I came across CodeBERT from Microsoft. The repository can be found here: &lt;a href="https://github.com/microsoft/CodeBERT/tree/master" rel="noopener noreferrer"&gt;https://github.com/microsoft/CodeBERT/tree/master&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The project approaches the code search task in two ways, but today I want to focus on the first approach I looked into: using the basic CodeBERT model.&lt;/p&gt;

&lt;p&gt;In the paper "CodeBERT: A Pre-Trained Model for Programming and Natural Languages," the authors highlight their achievements, claiming state-of-the-art results for code search tasks. Naturally, I was curious to see how it works.&lt;/p&gt;

&lt;p&gt;The approach is based on binary classification:&lt;/p&gt;

&lt;p&gt;The model takes two inputs concatenated into a single sequence: a natural language query as the first segment and a code snippet as the second.&lt;/p&gt;

&lt;p&gt;It outputs either 0 (no match) or 1 (match).&lt;/p&gt;

&lt;p&gt;For this to work in a code search software:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;The code needs to be split into smaller fragments, such as functions or methods.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A user provides a query describing the function they’re looking for.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The algorithm iterates through all code fragments, combining the query with each fragment to create input vectors.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;These vectors are passed through the model, which determines whether the query matches a particular fragment.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The output is a list of code fragments that align with the user’s query.&lt;/p&gt;
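
&lt;p&gt;The steps above can be sketched like this (a toy stand-in: the &lt;code&gt;matches&lt;/code&gt; function fakes the CodeBERT classifier head, which in reality scores the concatenated NL + PL sequence):&lt;/p&gt;

```python
def matches(query, fragment):
    """Stand-in for the CodeBERT classifier: returns 1 (match) or
    0 (no match) for a (query, code fragment) pair. The real model
    runs inference on the joined input; this toy just checks words."""
    q = set(query.lower().split())
    return int(any(w in fragment.lower() for w in q))

def code_search(query, fragments):
    """Steps 3-4: pair the query with every fragment, keep the matches."""
    return [frag for frag in fragments if matches(query, frag)]

fragments = [
    "def save_image(data, path): ...",
    "def parse_config(path): ...",
    "def send_email(to, body): ...",
]
print(code_search("save an image", fragments))
```

&lt;p&gt;Even in this toy form you can see the scaling problem: every query costs one classifier call per fragment in the repository.&lt;/p&gt;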




&lt;p&gt;While this approach works conceptually, it doesn’t scale to larger repositories: iterating over every fragment and classifying each one against the query is far too slow for real-world use. It might be workable for smaller projects, but I don’t see much value in building a code search engine for small repositories, where traditional search methods often suffice.&lt;br&gt;
I wonder if there are more advanced methods out there.&lt;/p&gt;

&lt;p&gt;Next, I plan to take a closer look at GraphCodeBERT, hoping it might offer a different perspective on the problem.&lt;/p&gt;

&lt;p&gt;I’d love to hear from you:&lt;/p&gt;

&lt;p&gt;Are there any tools or models you’ve used for code search that integrate well into real-world workflows?&lt;/p&gt;

&lt;p&gt;Are there solutions you’ve been curious to explore but haven’t had the time to test yet?&lt;/p&gt;

&lt;p&gt;Any suggestions or experiences you’re willing to share would be greatly appreciated.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>nlp</category>
      <category>microsoft</category>
    </item>
  </channel>
</rss>
