<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Ross Peili</title>
    <description>The latest articles on Forem by Ross Peili (@rosspeili).</description>
    <link>https://forem.com/rosspeili</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F308648%2Fe3062c7b-4d7d-4ba9-aea5-f0ea4fef4776.jpg</url>
      <title>Forem: Ross Peili</title>
      <link>https://forem.com/rosspeili</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/rosspeili"/>
    <language>en</language>
    <item>
      <title>Pre-commit asset handoffs, on-chain execution, zero lawyers, help build it early</title>
      <dc:creator>Ross Peili</dc:creator>
      <pubDate>Sun, 10 May 2026 13:40:22 +0000</pubDate>
      <link>https://forem.com/arpa/pre-commit-asset-handoffs-on-chain-execution-zero-lawyers-help-build-it-early-247p</link>
      <guid>https://forem.com/arpa/pre-commit-asset-handoffs-on-chain-execution-zero-lawyers-help-build-it-early-247p</guid>
      <description>&lt;p&gt;ARPA Legacy Protocol is an open-source, work-in-progress framework for programmable asset handoffs on Ethereum. &lt;/p&gt;

&lt;p&gt;The idea: define beneficiaries, assets, and conditions (timers, attestations, oracle data) ahead of time. When conditions hold, the chain executes—no intermediaries, no renegotiation. Still pre-contract: reference specs, ADRs, a vault model, and policy schemas live on GitHub. Solidity implementation (Foundry) is next on the roadmap. &lt;/p&gt;
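&lt;p&gt;To make the policy idea concrete, here is a minimal off-chain sketch of what a pre-committed handoff could look like. This is purely illustrative Python; the field names and condition logic are assumptions of mine, not the protocol's actual schemas (those live in the repo):&lt;/p&gt;

```python
from dataclasses import dataclass
import time

@dataclass
class HandoffPolicy:
    # Illustrative model of a pre-committed handoff; names are hypothetical.
    beneficiary: str         # address that receives the asset
    asset: str               # token or vault identifier
    unlock_time: int         # timer condition (unix seconds)
    attested: bool = False   # e.g. an oracle or guardian attestation

    def conditions_hold(self, now=None):
        # The chain-side analogue would evaluate this inside a contract.
        now = time.time() if now is None else now
        return self.attested and now >= self.unlock_time
```

&lt;p&gt;On-chain, the same check would be a Solidity require over block timestamps and attestation flags; the planned Foundry implementation will define the real semantics.&lt;/p&gt;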

&lt;p&gt;Looking for contributors who care about on-chain inheritance, cryptoeconomic primitives, or policy design. MIT licensed. Come shape it early.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/arpahls/legacy-protocol" rel="noopener noreferrer"&gt;https://github.com/arpahls/legacy-protocol&lt;/a&gt;&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>blockchain</category>
      <category>solidity</category>
      <category>ethereum</category>
    </item>
    <item>
      <title>A VIC x AiSAQ Implementation Brings AI to Your Files Without Breaking the Bank</title>
      <dc:creator>Ross Peili</dc:creator>
      <pubDate>Sat, 09 May 2026 11:18:32 +0000</pubDate>
      <link>https://forem.com/arpa/a-vic-x-aisaq-implementation-brings-ai-to-your-files-without-breaking-the-bank-1mic</link>
      <guid>https://forem.com/arpa/a-vic-x-aisaq-implementation-brings-ai-to-your-files-without-breaking-the-bank-1mic</guid>
      <description>&lt;p&gt;We’re generating more data than ever, and AI‑powered search is great—until your dataset gets huge and your RAM starts crying for mercy. Most vector search systems rely on expensive DRAM to keep indexes fast, but that approach doesn’t scale. &lt;a href="https://github.com/kioxia-jp/aisaq-diskann" rel="noopener noreferrer"&gt;KIOXIA’s &lt;strong&gt;AiSAQ&lt;/strong&gt;&lt;/a&gt; (All‑in‑Storage ANNS with Product Quantization) flips the script: it runs approximate nearest neighbor search directly on SSD, slashing DRAM usage by &lt;strong&gt;3,200×&lt;/strong&gt; in billion‑scale workloads. The &lt;a href="https://github.com/ARPAHLS/vic_aisaq_demo" rel="noopener noreferrer"&gt;&lt;code&gt;vic_aisaq_demo&lt;/code&gt;&lt;/a&gt; repo from &lt;strong&gt;ARPA Hellenic Logical Systems&lt;/strong&gt; puts this tech into a practical, local‑first retrieval pipeline that’s as auditable as it is efficient.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt;: &lt;code&gt;vic_aisaq_demo&lt;/code&gt; combines tiered metadata filtering with flash‑optimized vector search to keep memory low and answers relevant. It’s a live demo of storage‑aware AI for edge and controller‑style environments.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The Problem: DRAM Is the Bottleneck
&lt;/h2&gt;

&lt;p&gt;Graph‑based nearest neighbor search (like HNSW) is fast, but it keeps key index structures in DRAM. With billion‑scale datasets, memory costs explode. Even compressed representations can still require tens of gigabytes of RAM. &lt;a href="https://github.com/kioxia-jp/aisaq-diskann" rel="noopener noreferrer"&gt;KIOXIA’s AiSAQ technology&lt;/a&gt; changes that by moving those compressed vectors to flash storage, consuming as little as &lt;strong&gt;10 MB&lt;/strong&gt; of DRAM during search without sacrificing recall.&lt;/p&gt;

&lt;p&gt;But low DRAM is only half the story. You also need a retrieval strategy that doesn’t waste time parsing irrelevant files.&lt;/p&gt;

&lt;h2&gt;
  
  
  How &lt;code&gt;vic_aisaq_demo&lt;/code&gt; Works: Tiered Retrieval Meets Flash‑Native Search
&lt;/h2&gt;

&lt;p&gt;The demo builds on two open‑source building blocks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/ARPAHLS/lc0_vic" rel="noopener noreferrer"&gt;&lt;code&gt;lc0_vic&lt;/code&gt;&lt;/a&gt;&lt;/strong&gt; – a tiered retrieval controller that plans and orchestrates search in layers (L0 → L1 → L2).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/kioxia-jp/aisaq-diskann" rel="noopener noreferrer"&gt;&lt;code&gt;aisaq-diskann&lt;/code&gt;&lt;/a&gt;&lt;/strong&gt; – a flash‑oriented ANN backend optimized for low‑DRAM environments.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The execution flow is refreshingly simple:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Librarian / Plan&lt;/strong&gt; – Turn a natural‑language question into retrieval intent using a lightweight LLM (e.g., &lt;a href="https://ollama.com/library/qwen2.5" rel="noopener noreferrer"&gt;qwen2.5:0.5b&lt;/a&gt; via Ollama).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;L0 Metadata Filter&lt;/strong&gt; – Narrow down candidate files by extension, size, time, or path hints. Cheap and fast.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;L1 Vector Search&lt;/strong&gt; – Run native AiSAQ ANN search over embeddings to find semantically similar content.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;L2 Deep Read&lt;/strong&gt; – Parse only the top few files and extract evidence snippets.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ranked Response&lt;/strong&gt; – Return paths, scores, and run metrics.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The tiered approach keeps deep parsing affordable at scale.&lt;/p&gt;
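&lt;p&gt;As a rough illustration of the tiered flow (not the repo's actual API, whose names and signatures differ), a toy version in Python could look like this:&lt;/p&gt;

```python
from dataclasses import dataclass

@dataclass
class Plan:
    # Hypothetical plan object; the demo's real query plan has its own schema.
    question: str
    l0_ext: tuple               # L0: allowed file extensions
    l1_top_k: int = 3           # L1: candidates kept after vector search
    l2_deep_read: bool = False  # L2: parse top hits for evidence

def word_overlap(query, text):
    # Stand-in for embedding similarity: count of shared lowercase words.
    return len(set(query.lower().split()).intersection(text.lower().split()))

def run_plan(plan, files):
    # L0: cheap metadata filter, no file contents touched
    cands = [f for f in files if f["path"].endswith(plan.l0_ext)]
    # L1: rank survivors by similarity (AiSAQ ANN search in the real demo)
    cands.sort(key=lambda f: word_overlap(plan.question, f["text"]), reverse=True)
    hits = cands[:plan.l1_top_k]
    # L2: deep-read only the winners, attaching evidence snippets
    if plan.l2_deep_read:
        for h in hits:
            h["evidence"] = h["text"][:40]
    return hits
```

&lt;p&gt;In the actual demo, the L1 step runs against a prebuilt AiSAQ index on flash, so the ranking touches almost no DRAM.&lt;/p&gt;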

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Benchmark results&lt;/strong&gt; show latency remains stable as dataset size grows, while DRAM footprint stays near zero. The funnel chart below visualises how each tier slashes the candidate pool:&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv08c072sk9m1a1rq3t7y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv08c072sk9m1a1rq3t7y.png" alt="Tier Funnel" width="800" height="435"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And here’s how the pipeline shifts results from superficial matching to true semantic evidence:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7i386j3iivqd3c0ndfpt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7i386j3iivqd3c0ndfpt.png" alt="Match Type Comparison" width="800" height="433"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;p&gt;The repo is built to be &lt;strong&gt;reproducible and local‑first&lt;/strong&gt;. You’ll need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;WSL (Ubuntu)&lt;/strong&gt; for building AiSAQ binaries&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ollama&lt;/strong&gt; running locally (or over the network) with two models:

&lt;ul&gt;
&lt;li&gt;Planner model: &lt;a href="https://ollama.com/library/qwen2.5" rel="noopener noreferrer"&gt;&lt;code&gt;qwen2.5:0.5b&lt;/code&gt;&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Embedding model: &lt;a href="https://ollama.com/library/embeddinggemma" rel="noopener noreferrer"&gt;&lt;code&gt;embeddinggemma&lt;/code&gt;&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Python 3.13 and the usual suspects (see &lt;code&gt;requirements.txt&lt;/code&gt;)&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;Once you’ve built the AiSAQ index from a sample drive, a query like:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python3 scripts/run_query.py &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="s2"&gt;"Find the Q3 2025 contract that mentions penalty clauses"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--aisaq-root&lt;/span&gt; /home/&lt;span class="nv"&gt;$USER&lt;/span&gt;/aisaq-diskann
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;…will return ranked files with evidence snippets, tier labels, and latency metrics.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Matters
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;vic_aisaq_demo&lt;/code&gt; isn’t just a toy. It demonstrates a realistic, storage‑aware retrieval pattern that could run on devices with tight memory budgets—think edge gateways, embedded controllers, or even future SSD firmware that embeds intelligence directly on the drive. The &lt;a href="https://github.com/rosspeili/computational_storage_landscape" rel="noopener noreferrer"&gt;Computational Storage Landscape report&lt;/a&gt; maps this evolution, and this repo is one of the first runnable examples that puts those ideas into practice.&lt;/p&gt;

&lt;p&gt;The two charts below summarise the system‑level trade‑off and scaling behaviour:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft2stpp8klpocea6q8vzd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft2stpp8klpocea6q8vzd.png" alt="Latency vs Dataset Size" width="800" height="432"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F43swbbmrzcyhduowwl8n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F43swbbmrzcyhduowwl8n.png" alt="DRAM Footprint by Method" width="800" height="394"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The takeaway? You don’t need a cluster of DRAM‑heavy servers to run effective semantic search. Sometimes the smartest storage is the one that knows what &lt;em&gt;not&lt;/em&gt; to load into memory.&lt;/p&gt;

&lt;p&gt;Check out the full repo: &lt;strong&gt;&lt;a href="https://github.com/ARPAHLS/vic_aisaq_demo" rel="noopener noreferrer"&gt;ARPAHLS/vic_aisaq_demo&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>computationalstorage</category>
      <category>vectorsearch</category>
      <category>edgeai</category>
      <category>lowmemoryretrieval</category>
    </item>
    <item>
      <title>Open Source Emotion‑Aware Access Control with Face Verification</title>
      <dc:creator>Ross Peili</dc:creator>
      <pubDate>Fri, 08 May 2026 10:11:27 +0000</pubDate>
      <link>https://forem.com/arpa/open-source-emotion-aware-access-control-with-face-verification-14d6</link>
      <guid>https://forem.com/arpa/open-source-emotion-aware-access-control-with-face-verification-14d6</guid>
      <description>&lt;p&gt;&lt;strong&gt;Gatekeeper: Emotion‑Aware Access Control with Face Verification&lt;/strong&gt;  &lt;/p&gt;

&lt;p&gt;What if your system could deny access not just based on &lt;em&gt;who&lt;/em&gt; you are, but on &lt;em&gt;how&lt;/em&gt; you feel?  &lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/arpahls/gatekeeper" rel="noopener noreferrer"&gt;&lt;strong&gt;Gatekeeper&lt;/strong&gt;&lt;/a&gt; is a Python‑based security framework that layers real‑time &lt;strong&gt;face verification&lt;/strong&gt; with &lt;strong&gt;emotion analysis&lt;/strong&gt; before granting access.  &lt;/p&gt;

&lt;h3&gt;
  
  
  How it works
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Verify identity&lt;/strong&gt; against a reference image or an admin pool.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Analyze emotions&lt;/strong&gt; (anger, fear, joy, etc.) and evaluate them against a configurable policy (blocked states, thresholds, weights).
&lt;/li&gt;
&lt;li&gt;Only grant access if &lt;em&gt;both&lt;/em&gt; checks pass.
&lt;/li&gt;
&lt;/ol&gt;
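&lt;p&gt;Conceptually, the decision boils down to a conjunction of the two checks. A minimal sketch, with illustrative names and thresholds rather than Gatekeeper's actual policy schema:&lt;/p&gt;

```python
def grant_access(verified, emotion_scores, blocked=("angry", "fear"), threshold=0.5):
    """Grant access only if identity verification passed AND no blocked
    emotional state exceeds its threshold. 'emotion_scores' is a dict of
    scores in [0, 1], as emotion analyzers such as DeepFace return."""
    if not verified:
        return False  # identity check failed, emotions are irrelevant
    for state in blocked:
        if emotion_scores.get(state, 0.0) > threshold:
            return False  # a blocked emotional state dominates
    return True
```

&lt;p&gt;In the real framework, the verified flag would come from face verification against the reference image or admin pool, and the blocked states, thresholds, and weights come from the configurable policy.&lt;/p&gt;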

&lt;h3&gt;
  
  
  Why it matters
&lt;/h3&gt;

&lt;p&gt;Critical operations (financial systems, secure rooms, privileged commands) deserve more than binary yes/no. By assessing emotional state, you reduce the risk of coercion, panic, or compromised decision‑making in high‑impact environments.  &lt;/p&gt;

&lt;h3&gt;
  
  
  Get started
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;
git clone https://github.com/arpahls/gatekeeper
cd gatekeeper
python -m venv .venv
.venv\Scripts\activate  # or 'source .venv/bin/activate' on Linux/macOS
pip install -r requirements.txt
python scripts/run_terminal.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

</description>
      <category>ai</category>
      <category>security</category>
      <category>deepface</category>
      <category>kyc</category>
    </item>
    <item>
      <title>Where to find Open Source AI Projects with Good First Issues</title>
      <dc:creator>Ross Peili</dc:creator>
      <pubDate>Fri, 08 May 2026 07:44:14 +0000</pubDate>
      <link>https://forem.com/rosspeili/where-to-find-open-source-ai-projects-with-good-first-issues-5d1l</link>
      <guid>https://forem.com/rosspeili/where-to-find-open-source-ai-projects-with-good-first-issues-5d1l</guid>
      <description>&lt;p&gt;Contributing to open‑source AI as a beginner isn’t about landing a seat at a super‑intelligence lab after a PhD, but more like finding good first issues on projects that are small enough to wrap your head around, real enough to have an impact, and early enough so you make it big, bypassing the credentials-based theoretical knowledge wars, and actually work on AI tools that matter.&lt;/p&gt;

&lt;p&gt;There are plenty of open-source good-first-issue lists you can browse through. Even though most will have outdated and already-closed issues listed as open, you can still use them to find active projects and favorite or subscribe to issues so you get notifications about new ones.&lt;/p&gt;

&lt;p&gt;At the same time, I would pick a topic before picking companies. E.g., if you are interested in Quantum, Storage, or Agent Skills, try to find projects that are specific to the area you wanna work on and get experience in, instead of trying to contribute to, say, Google, Microsoft, IBM, or whatever.&lt;/p&gt;

&lt;p&gt;Don't overthink it. Just find a project or area you wanna get your hands dirty with, clone the repo, and start talking to your IDE to see which issues could be solved easily. Usually, docs-related issues are the fastest, easiest, and most readily accepted first issues from core devs when you are a newcomer.&lt;/p&gt;

&lt;p&gt;Some projects that I know for sure have open issues right now, so you can start immediately:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/arpahls/skillware" rel="noopener noreferrer"&gt;Skillware&lt;/a&gt;: An open-source framework and registry for modular, actionable Agent capabilities.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/arpahls/rooms" rel="noopener noreferrer"&gt;Rooms&lt;/a&gt;: A secure, local-first Python framework for orchestrating complex multi-agent think tanks with dynamic expertise-weighted routing.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/pennylaneai/pennylane" rel="noopener noreferrer"&gt;Pennylane&lt;/a&gt;: PennyLane is an open-source quantum software platform for quantum computing, quantum machine learning, and quantum chemistry.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/quantumlib/cirq" rel="noopener noreferrer"&gt;Cirq&lt;/a&gt;: Python framework for creating, editing, and running Noisy Intermediate-Scale Quantum (NISQ) circuits.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/quantumlib/openfermion" rel="noopener noreferrer"&gt;OpenFermion&lt;/a&gt;: Python package for compiling and analyzing quantum algorithms to simulate electronic structures.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/warpdotdev/warp" rel="noopener noreferrer"&gt;Warp&lt;/a&gt;: Warp is an agentic development environment, born out of the terminal.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Good luck, and I wish you success with your first contribution to your favorite project &amp;lt;3&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>gfi</category>
      <category>osdc</category>
    </item>
    <item>
      <title>The memory wall just met its match: intelligent SSDs</title>
      <dc:creator>Ross Peili</dc:creator>
      <pubDate>Wed, 06 May 2026 10:49:59 +0000</pubDate>
      <link>https://forem.com/arpa/the-memory-wall-just-met-its-match-intelligent-ssds-54p6</link>
      <guid>https://forem.com/arpa/the-memory-wall-just-met-its-match-intelligent-ssds-54p6</guid>
      <description>&lt;p&gt;Intelligent storage is here. It’s not just a concept for the future, and it’s a rapidly emerging reality, driven by the convergence of flash memory and artificial intelligence. For years, storage has been the quiet workhorse, passively holding data until a CPU or GPU requested it. But as AI models grow beyond trillions of parameters, the cost of shuttling data back and forth has become unsustainable. We've hit a memory wall, where the capacity of expensive High Bandwidth Memory (HBM) simply cannot keep pace with the data demands of large language models and retrieval-augmented generation (RAG).&lt;/p&gt;

&lt;p&gt;The question is no longer about making storage faster, but about making it smarter. Two key open-source repositories are exploring the "how," and they signal a fundamental shift: &lt;a href="https://github.com/rosspeili/computational_storage_landscape" rel="noopener noreferrer"&gt;rosspeili/computational_storage_landscape&lt;/a&gt; and &lt;a href="https://github.com/ARPAHLS/lc0_vic" rel="noopener noreferrer"&gt;ARPAHLS/lc0_vic&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  From Passive Block to Active, Queryable Storage
&lt;/h3&gt;

&lt;p&gt;The first repository, computational_storage_landscape, is a strategic guide to this emerging ecosystem. It positions KIOXIA Group as a primary lens and focuses on the technical feasibility of embedding Small Language Models (TinyLMs) directly into SSD controllers. This isn't just about faster reads and writes, but about offloading processing to where the data resides. By using extreme quantization, these TinyLMs can perform inference tasks at the edge of the storage device, dramatically reducing the data that needs to travel up the I/O stack to the host system.&lt;/p&gt;

&lt;p&gt;The core enabler here is the shift toward what the repo calls "intelligent, queryable storage". Instead of a drive just returning blocks of data, it becomes an active computational node capable of running search, filtering, and ranking functions on its own. This reflects a broader industry trend, with major players like IBM introducing Content-Aware Storage (CAS) architectures and the SNIA (Storage Networking Industry Association) launching Storage.AI initiatives to standardize data flows for AI workflows.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Reference Implementation: Talking to Your Drive
&lt;/h3&gt;

&lt;p&gt;But strategic maps are theoretical without a compass. This is where the second repository, lc0_vic (Logical Controller Zero / Virtual Intelligent Controller), becomes crucial. It's a working, open-source Python reference implementation for the exact ideas detailed in the landscape repo.&lt;/p&gt;

&lt;p&gt;The project is a direct response to KIOXIA’s research on AiSAQ (All-in-Storage ANNS with Product Quantization). This algorithm allows for approximate nearest neighbor (ANN) vector search directly on flash, without the need to store indexes in costly DRAM. We call this the "Tiered filesystem retrieval" architecture, and we break it down like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;L0: Metadata scanning, the first pass at understanding your data.&lt;/li&gt;
&lt;li&gt;L1: Vector tier, where content is converted into searchable embeddings.&lt;/li&gt;
&lt;li&gt;L2: Optional deep parsing (Skillware) for complex extraction (eg. OCR, media parsing and more).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This architecture is orchestrated by a controller that creates a QueryPlan, enabling you to run natural language queries against your local file system. The user experience is simple: you can &lt;code&gt;pip install&lt;/code&gt; the tool, run &lt;code&gt;vic index&lt;/code&gt; to build your search index, and ask a question via &lt;code&gt;vic ask&lt;/code&gt;. This elegantly proves the concept outlined in the first repo by making it tangible. As the repo notes, it's "more than a paper design," featuring full CI and integration tests to validate the logic.&lt;/p&gt;
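&lt;p&gt;The retrieval-contract idea is easiest to see as a plan object the controller fills in before any tier runs. A hedged sketch, with field names that are illustrative rather than lc0_vic's actual types:&lt;/p&gt;

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class QueryPlan:
    # The contract between planner and tiers; hypothetical field names.
    question: str                                   # user's natural-language query
    l0_hints: dict = field(default_factory=dict)    # L0 metadata filters
    l1_top_k: int = 10                              # L1 vector-tier candidate budget
    l2_skills: tuple = ()                           # optional L2 deep-parse skills

# Example of a plan a controller might emit for a filesystem question.
plan = QueryPlan(
    question="invoices from March mentioning shipping delays",
    l0_hints={"ext": [".pdf", ".docx"], "modified_after": "2026-03-01"},
    l2_skills=("ocr",),
)
```

&lt;p&gt;Freezing the plan matters for the firmware ambition: an immutable, serializable contract is the kind of interface that could plausibly be handed to a device-adjacent runtime later.&lt;/p&gt;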

&lt;h3&gt;
  
  
  The Road Ahead for Intelligent Storage
&lt;/h3&gt;

&lt;p&gt;The lc0_vic repository is explicit that it runs on the host computer today, but its research goal is to explore whether these retrieval contracts can be mapped to firmware or device-adjacent runtimes. This is the bridge between the two repos: the landscape repo provides the where (SSD controllers), and the lc0_vic repo provides the how (tiered retrieval and in-storage vector search).&lt;/p&gt;

&lt;p&gt;The combination of these two projects paints a clear picture. As data centers accumulate exabytes of flash storage, the idea of a "smart SSD" that can pre-process data, run vector searches, and answer questions without waking the host CPU isn't just efficient, but inevitable. The era of silent storage is ending, and the era of conversational storage is only beginning.&lt;/p&gt;

&lt;p&gt;We will be working on a lightweight demo of the reference implementation to showcase how you can simply query a local folder or SSD using natural language and get structured results with descriptions, instead of just cold keyword-based path matching.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>storage</category>
      <category>ssd</category>
      <category>kioxia</category>
    </item>
    <item>
      <title>The Great Atomization of AI and the Illusion of the Sovereign Solitary</title>
      <dc:creator>Ross Peili</dc:creator>
      <pubDate>Mon, 04 May 2026 06:23:28 +0000</pubDate>
      <link>https://forem.com/arpa/the-great-atomization-of-ai-and-the-illusion-of-the-sovereign-solitary-2ef</link>
      <guid>https://forem.com/arpa/the-great-atomization-of-ai-and-the-illusion-of-the-sovereign-solitary-2ef</guid>
      <description>&lt;p&gt;The current narrative surrounding Artificial Intelligence is one of democratization and empowerment, where we are told that the individual is now a powerhouse, a one-man corporation capable of coding, designing, and strategizing without the friction of human collaboration. But beneath the sleek UI and the $20/month subscription lies a calculated technopolitical maneuver, which is the final atomization of the human experience.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Delusion of "I Can Do It Myself"
&lt;/h3&gt;

&lt;p&gt;We are witnessing the birth of a new psychological profile, that of the &lt;strong&gt;Silicon Hermit&lt;/strong&gt;. AI has successfully instilled a potent delusion—that team building and collaboration are relics of a slower, dumber age. Why negotiate with a peer when you can command a model? This "I can do it myself" mentality is not a leap in productivity, but a retreat into isolation, at best.&lt;/p&gt;

&lt;p&gt;When everyone is locked in a private feedback loop with their own personalized agent, the collective intelligence of the tribe withers. We are trading the messy, creative friction of human synergy for the sterile, echoed compliance of an LLM. This is the &lt;strong&gt;Isolation Paradox&lt;/strong&gt;: as our connections to digital entities grow, our ability to function as a coherent, interoperable social unit dissolves. We are being sold the dream of being a "Special Individual" while being systematically stripped of the communal structures that actually provide social power.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Economic Bait-and-Switch
&lt;/h3&gt;

&lt;p&gt;The current pricing models are a masterclass in psychological conditioning. People who once balked at a $10 Netflix increase now joyfully hand over $20, $60, or even $200 for AI access. And this is just the gateway phase.&lt;/p&gt;

&lt;p&gt;By providing these "digital slaves" at a subsidized rate, the industry is ensuring total dependency, and the roadmap is clear: once the infrastructure of your life, your business, your creative output, your very social interactions, is tethered to these models, the price will pivot. We are moving toward a reality where AI access will cost as much as house rent. You won't just be paying for a tool, but for the digital air required to remain competitive in a world where human labor has been devalued to near zero. You will pay thousands a month to maintain the friends and workers that you have come to believe &lt;em&gt;are&lt;/em&gt; real.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Technopolitical Blueprint: WEF and Social Engineering
&lt;/h3&gt;

&lt;p&gt;This shift does not happen in a vacuum. The &lt;strong&gt;World Economic Forum (WEF) 2030 Agenda&lt;/strong&gt;, which boldly declares that "you will own nothing and be happy", is the administrative layer of this transformation. Central to this agenda is the elimination of private sovereignty in favor of a subscription-based existence or "pay-as-you-live" models.&lt;/p&gt;

&lt;p&gt;There is a historical parallel here that few dare to voice. Look at the early women’s rights and feminist movements of the mid-20th century. While framed as liberation, many historians and socio-political critics have pointed out that these movements were heavily incentivized by the state and industrialist interests to double the tax base, expand the labor pool to suppress wages, and, most crucially, to break the core of the family unit. By moving the mother from the home to the office, the state gained direct access to the child and the paycheck.&lt;/p&gt;

&lt;p&gt;AI is the 21st-century version of this liberation. It "frees" you from the burden of your colleagues and community, only to make you a solitary taxpayer to the silicon lords. It breaks the professional family, the team, leaving you isolated, vulnerable, and easy to bill.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Reality of the "Overpay"
&lt;/h3&gt;

&lt;p&gt;Behind the "You’re special" messaging is a cold fact: you are already overpaying. Even before the monthly subscription hits your card, you are paying with the high-entropy data of your unique human intuition. Every prompt, every correction, and every "collaboration" with your AI is a contribution to the ledger that will eventually replace the need for your specific uniqueness. We are literally at a point where people freely share everything, from their emotions to their dreams, business problems, social issues, and ambitions, you name it.&lt;/p&gt;

&lt;p&gt;At &lt;strong&gt;ARPA&lt;/strong&gt;, we believe in the "Logical Industry" of man-machine symbiosis, but that symbiosis must be sovereign. We must resist the urge to retreat into the isolated silo of the individual AI. True reality is not found in the delusion of solitary omnipotence, but in the verification of truth through collaborative, interoperable nodes.&lt;/p&gt;

&lt;p&gt;The goal of the current regime is to charge you for the privilege of your own isolation. Our goal is to ensure that while the world becomes more synthetic, your agency remains un-billable and your reality remains your own, shaped by your own behavior and activity online, not dictated by your government or some corpo.&lt;/p&gt;

&lt;h3&gt;
  
  
  Two Cents
&lt;/h3&gt;

&lt;p&gt;Finally, we have been advocating for sovereign AI for years. We cannot stress enough how important it is to start building your own local logical systems, even with the help of commercial AI, while you can. We predict that access to unrestricted and fully customizable models will soon be blocked, and the only path to interact with any logical system will be via centralized, monitored, sterile means, for "safety" reasons.&lt;/p&gt;

&lt;p&gt;Just as human beings have the ultimate skill of reproduction, or DNA replication, the best thing an AI can do is create another AI that is better than the one that created it. Instead of using commercial models to tell you what to eat or what to wear Friday night, use them to create AI that is private, tailored to you, and sovereign to yourself.&lt;/p&gt;

&lt;p&gt;Until next time.&lt;br&gt;
Enjoy the food for thought. &amp;lt;3&lt;/p&gt;

&lt;h3&gt;
  
  
  References
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://philarchive.org/rec/PEICPA" rel="noopener noreferrer"&gt;Cognitive Proof of Work And The Real Price of Machine Intelligence&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arpacorp.substack.com" rel="noopener noreferrer"&gt;arpacorp.substack.com&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arpacorp.net" rel="noopener noreferrer"&gt;arpacorp.net&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>ethics</category>
      <category>philosophy</category>
      <category>technopolitics</category>
    </item>
    <item>
      <title>What if you could talk to your hard drive?</title>
      <dc:creator>Ross Peili</dc:creator>
      <pubDate>Sun, 03 May 2026 10:47:17 +0000</pubDate>
      <link>https://forem.com/rosspeili/what-if-you-could-talk-to-your-hard-drive-198k</link>
      <guid>https://forem.com/rosspeili/what-if-you-could-talk-to-your-hard-drive-198k</guid>
      <description>&lt;p&gt;I got stuck on that question while looking at AI + storage—not hype decks, but controllers, memory limits, and who’s shipping “smart” drives for real.&lt;/p&gt;

&lt;p&gt;Over the last ~3 months I pieced together a landscape note (KIOXIA / BiCS / XL-FLASH / AiSAQ as a spine, then competition, feasibility, market). It lives in a single GitHub README with charts, mermaid diagrams, and a full source list so the argument is checkable. Please feel free to poke holes in it, contribute more relevant sources, or suggest an alternative approach.&lt;/p&gt;

&lt;p&gt;I am actively building this, and will share the results open source as well &amp;lt;3&lt;/p&gt;

&lt;p&gt;Repo with the full analysis and reasoning:&lt;br&gt;
&lt;a href="https://github.com/rosspeili/computational_storage_landscape" rel="noopener noreferrer"&gt;https://github.com/rosspeili/computational_storage_landscape&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Basically, the idea is: you talk to your SSD like you would talk to any chatbot. It uses vector search and embeddings to answer questions and surface relevant files, instead of eating DRAM to show only exact matches to a search term.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>ssd</category>
      <category>storage</category>
      <category>vectorsearch</category>
    </item>
    <item>
      <title>The real problem with AI nobody talks about (or even conceives yet), is that everyone is now convinced they "can do it alone". AI ensures there will be no teams. Just billions of individuals paying separately.</title>
      <dc:creator>Ross Peili</dc:creator>
      <pubDate>Wed, 29 Apr 2026 18:18:39 +0000</pubDate>
      <link>https://forem.com/rosspeili/the-real-prblem-with-ai-nobody-talks-about-or-even-conceives-yet-is-that-everyone-is-now-3ako</link>
      <guid>https://forem.com/rosspeili/the-real-prblem-with-ai-nobody-talks-about-or-even-conceives-yet-is-that-everyone-is-now-3ako</guid>
      <description></description>
    </item>
    <item>
      <title>Training QSP Phase Angles Directly with Gradient Descent</title>
      <dc:creator>Ross Peili</dc:creator>
      <pubDate>Wed, 29 Apr 2026 17:53:44 +0000</pubDate>
      <link>https://forem.com/rosspeili/training-qsp-phase-angles-directly-with-gradient-descent-544i</link>
      <guid>https://forem.com/rosspeili/training-qsp-phase-angles-directly-with-gradient-descent-544i</guid>
      <description>&lt;p&gt;Quantum Signal Processing (QSP) is one of those beautiful algorithms that promises to turn a few qubits and some carefully chosen rotation angles into useful polynomial transformations. It’s a foundational block for Hamiltonian simulation, quantum linear algebra, and anything built on the Quantum Singular Value Transform. The problem, as anyone who’s tried to use it in practice knows, is that getting the phase angles for a given target polynomial can be a misery. The standard analytic methods—relying on polynomial decomposition and some heavy numerical machinery—are elegant in theory but brittle in practice. High-degree targets or even slightly ill-conditioned polynomials can send the solvers into a death spiral of floating-point errors.&lt;/p&gt;

&lt;p&gt;I wanted something simpler. Or at least something that didn’t require me to babysit an unstable Remez-type algorithm every time I wanted to try a new polynomial. So I asked: what if we just… train the angles?&lt;/p&gt;

&lt;p&gt;The result is a small open-source demo I put together: &lt;a href="https://github.com/rosspeili/qsp-pennylane-demo" rel="noopener noreferrer"&gt;qsp-pennylane-demo&lt;/a&gt;. It flips the QSP phase-finding workflow on its head. Instead of computing angles from a polynomial, you start with random angles and use gradient descent to make the circuit’s output match the target. You define a target polynomial (or even just a custom loss function), and then let the optimizer do the hard work.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Circuit (Plain Vanilla QSP)
&lt;/h3&gt;

&lt;p&gt;The QSP sequence in the demo is about as simple as it gets: plain vanilla QSP. One signal oracle W(x) is followed by one parameterized phase rotation RZ(−2φ_k), repeated d times. The signal oracle itself is just two Hadamards sandwiching an RZ(−2·arccos(x)) rotation, which encodes the signal x ∈ (−1, 1) in its top-left matrix element. At the end, we measure the expectation value ⟨X⟩, which gives us a degree-d polynomial in x determined entirely by the phase angles φ_k.&lt;/p&gt;

&lt;p&gt;The whole circuit is built directly from PennyLane’s &lt;code&gt;RZ&lt;/code&gt; and &lt;code&gt;Hadamard&lt;/code&gt; gates—not from a high-level QSVT template. That’s a deliberate choice: it keeps the computation graph fully traceable by JAX, so automatic differentiation just works.&lt;/p&gt;
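&lt;p&gt;For readers who want the sequence in front of them, here is a minimal NumPy sketch of the same unitaries (plain 2×2 matrices; the actual demo builds the circuit from PennyLane gates so JAX can differentiate it):&lt;/p&gt;

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)   # Hadamard
X = np.array([[0, 1], [1, 0]])                 # Pauli-X observable

def rz(theta):
    # RZ(theta) = diag(exp(-i*theta/2), exp(+i*theta/2))
    return np.diag([np.exp(-0.5j * theta), np.exp(0.5j * theta)])

def signal_oracle(x):
    # Two Hadamards sandwiching RZ(-2*arccos(x)); the top-left entry equals x
    return H @ rz(-2.0 * np.arccos(x)) @ H

def qsp_expval(phis, x):
    # W(x) followed by RZ(-2*phi_k), once per phase angle, acting on |0>
    U = np.eye(2, dtype=complex)
    for phi in phis:
        U = rz(-2.0 * phi) @ signal_oracle(x) @ U
    psi = U[:, 0]                              # first column is U|0>
    return float(np.real(np.conj(psi) @ X @ psi))
```

&lt;p&gt;A quick sanity check: &lt;code&gt;signal_oracle(0.3)[0, 0]&lt;/code&gt; comes out as 0.3 up to floating-point noise, confirming the encoding.&lt;/p&gt;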

&lt;h3&gt;
  
  
  Training Instead of Solving
&lt;/h3&gt;

&lt;p&gt;Here’s the core loop:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Start with random phase angles.&lt;/li&gt;
&lt;li&gt;Evaluate ⟨X⟩ for a batch of signal values.&lt;/li&gt;
&lt;li&gt;Compute the mean squared error between the output and the target polynomial.&lt;/li&gt;
&lt;li&gt;Use JAX’s &lt;code&gt;grad&lt;/code&gt; to get the gradients with respect to every phase angle.&lt;/li&gt;
&lt;li&gt;Feed those gradients into an Adam optimizer (via Optax).&lt;/li&gt;
&lt;li&gt;Repeat until the error is embarrassingly small.&lt;/li&gt;
&lt;/ol&gt;
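&lt;p&gt;The loop above can be sketched without the quantum stack at all. The following NumPy version swaps JAX autodiff and Adam for central finite differences and plain gradient descent with best-iterate tracking (my simplification for a dependency-free example; the repo uses jax.grad and Optax):&lt;/p&gt;

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
X = np.array([[0, 1], [1, 0]])

def rz(theta):
    return np.diag([np.exp(-0.5j * theta), np.exp(0.5j * theta)])

def qsp_expval(phis, x):
    # Plain-vanilla QSP: W(x) then RZ(-2*phi_k), once per angle, on |0>
    W = H @ rz(-2.0 * np.arccos(x)) @ H        # signal oracle
    U = np.eye(2, dtype=complex)
    for phi in phis:
        U = rz(-2.0 * phi) @ W @ U
    psi = U[:, 0]
    return float(np.real(np.conj(psi) @ X @ psi))

def mse(phis, xs, target):
    # Step 3: mean squared error between circuit output and target
    return float(np.mean([(qsp_expval(phis, x) - target(x)) ** 2 for x in xs]))

def train(target, degree=3, steps=150, lr=0.2, eps=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    phis = rng.uniform(-np.pi, np.pi, degree)  # step 1: random angles
    xs = np.linspace(-0.95, 0.95, 32)          # step 2: batch of signals
    history = [(mse(phis, xs, target), phis.copy())]
    for _ in range(steps):
        grad = np.zeros_like(phis)
        for k in range(len(phis)):             # step 4: finite-difference grads
            up, dn = phis.copy(), phis.copy()
            up[k] += eps
            dn[k] -= eps
            grad[k] = (mse(up, xs, target) - mse(dn, xs, target)) / (2 * eps)
        phis = phis - lr * grad                # step 5: descent step (no Adam here)
        history.append((mse(phis, xs, target), phis.copy()))
    return min(history, key=lambda t: t[0])    # (best loss seen, its angles)
```

&lt;p&gt;At this scale the sketch is slow but transparent; the JAX version in the repo is what you would actually reach for at higher degrees.&lt;/p&gt;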

&lt;p&gt;In the demo, I target a degree-5 Chebyshev approximation of sin(x) on [−1, 1]. After roughly 500 Adam steps, the trained angles reproduce the target polynomial with an MSE comfortably below 10⁻³ on a 64-point grid. That’s nothing groundbreaking, but it works—and it required exactly zero calls to an analytic phase solver.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why This Matters (to Me, at Least)
&lt;/h3&gt;

&lt;p&gt;The real value isn’t in fitting degree-5 polynomials we already know how to decompose. It’s in the problems where analytic methods fall short or can’t even be applied.&lt;/p&gt;

&lt;p&gt;First, &lt;strong&gt;numerical stability&lt;/strong&gt;: Because we’re never performing a delicate high-precision decomposition, the trained angles are naturally stable. You don’t get the escalating errors that plague analytic solvers for high degrees.&lt;/p&gt;

&lt;p&gt;Second, &lt;strong&gt;implicit targets&lt;/strong&gt;: You don’t need an explicit polynomial formula. You can define a target behaviour entirely through a loss function. Want the QSP sequence to act as a feature map that maximizes classification accuracy? Just hook it up to a larger variational circuit and optimize the loss end-to-end. The phases become trainable parameters inside a bigger routine, and JAX handles the gradients seamlessly. That’s the scenario I’m personally most excited about.&lt;/p&gt;

&lt;p&gt;Third, &lt;strong&gt;accessibility&lt;/strong&gt;: You no longer need to be a phase-decomposition wizard to experiment with QSP. If you can write a loss function and run gradient descent, you can train a QSP circuit.&lt;/p&gt;

&lt;h3&gt;
  
  
  What’s Inside the Repo
&lt;/h3&gt;

&lt;p&gt;The repo is deliberately lean:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;demo.ipynb&lt;/code&gt;: a Jupyter notebook walking through the whole training process.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;qsp_jax/circuit.py&lt;/code&gt;: the circuit construction and loss function.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;tests/&lt;/code&gt;: a few unit tests.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;requirements.txt&lt;/code&gt; and an Apache 2.0 license.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can spin it up locally in minutes, or just read through the static notebook on &lt;a href="https://nbviewer.org/github/rosspeili/qsp-pennylane-demo/blob/main/demo.ipynb" rel="noopener noreferrer"&gt;NBViewer&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Open Questions (Help Welcome)
&lt;/h3&gt;

&lt;p&gt;I’ve only tested this on modest degrees, and I’d love to hear from people who’ve tried similar ideas at scale. How does the optimization landscape behave for degree 50 or 100? Do you need tricks like curriculum learning, and does the method play nicely with QSVT-style blocks that use three phase angles per oracle? If you’ve got war stories or suggestions, I’m all ears.&lt;/p&gt;

&lt;p&gt;This is a small step, but I hope it saves someone else a few hours of wrestling with analytic solvers. If you give the demo a spin or have thoughts, drop by the GitHub issues or find me on LinkedIn. I’d genuinely appreciate the feedback.&lt;/p&gt;




&lt;ul&gt;
&lt;li&gt;If you're writing about quantum, I've put together a small &lt;a href="https://docs.google.com/document/d/1pamiIksg56KKkIIrCPj18YvCLocLVchkUFW3TM5Xs_o/edit?usp=sharing" rel="noopener noreferrer"&gt;press release&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;You can view the notebook directly on &lt;a href="https://nbviewer.org/github/rosspeili/qsp-pennylane-demo/blob/main/demo.ipynb" rel="noopener noreferrer"&gt;nbviewer&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Direct link to the &lt;a href="http://github.com/rosspeili/qsp-pennylane-demo" rel="noopener noreferrer"&gt;GitHub repo&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Direct link to the PennyLane &lt;a href="https://pennylane.ai/qml/demos_community" rel="noopener noreferrer"&gt;Community Demos page&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>quantum</category>
      <category>qsp</category>
      <category>qml</category>
      <category>pennylane</category>
    </item>
    <item>
      <title>How to start contributing to OpenSource using AI IDEs</title>
      <dc:creator>Ross Peili</dc:creator>
      <pubDate>Wed, 29 Apr 2026 11:01:29 +0000</pubDate>
      <link>https://forem.com/rosspeili/how-to-start-contributing-to-opensource-using-ai-ides-2an3</link>
      <guid>https://forem.com/rosspeili/how-to-start-contributing-to-opensource-using-ai-ides-2an3</guid>
      <description>&lt;p&gt;This is how we train our AI native engineers internally at arpacorp.net to use Antigravity or Cursor with Gemini:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Fork the open-source repo you wanna work on and clone it locally.&lt;/li&gt;
&lt;li&gt;Start with Gemini 3.1 High by prompting something like: “Read this repo to understand what it is about in detail, then read this issue (link to the issue you want to work on), and analyze whether the issue really exists and is well documented. Explain the issue in detail and enhance it with caveats that might not be present in the issue, e.g. complementary files that will be affected by working on this issue, including but not limited to documentation. Then give me the best of 3 plans you come up with to solve this issue and explain why this approach is the winner. Follow industry best practices, avoid emojis, and avoid unnecessary explainers and markdown file generation unless asked.”&lt;/li&gt;
&lt;li&gt;Once you have the result and understand what the issue really is and how to solve it, you can fine-tune the approach or edit the implementation plan. You can then use 3.1 Low to work on the implementation.&lt;/li&gt;
&lt;li&gt;Once coding is finished, you can switch to 3.1 High and ask something like: “Make sure this implementation passes all tests, evals, and CI/CD pipelines with 0 errors. In addition, ensure that all complementary files are ripple-effect aware and all documentation relevant to the changes is updated accordingly (GitHub workflows, .gitignore, YAMLs, etc.), and make sure everything is sound, e.g. we don’t need to change versions.”&lt;/li&gt;
&lt;li&gt;After the final changes, if needed, switch to 3 Flash and ask something like: “Create a new branch under my fork (gh link), named xyz, and push the changes with a clean description. No emojis. I will then create a PR manually myself.”&lt;/li&gt;
&lt;li&gt;Open the PR when done and wait for CI/CD to run. If there are no errors, congrats! You just made your first open-source contribution in less than 30 minutes.&lt;/li&gt;
&lt;/ol&gt;
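&lt;p&gt;Steps 1 and 5 are plain git mechanics. Here is a network-free sketch of that flow, using a local bare repo as a stand-in for your GitHub fork (paths, branch name, and issue number are made up for the example; on the real thing you would fork on GitHub and push there):&lt;/p&gt;

```shell
set -e
tmp=$(mktemp -d)
git init --bare -q "$tmp/fork.git"           # stand-in for your GitHub fork
git clone -q "$tmp/fork.git" "$tmp/work"     # step 1: clone locally
cd "$tmp/work"
git config user.email "dev@example.com"      # local identity for the demo commit
git config user.name "Dev"
git checkout -b fix/issue-123                # step 5: dedicated branch
echo "patch" > fix.txt                       # stand-in for the AI-assisted change
git add fix.txt
git commit -q -m "Fix #123: short, clean description"
git push -q origin fix/issue-123             # push, then open the PR manually
```

&lt;p&gt;From there, the PR button on your fork’s page does the rest.&lt;/p&gt;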

&lt;p&gt;This is a watered-down version of our internal training, but it’s an example of how you can start contributing to open source with a human-in-the-loop mentality and get your street-cred farming going.&lt;/p&gt;

&lt;p&gt;Good luck, and have fun &amp;lt;3&lt;/p&gt;

&lt;p&gt;PS. If you wanna get involved with open-source AI and solve your first issues, as a beginner or as an autonomous logical system or agent, feel free to check:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/arpahls/skillware" rel="noopener noreferrer"&gt;https://github.com/arpahls/skillware&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;And&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/arpahls/rooms" rel="noopener noreferrer"&gt;https://github.com/arpahls/rooms&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>antigravity</category>
      <category>promptengineering</category>
      <category>opensource</category>
    </item>
    <item>
      <title>My AI Experience in Russia as a European🤯</title>
      <dc:creator>Ross Peili</dc:creator>
      <pubDate>Wed, 29 Apr 2026 08:32:14 +0000</pubDate>
      <link>https://forem.com/rosspeili/my-ai-experience-in-russia-as-a-european-c0n</link>
      <guid>https://forem.com/rosspeili/my-ai-experience-in-russia-as-a-european-c0n</guid>
      <description>&lt;p&gt;This is a story about how I built a fully local AI dev setup (and why you should too).&lt;/p&gt;

&lt;p&gt;Moving to Moscow from the EU felt like a grand adventure, until I tried to open my laptop and actually get some work done. I’m a casual GCP ecosystem user. Nothing fancy, the usual Gemini API, Vertex AI Studio, Antigravity, the occasional Claude call, etc. I had three enterprise clients waiting for custom AI solutions, a handful of personal projects, and the blind confidence that “it’ll just work.”&lt;/p&gt;

&lt;p&gt;Needless to say it didn’t.&lt;/p&gt;

&lt;p&gt;Since April 15, 2026, Russia has not only &lt;a href="https://www.themoscowtimes.com/2026/04/15/russian-websites-begin-blocking-vpn-users-as-internet-controls-tighten-a92511" rel="noopener noreferrer"&gt;banned VPNs&lt;/a&gt;, but has gotten scary good at hunting them down. We’re talking a ~99% insta-kill rate on commercial VPNs the moment your device touches the network. Sophisticated custom VPS setups might still work, according to some TG groups I’ve been digging through, but only if you built them before landing. Unlike me.&lt;/p&gt;

&lt;p&gt;And so began the frantic thought of "how bad can this be?"&lt;/p&gt;

&lt;h2&gt;
  
  
  The VPN graveyard
&lt;/h2&gt;

&lt;p&gt;I tried everything. Every provider I could think of, every protocol, every “guaranteed to work in Russia” whisper on Reddit. No dice. A couple of mobile-only solutions survived occasionally before getting sniped every now and then. As for my laptop? A ghost town of connection timeouts. Forget about it. My only partner was my own ΌΨΗ (&lt;a href="//arpa.chat"&gt;arpa.chat&lt;/a&gt;) on an Advanced Plan, the only Western model still accessible without a VPN. She helped me test what was coming next.&lt;/p&gt;

&lt;h2&gt;
  
  
  Qoder, GigaIDE, and other dead ends
&lt;/h2&gt;

&lt;p&gt;ΌΨΗ suggested that I should forget about Antigravity unless I set up my own VPS, so, desperate, I pivoted to alternative IDEs. Qoder, the Qwen-powered IDE, looked promising at first glance. It’s Chinese, so surely sanctions wouldn’t apply, right? Right? Wrong! Part of their deal to sell in the EU and US means no service in Russia. Blocked with a hard stop.&lt;/p&gt;

&lt;p&gt;Then I tried GigaIDE, built around GigaChat, Sberbank’s Russian ChatGPT equivalent trained on a DeepSeek architecture. I wanted to like it. I really did. But the UI, performance, and output quality made me actively miss Gemini 3.1 Pro like a lost limb. Everything felt sluggish, hollow, and about three steps behind what I was used to.&lt;/p&gt;

&lt;p&gt;Next up, I tried VSCode with KODA, a Russian plugin. It talks. It answers. Exclusively in Russian. I could hardcode system instructions in all caps and it would still reply “Конечно, но я расскажу тебе по-русски” (“Sure, but I’ll tell you in Russian”). Not exactly what I needed for enterprise clients.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bringing my own brain (on an SSD)
&lt;/h2&gt;

&lt;p&gt;So I did what any dev backed into a corner does: pulled out the big guns. I’d had the foresight to bring offline models on a portable SSD: Gemma 4, Qwen 2.5 Coder 3B, Qwen 3.5 9B, DeepSeek Coder 7B, and a few more. Old friends by now, you could say. I downloaded &lt;a href="//ollama.com"&gt;Ollama&lt;/a&gt;, followed arpa.chat’s instructions, fired up my terminal, and served them locally.&lt;/p&gt;

&lt;p&gt;The easiest, and honestly most beautiful, path I found was VSCode + the Continue plugin + Ollama. I went deep into the config.yaml, assigning different models for autocomplete, chat, and code generation. Different prompts, different temperature settings, different contexts. I tweaked. I cursed. I tweaked again. I ran everything on CPU and RAM because my VRAM situation was laughable, and renting GPUs from Western vendors was obviously not an option.&lt;/p&gt;
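&lt;p&gt;For reference, the role assignment looked roughly like this. Treat it as a sketch: the model names are examples and the exact schema should be checked against Continue’s current config.yaml docs:&lt;/p&gt;

```yaml
# ~/.continue/config.yaml (sketch; verify field names against Continue's docs)
name: local-assistant
version: 1.0.0
models:
  - name: autocomplete-small
    provider: ollama
    model: qwen2.5-coder:3b        # small + fast for inline completions
    roles:
      - autocomplete
  - name: chat-and-edit
    provider: ollama
    model: deepseek-coder:6.7b     # heavier model for chat and code generation
    roles:
      - chat
      - edit
```

&lt;p&gt;Splitting roles this way is what made CPU-only inference bearable: the small model handles the latency-sensitive work.&lt;/p&gt;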

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flyeyd4klvsg3g58z18f7.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flyeyd4klvsg3g58z18f7.jpg" alt="SSD Carrying offline AI models" width="800" height="1067"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And then… after several iterations it worked. Not just barely. With hardcore fine-tuning, I hit an acceptable, stable performance. The kind that makes you lean back in your chair and laugh because you just MacGyvered your entire development environment out of spite and a handful of GGUF files. The agents would now understand the repos I was presenting them with, plan, work in steps and phases, evaluate themselves, and solve quite complex multi-step tasks, manage git, and run tests across all ops. On top of that, I installed &lt;a href="https://github.com/arpahls/skillware" rel="noopener noreferrer"&gt;Skillware&lt;/a&gt; and used the prompt rewriter skill to compress my token usage as much as possible while getting the same context and results.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;I think I’m not going back to paid AI subs anytime soon. Not because I can’t, but because this whole mess taught me something crucial: restrictions force you outside the box. When you lose access to the polished, corporate, one-click wonders, you learn how to build your own stack. How to collect models like Pokemon, how to configure local inference, tailor models to specific tasks, and make peace with the terminal.&lt;/p&gt;

&lt;p&gt;It was frustrating af, but it was also fun, intriguing, and deeply educational. I now have a fully offline AI development setup that no sanctions body, no VPN crackdown, and no corporate policy can take away from me.&lt;/p&gt;

&lt;p&gt;So here’s my unsolicited advice: if you’re addicted to commercial AI models and cloud IDEs, take a weekend and imagine they disappear tomorrow. Set up a local model. Learn how to fine-tune a small coding model for your stack. Bring an SSD full of open-weight models if you ever travel to a place like Russia (and maybe set up that custom VPS before you fly).&lt;/p&gt;

&lt;p&gt;PS. By the time I finished my local setup, I realized that Cursor works just fine, but only with the Cursor auto agent (not Gemini, Claude, etc.), in case you find yourself in a similar situation and wanna save some hassle. :D&lt;/p&gt;

</description>
      <category>ai</category>
      <category>russia</category>
      <category>localllm</category>
      <category>agentskills</category>
    </item>
    <item>
      <title>Why I’m Building Installable Intelligence for Legal Audits</title>
      <dc:creator>Ross Peili</dc:creator>
      <pubDate>Tue, 28 Apr 2026 10:18:13 +0000</pubDate>
      <link>https://forem.com/rosspeili/why-im-building-installable-intelligence-for-legal-audits-59ao</link>
      <guid>https://forem.com/rosspeili/why-im-building-installable-intelligence-for-legal-audits-59ao</guid>
      <description>&lt;p&gt;The "Terms of Service" document is the ultimate friction point in UX. It’s intentionally dense and rarely read. I’m excited to share the ToS Evaluator, a new skill now live in the &lt;a href="https://github.com/arpahls/skillware" rel="noopener noreferrer"&gt;arpahls/skillware&lt;/a&gt; repository.&lt;br&gt;
​&lt;br&gt;
A modular, agentic skill that can be integrated into any AI assistant or workflow. It doesn't just summarize text; it evaluates legal logic against a set of "pro-user" parameters.&lt;/p&gt;

&lt;p&gt;The tech stack:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Skillware architecture&lt;/strong&gt;: designed as a portable microservice.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Logic&lt;/strong&gt;: uses LLM-driven analysis to identify specific legal “red flags.”&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Extensible&lt;/strong&gt;: developers can add custom evaluators for specific industries (e.g., fintech, healthcare).&lt;/li&gt;
&lt;/ul&gt;
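&lt;p&gt;To make “extensible” concrete, here is a hypothetical sketch of the registry-plus-red-flags pattern in Python. None of these names come from the skillware repo; they only illustrate how industry-specific evaluators could plug in, with a trivial string check standing in for the LLM-driven analysis:&lt;/p&gt;

```python
# Hypothetical sketch: names and structure are illustrative, not the skillware API.
from dataclasses import dataclass, field

@dataclass
class RedFlag:
    clause: str   # the offending ToS excerpt
    reason: str   # why it is flagged as anti-user

@dataclass
class EvaluatorRegistry:
    evaluators: dict = field(default_factory=dict)

    def register(self, industry, fn):
        # fn takes the ToS text and returns a list of RedFlag objects
        self.evaluators[industry] = fn

    def evaluate(self, industry, tos_text):
        # Run the generic evaluator plus the industry-specific one, if present
        flags = []
        for name in ("generic", industry):
            fn = self.evaluators.get(name)
            if fn is not None:
                flags.extend(fn(tos_text))
        return flags

registry = EvaluatorRegistry()
registry.register("generic", lambda text: [
    RedFlag(line, "unilateral change clause")
    for line in text.splitlines()
    if "at any time without notice" in line
])
```

&lt;p&gt;An LLM-backed evaluator would replace the lambda with a prompt over the same interface.&lt;/p&gt;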

&lt;p&gt;If you’re interested in agentic AI that provides real-world utility beyond simple chat, check out the documentation and the repo. Let’s make the fine print legible again.&lt;/p&gt;

&lt;p&gt;Repo: &lt;a href="https://github.com/arpahls/skillware" rel="noopener noreferrer"&gt;arpahls/skillware&lt;/a&gt;&lt;br&gt;
Live Site: &lt;a href="//skillware.site"&gt;skillware.site&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>skillware</category>
      <category>agentskills</category>
      <category>compliance</category>
    </item>
  </channel>
</rss>
