<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: TildAlice</title>
    <description>The latest articles on Forem by TildAlice (@tildalice).</description>
    <link>https://forem.com/tildalice</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3755725%2Fed8d5042-b5bb-495f-b8f6-9d8b470e1d46.png</url>
      <title>Forem: TildAlice</title>
      <link>https://forem.com/tildalice</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/tildalice"/>
    <language>en</language>
    <item>
      <title>LSTM vs Transformer: S&amp;P 500 1-Year Benchmark Results</title>
      <dc:creator>TildAlice</dc:creator>
      <pubDate>Wed, 15 Apr 2026 21:04:17 +0000</pubDate>
      <link>https://forem.com/tildalice/lstm-vs-transformer-sp-500-1-year-benchmark-results-1ni</link>
      <guid>https://forem.com/tildalice/lstm-vs-transformer-sp-500-1-year-benchmark-results-1ni</guid>
      <description>&lt;h2&gt;
  
  
  Transformers Beat LSTMs 62% of the Time — But Not How You'd Expect
&lt;/h2&gt;

&lt;p&gt;Ran both architectures on 252 trading days of S&amp;amp;P 500 data. Transformer won directional accuracy 62% vs LSTM's 58%. But here's the catch: LSTM's losses were smaller. When Transformer was wrong, it was &lt;em&gt;really&lt;/em&gt; wrong — average error 2.3% vs LSTM's 1.1% on missed days.&lt;/p&gt;

&lt;p&gt;This isn't another "Transformers are the future" post. It's a side-by-side implementation where I tracked every metric that matters for actual trading: directional accuracy, mean absolute error, Sharpe ratio of a simulated strategy, and training time. The results don't fit the narrative you'd expect from reading ML Twitter.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftildalice.io%2Fwp-content%2Fuploads%2F2026%2F04%2Fstock-lstm-vs-transformer-stock-prediction-sp500-benchmark-1.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftildalice.io%2Fwp-content%2Fuploads%2F2026%2F04%2Fstock-lstm-vs-transformer-stock-prediction-sp500-benchmark-1.jpg" alt="Close-up of vibrant stock market graphs displaying trading trends on a monitor, ideal for finance and cryptocurrency concepts." width="800" height="534"&gt;&lt;/a&gt;&lt;/p&gt;
Photo by &lt;a href="https://www.pexels.com/@alphatradezone" rel="nofollow noopener noreferrer"&gt;AlphaTradeZone&lt;/a&gt; on &lt;a href="https://www.pexels.com" rel="nofollow noopener noreferrer"&gt;Pexels&lt;/a&gt;



&lt;h2&gt;
  
  
  The Setup: Same Data, Fair Fight
&lt;/h2&gt;

&lt;p&gt;Pulled SPY daily data from 2023-01-03 to 2024-12-31 using yfinance. 252 trading days, split 80/20 train/test. Features: close price, volume, 5-day MA, 20-day MA, RSI(14), normalized to [0,1] with MinMaxScaler.&lt;/p&gt;
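The feature pipeline above can be sketched roughly like this — column names, the RSI smoothing choice, and the inline min-max scaling are my own simplifications, not the article's exact code:

```python
import numpy as np
import pandas as pd

def build_features(close, volume, rsi_window=14):
    """Close, volume, 5/20-day MAs, RSI(14), all scaled to [0, 1]."""
    df = pd.DataFrame({"close": close, "volume": volume})
    df["ma5"] = df["close"].rolling(5).mean()
    df["ma20"] = df["close"].rolling(20).mean()
    # RSI(14) from simple rolling averages of gains and losses
    delta = df["close"].diff()
    gain = delta.clip(lower=0).rolling(rsi_window).mean()
    loss = (-delta.clip(upper=0)).rolling(rsi_window).mean()
    df["rsi"] = 100 - 100 / (1 + gain / loss)
    df = df.dropna()  # drop the rolling-window warm-up rows
    # Per-column min-max scaling, equivalent to sklearn's MinMaxScaler
    return (df - df.min()) / (df.max() - df.min())
```

The inline scaling behaves like `MinMaxScaler` fit on the whole frame; in a real backtest you would fit the scaler on the training split only to avoid leakage.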




&lt;p&gt;&lt;em&gt;Continue reading the full article on &lt;a href="https://tildalice.io/lstm-vs-transformer-stock-prediction-sp500-benchmark/" rel="noopener noreferrer"&gt;TildAlice&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>lstm</category>
      <category>transformer</category>
      <category>stockprediction</category>
      <category>timeseries</category>
    </item>
    <item>
      <title>Bubble Sort to Timsort: Why Python Ditched O(n²)</title>
      <dc:creator>TildAlice</dc:creator>
      <pubDate>Tue, 14 Apr 2026 21:04:04 +0000</pubDate>
      <link>https://forem.com/tildalice/bubble-sort-to-timsort-why-python-ditched-on2-o5j</link>
      <guid>https://forem.com/tildalice/bubble-sort-to-timsort-why-python-ditched-on2-o5j</guid>
      <description>&lt;h2&gt;
  
  
  Python's sort() is too fast to be simple
&lt;/h2&gt;

&lt;p&gt;Run &lt;code&gt;sorted([3,1,2])&lt;/code&gt; in Python and you get &lt;code&gt;[1,2,3]&lt;/code&gt; in microseconds. Under the hood, you're not running the bubble sort from your algorithms textbook—you're running &lt;strong&gt;Timsort&lt;/strong&gt;, a hybrid algorithm that combines merge sort and insertion sort, exploits real-world data patterns, and routinely beats its O(n log n) worst-case bound in practice, running in O(n) on already-sorted input. The gap between "sorting 101" and production code is enormous, and most people never see why.&lt;/p&gt;

&lt;p&gt;I'm going to show you exactly what happens when you replace the beginner-friendly algorithms with what Python actually uses, and why the performance difference matters even on small datasets.&lt;/p&gt;

&lt;h2&gt;
  
  
  What bubble sort actually costs
&lt;/h2&gt;

&lt;p&gt;Bubble sort is the first algorithm most of us learn. The idea: repeatedly step through the list, compare adjacent elements, swap if they're out of order. After each pass, the largest unsorted element "bubbles" to its correct position.&lt;/p&gt;

&lt;p&gt;Here's the canonical implementation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;bubble_sort&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;swapped&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
                &lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
                &lt;span class="n"&gt;swapped&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;swapped&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="c1"&gt;# Already sorted
&lt;/span&gt;            &lt;span class="k"&gt;break&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;arr&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
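To see the gap concretely, time that implementation against the built-in `sorted()` (the array size and repeat count here are arbitrary; only the ordering of the two timings matters):

```python
import random
import timeit

def bubble_sort(arr):  # same algorithm as the block above
    n = len(arr)
    for i in range(n):
        swapped = False
        for j in range(n - i - 1):
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
                swapped = True
        if not swapped:  # already sorted, stop early
            break
    return arr

data = [random.random() for _ in range(2000)]
# Sort a fresh copy each run so bubble sort never sees pre-sorted input
t_bubble = timeit.timeit(lambda: bubble_sort(data[:]), number=3)
t_timsort = timeit.timeit(lambda: sorted(data), number=3)
print(f"bubble: {t_bubble:.3f}s  timsort: {t_timsort:.3f}s")
```

Even at 2,000 elements — far from "big data" — the quadratic inner loop loses by orders of magnitude.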






&lt;p&gt;&lt;em&gt;Continue reading the full article on &lt;a href="https://tildalice.io/bubble-sort-timsort-why-python-ditched-basics/" rel="noopener noreferrer"&gt;TildAlice&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>sortingalgorithms</category>
      <category>timsort</category>
      <category>python</category>
      <category>algorithmcomplexity</category>
    </item>
    <item>
      <title>RAG vs Fine-Tuning vs Hybrid: Cost-Performance for 3 Use Cases</title>
      <dc:creator>TildAlice</dc:creator>
      <pubDate>Tue, 14 Apr 2026 18:04:09 +0000</pubDate>
      <link>https://forem.com/tildalice/rag-vs-fine-tuning-vs-hybrid-cost-performance-for-3-use-cases-5gfm</link>
      <guid>https://forem.com/tildalice/rag-vs-fine-tuning-vs-hybrid-cost-performance-for-3-use-cases-5gfm</guid>
      <description>&lt;h2&gt;
  
  
  The $47/day Question That Changed My Approach
&lt;/h2&gt;

&lt;p&gt;Our customer support chatbot was burning through $47/day in OpenAI API calls. The obvious fix? Fine-tune a smaller model. Six weeks later, we'd spent $2,100 on fine-tuning experiments and the bot was &lt;em&gt;worse&lt;/em&gt; at handling edge cases.&lt;/p&gt;

&lt;p&gt;This isn't a story about fine-tuning being bad. It's about when each approach actually pays off — with real numbers from three production systems I've worked on.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftildalice.io%2Fwp-content%2Fuploads%2F2026%2F04%2Fstock-rag-vs-fine-tuning-cost-performance-matrix-1.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftildalice.io%2Fwp-content%2Fuploads%2F2026%2F04%2Fstock-rag-vs-fine-tuning-cost-performance-matrix-1.jpg" alt="Close-up of a mechanic working on a car engine in a garage setting, focusing on air filter adjustment." width="800" height="534"&gt;&lt;/a&gt;&lt;/p&gt;
Photo by &lt;a href="https://www.pexels.com/@matreding" rel="nofollow noopener noreferrer"&gt;Mathias Reding&lt;/a&gt; on &lt;a href="https://www.pexels.com" rel="nofollow noopener noreferrer"&gt;Pexels&lt;/a&gt;



&lt;h2&gt;
  
  
  The Core Trade-off Nobody Talks About
&lt;/h2&gt;

&lt;p&gt;Most comparisons focus on accuracy vs cost. That's the wrong framing.&lt;/p&gt;

&lt;p&gt;The real question is: &lt;strong&gt;how often does your knowledge change?&lt;/strong&gt; A legal document assistant dealing with case law from 2020 has different needs than a product FAQ bot where marketing updates the copy weekly.&lt;/p&gt;

&lt;p&gt;RAG excels when knowledge is dynamic. Fine-tuning wins when behavior patterns matter more than factual recall. Hybrid approaches — and this surprised me — often cost more than pure RAG while delivering marginal gains.&lt;/p&gt;

&lt;p&gt;Let me show you the numbers.&lt;/p&gt;
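The shape of that comparison reduces to a simple break-even model. Every number below is a placeholder, not one of the production figures — the point is only that RAG's extra retrieved-context tokens and fine-tuning's amortized training cost pull in opposite directions:

```python
def monthly_cost(queries_per_day, tokens_per_query, rate_per_mtok,
                 fixed_monthly=0.0):
    """Rough monthly spend: token volume times rate, plus any fixed cost
    (e.g. amortized fine-tuning). All inputs are hypothetical."""
    tokens = queries_per_day * 30 * tokens_per_query
    return fixed_monthly + tokens / 1e6 * rate_per_mtok

# RAG: bigger prompts (retrieved context). Fine-tune: smaller prompts,
# but an assumed $350/month amortized training cost.
rag = monthly_cost(1000, 3000, 5.0)
ft = monthly_cost(1000, 800, 5.0, fixed_monthly=350)
print(rag, ft)  # 450.0 470.0
```

With these placeholder inputs the fine-tuned setup comes out slightly more expensive — which is exactly the "hybrid costs more than pure RAG" surprise, in miniature.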




&lt;p&gt;&lt;em&gt;Continue reading the full article on &lt;a href="https://tildalice.io/rag-vs-fine-tuning-cost-performance-matrix/" rel="noopener noreferrer"&gt;TildAlice&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>rag</category>
      <category>finetuning</category>
      <category>llm</category>
      <category>costoptimization</category>
    </item>
    <item>
      <title>NumPy Vectorization Cuts Cointegration Test Time by 8x</title>
      <dc:creator>TildAlice</dc:creator>
      <pubDate>Tue, 14 Apr 2026 15:04:59 +0000</pubDate>
      <link>https://forem.com/tildalice/numpy-vectorization-cuts-cointegration-test-time-by-8x-4bec</link>
      <guid>https://forem.com/tildalice/numpy-vectorization-cuts-cointegration-test-time-by-8x-4bec</guid>
      <description>&lt;h2&gt;
  
  
  The Loop That Took 47 Seconds
&lt;/h2&gt;

&lt;p&gt;Running cointegration tests across 500 stock pairs shouldn't take 47 seconds. But there I was, staring at a progress bar that moved like it was stuck in molasses. The bottleneck? A nested Python loop computing the Engle-Granger test for every pair in my watchlist.&lt;/p&gt;

&lt;p&gt;The fix took the runtime from 47 seconds to 5.8 seconds. No fancy libraries, no Cython, no multiprocessing — just NumPy vectorization done properly.&lt;/p&gt;

&lt;p&gt;Here's what the slow version looked like. This is representative of code I've seen in dozens of pairs trading implementations:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;statsmodels.tsa.stattools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;coint&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;slow_cointegration_matrix&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;price_matrix&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;price_matrix: shape (n_days, n_assets)&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;n_assets&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;price_matrix&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;pvalues&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;zeros&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;n_assets&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_assets&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_assets&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_assets&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="c1"&gt;# statsmodels coint returns (t-stat, pvalue, crit_values)
&lt;/span&gt;            &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pval&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;coint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;price_matrix&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;price_matrix&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
            &lt;span class="n"&gt;pvalues&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pval&lt;/span&gt;
            &lt;span class="n"&gt;pvalues&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pval&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;pvalues&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For 500 assets, that's 124,750 pairs. Each &lt;code&gt;coint()&lt;/code&gt; call runs an OLS regression, computes residuals, then performs an ADF test on those residuals. The Python interpreter overhead on 124,750 iterations adds up fast.&lt;/p&gt;
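The vectorization opportunity is in the OLS step: all 124,750 hedge ratios can come from a single matrix product instead of per-pair regressions. A minimal sketch of that idea (my own version, not the article's full fix — the ADF test on residuals still needs per-pair handling):

```python
import numpy as np

def batched_hedge_ratios(price_matrix):
    """Vectorized OLS slopes for every ordered pair of assets.

    price_matrix: shape (n_days, n_assets). Returns an (n_assets, n_assets)
    matrix B where B[i, j] is the slope from regressing asset j on asset i.
    """
    X = price_matrix - price_matrix.mean(axis=0)  # demean each column
    cov = X.T @ X                                 # all pairwise co-moments at once
    var = np.diag(cov)                            # each asset's own sum of squares
    return cov / var[:, None]                     # beta[i, j] = cov(i, j) / var(i)
```

One `X.T @ X` replaces 124,750 separate `np.polyfit`-style calls; the residual series for each pair then follow from `y - beta * x` without re-running any regression.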






&lt;p&gt;&lt;em&gt;Continue reading the full article on &lt;a href="https://tildalice.io/numpy-vectorization-cointegration-test-8x-speedup/" rel="noopener noreferrer"&gt;TildAlice&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>numpy</category>
      <category>cointegration</category>
      <category>pairstrading</category>
      <category>vectorization</category>
    </item>
    <item>
      <title>Claude Code CLI vs API: Real Cost at 50K Lines/Month</title>
      <dc:creator>TildAlice</dc:creator>
      <pubDate>Mon, 13 Apr 2026 21:04:36 +0000</pubDate>
      <link>https://forem.com/tildalice/claude-code-cli-vs-api-real-cost-at-50k-linesmonth-4m1n</link>
      <guid>https://forem.com/tildalice/claude-code-cli-vs-api-real-cost-at-50k-linesmonth-4m1n</guid>
      <description>&lt;h2&gt;
  
  
  The $127 Mistake I Almost Made
&lt;/h2&gt;

&lt;p&gt;I burned through my Claude API quota in 11 days. The bill: $127 for what I thought was "casual usage" — code reviews, refactoring sessions, documentation generation. The next day I switched to Claude Code CLI with a Max plan subscription. Same workload, same outputs, zero usage anxiety.&lt;/p&gt;

&lt;p&gt;But here's what nobody tells you: the CLI isn't always cheaper. If you're running batch jobs or CI pipelines, the API can actually save money. The pricing models are asymmetric enough that the "obvious" choice depends entirely on your usage pattern.&lt;/p&gt;

&lt;p&gt;I rebuilt my entire workflow around this question: when does a $20/month Max subscription beat pay-per-token pricing? The answer involves some counterintuitive math and a few surprising edge cases.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftildalice.io%2Fwp-content%2Fuploads%2F2026%2F04%2Fstock-claude-code-cli-vs-api-cost-comparison-1.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftildalice.io%2Fwp-content%2Fuploads%2F2026%2F04%2Fstock-claude-code-cli-vs-api-cost-comparison-1.jpg" alt="Close-up of AI-assisted coding with menu options for debugging and problem-solving." width="800" height="534"&gt;&lt;/a&gt;&lt;/p&gt;
Photo by &lt;a href="https://www.pexels.com/@dkomov" rel="nofollow noopener noreferrer"&gt;Daniil Komov&lt;/a&gt; on &lt;a href="https://www.pexels.com" rel="nofollow noopener noreferrer"&gt;Pexels&lt;/a&gt;



&lt;h2&gt;
  
  
  API Pricing: The $15/MTok Trap
&lt;/h2&gt;

&lt;p&gt;Claude's API pricing as of April 2026:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Opus 4.5&lt;/strong&gt;: $15 input / $75 output per million tokens&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sonnet 4.5&lt;/strong&gt;: $3 input / $15 output per million tokens
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Haiku 3.5&lt;/strong&gt;: $0.25 input / $1.25 output per million tokens&lt;/li&gt;
&lt;/ul&gt;
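The break-even arithmetic against a flat subscription is one multiply-add — the token volumes below are hypothetical, only the Sonnet rates come from the list above:

```python
def monthly_api_cost(in_mtok, out_mtok, in_rate, out_rate):
    """Dollars per month; token volumes are in millions of tokens (MTok)."""
    return in_mtok * in_rate + out_mtok * out_rate

# A hypothetical month at the Sonnet rates quoted above ($3 in / $15 out):
cost = monthly_api_cost(in_mtok=6.0, out_mtok=1.5, in_rate=3.0, out_rate=15.0)
print(cost)  # 40.5 -- already past a $20/month flat plan
```

Run your own token counts through this before choosing: output tokens dominate at a 5x premium, so chatty workloads hit break-even much sooner than terse ones.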




&lt;p&gt;&lt;em&gt;Continue reading the full article on &lt;a href="https://tildalice.io/claude-code-cli-vs-api-cost-comparison/" rel="noopener noreferrer"&gt;TildAlice&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>claudeai</category>
      <category>apipricing</category>
      <category>developertools</category>
      <category>costoptimization</category>
    </item>
    <item>
      <title>Cloud GPU Cost Showdown: ViT Training on AWS vs GCP vs Azure</title>
      <dc:creator>TildAlice</dc:creator>
      <pubDate>Mon, 13 Apr 2026 18:04:51 +0000</pubDate>
      <link>https://forem.com/tildalice/cloud-gpu-cost-showdown-vit-training-on-aws-vs-gcp-vs-azure-2d4a</link>
      <guid>https://forem.com/tildalice/cloud-gpu-cost-showdown-vit-training-on-aws-vs-gcp-vs-azure-2d4a</guid>
      <description>&lt;h2&gt;
  
  
  I Spent $347 Training the Same ViT Model Three Times
&lt;/h2&gt;

&lt;p&gt;Same model (ViT-B/16). Same dataset (ImageNet-1k). Same batch size and optimizer. Three different cloud providers. The final cost difference? 2.8x between the cheapest and most expensive option.&lt;/p&gt;

&lt;p&gt;This isn't a theoretical comparison. I trained the exact same Vision Transformer on AWS, GCP, and Azure to see where your money actually goes. The results were surprising — not just in total cost, but in where the hidden charges showed up.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftildalice.io%2Fwp-content%2Fuploads%2F2026%2F04%2Fstock-cloud-gpu-cost-vit-training-aws-gcp-azure-1.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftildalice.io%2Fwp-content%2Fuploads%2F2026%2F04%2Fstock-cloud-gpu-cost-vit-training-aws-gcp-azure-1.jpg" alt="Three NVIDIA GeForce RTX graphics cards stacked on a surface, showcasing their sleek design and branding details." width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;
Photo by &lt;a href="https://www.pexels.com/@zeleboba" rel="nofollow noopener noreferrer"&gt;Andrey Matveev&lt;/a&gt; on &lt;a href="https://www.pexels.com" rel="nofollow noopener noreferrer"&gt;Pexels&lt;/a&gt;



&lt;h2&gt;
  
  
  The Setup: ViT-B/16 on ImageNet-1k
&lt;/h2&gt;

&lt;p&gt;Vision Transformer Base with 16x16 patches. 86M parameters. Training from scratch on ImageNet-1k (1.28M images, 1000 classes) for 90 epochs using the standard recipe from Dosovitskiy et al. (2021).&lt;/p&gt;
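Before looking at the bills, it helps to have the back-of-envelope formula they should roughly obey. Every input here is an assumption for illustration (batch size 512 gives 1.28M / 512 = 2,500 steps per epoch; the step time and hourly rate are placeholders):

```python
def training_cost(step_time_s, steps_per_epoch, epochs, hourly_rate,
                  storage_and_egress=0.0):
    """Back-of-envelope training bill: GPU-hours times the hourly rate,
    plus any fixed extras. All inputs are hypothetical."""
    gpu_hours = step_time_s * steps_per_epoch * epochs / 3600
    return gpu_hours * hourly_rate + storage_and_egress

# 0.35 s/step, 2,500 steps/epoch, 90 epochs, $3/GPU-hour (all assumed):
print(training_cost(0.35, 2500, 90, 3.0))  # 65.625
```

The gap between this idealized number and the real invoices — storage, egress, idle provisioning — is where the hidden charges in the comparison show up.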




&lt;p&gt;&lt;em&gt;Continue reading the full article on &lt;a href="https://tildalice.io/cloud-gpu-cost-vit-training-aws-gcp-azure/" rel="noopener noreferrer"&gt;TildAlice&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>computervision</category>
      <category>vit</category>
      <category>cloudgpu</category>
      <category>aws</category>
    </item>
    <item>
      <title>PPO vs A2C: CartPole Training Speed &amp; Sample Efficiency</title>
      <dc:creator>TildAlice</dc:creator>
      <pubDate>Mon, 13 Apr 2026 15:04:38 +0000</pubDate>
      <link>https://forem.com/tildalice/ppo-vs-a2c-cartpole-training-speed-sample-efficiency-4dm2</link>
      <guid>https://forem.com/tildalice/ppo-vs-a2c-cartpole-training-speed-sample-efficiency-4dm2</guid>
      <description>&lt;h2&gt;
  
  
  Why A2C Often Trains Faster Than PPO (Until It Doesn't)
&lt;/h2&gt;

&lt;p&gt;Most RL tutorials pick PPO as the default on-policy algorithm without questioning it. The narrative goes: PPO is stable, sample-efficient, and industry-proven. But when you benchmark it against A2C on CartPole-v1, something weird happens — A2C hits the 500-reward threshold in half the timesteps.&lt;/p&gt;

&lt;p&gt;This wasn't what I expected. PPO's clipped surrogate objective is supposed to make better use of each batch through multiple epochs. A2C does a single gradient step per batch and moves on. Yet in practice, A2C converged in ~25k timesteps while PPO needed 50k+ with default Stable Baselines3 hyperparameters.&lt;/p&gt;

&lt;p&gt;The answer lies in what "sample efficiency" actually means for on-policy methods. Spoiler: it's not just about reusing data.&lt;/p&gt;
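One way to see the asymmetry is to count gradient updates per environment step. The parameter names below mirror Stable Baselines3 defaults (PPO: `n_steps=2048`, `n_epochs=10`, 32 minibatches of 64; A2C: `n_steps=5`, one update per rollout), but the counting itself is my sketch:

```python
def gradient_updates(total_timesteps, n_steps, n_envs, n_epochs, minibatches):
    """Gradient steps an on-policy learner takes over a training run."""
    rollouts = total_timesteps // (n_steps * n_envs)
    return rollouts * n_epochs * minibatches

ppo = gradient_updates(50_000, n_steps=2048, n_envs=1, n_epochs=10, minibatches=32)
a2c = gradient_updates(25_000, n_steps=5, n_envs=1, n_epochs=1, minibatches=1)
print(ppo, a2c)  # 7680 5000
```

PPO squeezes far more updates out of each rollout, but A2C updates every 5 steps with fresh on-policy data — on a problem as simple as CartPole, that rapid feedback loop can matter more than data reuse.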

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftildalice.io%2Fwp-content%2Fuploads%2F2026%2F04%2Fstock-ppo-vs-a2c-cartpole-training-speed-benchmark-1.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftildalice.io%2Fwp-content%2Fuploads%2F2026%2F04%2Fstock-ppo-vs-a2c-cartpole-training-speed-benchmark-1.jpg" alt="Two young girls performing rhythmic gymnastics with ribbons in an indoor sports hall." width="800" height="534"&gt;&lt;/a&gt;&lt;/p&gt;
Photo by &lt;a href="https://www.pexels.com/@cottonbro" rel="nofollow noopener noreferrer"&gt;cottonbro studio&lt;/a&gt; on &lt;a href="https://www.pexels.com" rel="nofollow noopener noreferrer"&gt;Pexels&lt;/a&gt;



&lt;h2&gt;
  
  
  The Benchmark Setup: CartPole-v1 With Learning Curves
&lt;/h2&gt;




&lt;p&gt;&lt;em&gt;Continue reading the full article on &lt;a href="https://tildalice.io/ppo-vs-a2c-cartpole-training-speed-benchmark/" rel="noopener noreferrer"&gt;TildAlice&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ppo</category>
      <category>a2c</category>
      <category>gymnasium</category>
      <category>stablebaselines3</category>
    </item>
    <item>
      <title>Git Worktree Race Conditions: 3 Corruptions &amp; Fixes</title>
      <dc:creator>TildAlice</dc:creator>
      <pubDate>Sun, 12 Apr 2026 21:03:44 +0000</pubDate>
      <link>https://forem.com/tildalice/git-worktree-race-conditions-3-corruptions-fixes-5h46</link>
      <guid>https://forem.com/tildalice/git-worktree-race-conditions-3-corruptions-fixes-5h46</guid>
      <description>&lt;h2&gt;
  
  
  The Lockfile That Wasn't There
&lt;/h2&gt;

&lt;p&gt;Git worktrees promise parallel builds without the disk overhead of multiple clones. Run &lt;code&gt;git worktree add ../feature-branch&lt;/code&gt; and you've got a second working directory sharing the same &lt;code&gt;.git&lt;/code&gt; — perfect for testing a hotfix while keeping your main branch clean, or running CI checks without stopping your current work.&lt;/p&gt;

&lt;p&gt;Until both worktrees try to write at the same time.&lt;/p&gt;

&lt;p&gt;I hit this building a CI pipeline that tested multiple branches concurrently. The idea was simple: spin up three worktrees, run &lt;code&gt;pytest&lt;/code&gt; in each, collect results. Worked great locally with sequential runs. In CI with parallel jobs? Random failures with &lt;code&gt;error: could not lock config file .git/config: Resource temporarily unavailable&lt;/code&gt;. Not every run. Just often enough to make the build unreliable.&lt;/p&gt;

&lt;p&gt;The problem isn't obvious from the docs. Git worktrees share more than just objects — they share the index update logic, reflog writes, and config file access. Most of these operations use lockfiles (&lt;code&gt;index.lock&lt;/code&gt;, &lt;code&gt;HEAD.lock&lt;/code&gt;, &lt;code&gt;config.lock&lt;/code&gt;), but the locking is optimistic: Git assumes you're not hammering the same repo from multiple processes simultaneously. Worktrees break that assumption.&lt;/p&gt;
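Since the lock failure is transient, one mitigation is simply retrying with backoff around the git call. A sketch of that workaround (not necessarily one of the three fixes below; it matches on the error string quoted above):

```python
import subprocess
import time

def git_with_retry(args, cwd, attempts=5, base_delay=0.5):
    """Retry a git command when it loses the optimistic-lock race."""
    for attempt in range(attempts):
        result = subprocess.run(["git", *args], cwd=cwd,
                                capture_output=True, text=True)
        if result.returncode == 0:
            return result.stdout
        if "could not lock" not in result.stderr:
            raise RuntimeError(result.stderr.strip())  # a real failure, not the race
        time.sleep(base_delay * 2 ** attempt)          # exponential backoff
    raise RuntimeError(f"gave up after {attempts} attempts: {result.stderr.strip()}")
```

Backoff papers over the symptom; the fixes below address why the worktrees contend in the first place.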

&lt;p&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftildalice.io%2Fwp-content%2Fuploads%2F2026%2F04%2Fstock-git-worktree-race-conditions-parallel-builds-1.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftildalice.io%2Fwp-content%2Fuploads%2F2026%2F04%2Fstock-git-worktree-race-conditions-parallel-builds-1.jpg" alt="Close-up of a person holding a Git sticker, emphasizing software development." width="800" height="534"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Continue reading the full article on &lt;a href="https://tildalice.io/git-worktree-race-conditions-parallel-builds/" rel="noopener noreferrer"&gt;TildAlice&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>git</category>
      <category>devops</category>
      <category>cicd</category>
      <category>debugging</category>
    </item>
    <item>
      <title>PaddleOCR vs EasyOCR vs Doctr: Memory &amp; Latency Test</title>
      <dc:creator>TildAlice</dc:creator>
      <pubDate>Sun, 12 Apr 2026 18:05:07 +0000</pubDate>
      <link>https://forem.com/tildalice/paddleocr-vs-easyocr-vs-doctr-memory-latency-test-id0</link>
      <guid>https://forem.com/tildalice/paddleocr-vs-easyocr-vs-doctr-memory-latency-test-id0</guid>
      <description>&lt;h2&gt;
  
  
  The 800MB Surprise
&lt;/h2&gt;

&lt;p&gt;I spun up three OCR engines on the same 1000-image dataset and watched htop. PaddleOCR sat at 450MB idle, EasyOCR at 800MB, Doctr at 320MB. That's before a single inference call.&lt;/p&gt;

&lt;p&gt;This isn't about which engine reads text better — I already tested accuracy across 10,000 images. This is about whether your production API stays under the memory limit when 20 concurrent requests hit at once. The difference between an engine that loads in 2 seconds versus 18 seconds isn't academic when you're running serverless functions with cold start penalties.&lt;/p&gt;

&lt;p&gt;I ran each engine through the same gauntlet: initialization time, first inference latency, steady-state memory footprint, and batch processing throughput. The results clarify when each tool makes sense — and when your hosting bill will triple because you picked wrong.&lt;/p&gt;
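For the initialization and memory numbers, a harness along these lines is enough (a Unix-only sketch using the stdlib `resource` module; `load_fn` is whatever zero-arg callable constructs the engine under test):

```python
import resource
import time

def measure(label, load_fn):
    """Time a constructor and report peak RSS afterwards (KiB on Linux)."""
    start = time.perf_counter()
    engine = load_fn()
    elapsed = time.perf_counter() - start
    peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    print(f"{label}: init {elapsed:.2f}s, peak RSS {peak_kb / 1024:.0f} MB")
    return engine
```

Note `ru_maxrss` is a high-water mark for the whole process, so measure each engine in a fresh interpreter to keep the numbers comparable.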

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftildalice.io%2Fwp-content%2Fuploads%2F2026%2F04%2Fstock-paddleocr-easyocr-doctr-memory-latency-benchmark-1.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftildalice.io%2Fwp-content%2Fuploads%2F2026%2F04%2Fstock-paddleocr-easyocr-doctr-memory-latency-benchmark-1.jpg" alt="A back view of a woman kayaking energetically on a lake in Jönköping, Sweden." width="800" height="534"&gt;&lt;/a&gt;&lt;/p&gt;
Photo by &lt;a href="https://www.pexels.com/@efrem-efre-2786187" rel="nofollow noopener noreferrer"&gt;Efrem Efre&lt;/a&gt; on &lt;a href="https://www.pexels.com" rel="nofollow noopener noreferrer"&gt;Pexels&lt;/a&gt;



&lt;h2&gt;
  
  
  Test Setup: What I Actually Ran
&lt;/h2&gt;




&lt;p&gt;&lt;em&gt;Continue reading the full article on &lt;a href="https://tildalice.io/paddleocr-easyocr-doctr-memory-latency-benchmark/" rel="noopener noreferrer"&gt;TildAlice&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ocr</category>
      <category>paddleocr</category>
      <category>easyocr</category>
      <category>doctr</category>
    </item>
    <item>
      <title>MoE Router Collapse: Why 90% of Tokens Hit 2 Experts</title>
      <dc:creator>TildAlice</dc:creator>
      <pubDate>Sun, 12 Apr 2026 15:05:14 +0000</pubDate>
      <link>https://forem.com/tildalice/moe-router-collapse-why-90-of-tokens-hit-2-experts-4lp9</link>
      <guid>https://forem.com/tildalice/moe-router-collapse-why-90-of-tokens-hit-2-experts-4lp9</guid>
      <description>&lt;h2&gt;
  
  
  The 8-Expert Model That Only Uses 2
&lt;/h2&gt;

&lt;p&gt;You train a Mixture of Experts model with 8 experts, expecting distributed specialization. After a few thousand steps, you check the routing statistics and find 87% of tokens going to experts 0 and 3. The other six experts? Basically decorative.&lt;/p&gt;

&lt;p&gt;This is router collapse, and it's one of the most frustrating failure modes in MoE training. Your model has 8x the parameters but uses a fraction of them. The paper that first systematically addressed this — Shazeer et al.'s "Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer" (2017) — remains the foundational reference. You can read it &lt;a href="https://arxiv.org/abs/1701.06538" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The core insight is deceptively simple: without explicit load balancing, routers learn to send everything to whichever experts happen to perform slightly better early in training. Those experts get more gradient signal, improve faster, and attract even more tokens. It's a rich-get-richer dynamic that starves most experts of training signal entirely.&lt;/p&gt;
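&lt;p&gt;The standard remedy that grew out of this line of work is an auxiliary load-balancing loss. As a rough illustration (a plain-Python sketch of the Switch Transformer-style formulation, not necessarily the exact variant discussed later in the article), the idea can be written in a few lines:&lt;/p&gt;

```python
# Sketch of a Switch Transformer-style auxiliary load-balancing loss.
# For N experts: loss = N * sum_i f_i * P_i, where f_i is the fraction of
# tokens dispatched to expert i and P_i is the mean router probability mass
# on expert i. The loss bottoms out at 1.0 under perfectly uniform routing
# and grows as tokens concentrate on a few experts.

def load_balancing_loss(router_probs, expert_assignments, num_experts):
    """router_probs: per-token lists of routing probabilities (len num_experts).
    expert_assignments: the expert index each token was actually sent to."""
    num_tokens = len(router_probs)
    # f_i: fraction of tokens dispatched to each expert
    f = [0.0] * num_experts
    for e in expert_assignments:
        f[e] += 1.0 / num_tokens
    # P_i: mean router probability assigned to each expert
    p = [0.0] * num_experts
    for probs in router_probs:
        for i in range(num_experts):
            p[i] += probs[i] / num_tokens
    return num_experts * sum(fi * pi for fi, pi in zip(f, p))

# Uniform routing over 4 experts gives the minimum, loss = 1.0
balanced = load_balancing_loss(
    [[0.25, 0.25, 0.25, 0.25]] * 8, [0, 1, 2, 3, 0, 1, 2, 3], 4)

# Fully collapsed routing (every token to expert 0) gives loss = 4.0
collapsed = load_balancing_loss(
    [[1.0, 0.0, 0.0, 0.0]] * 8, [0] * 8, 4)
```

&lt;p&gt;Adding this term to the training objective penalizes exactly the rich-get-richer dynamic: experts that hoard both tokens and probability mass drive the loss up, pushing gradient signal back toward the starved experts.&lt;/p&gt;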

&lt;p&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftildalice.io%2Fwp-content%2Fuploads%2F2026%2F04%2Fstock-moe-router-collapse-auxiliary-load-balancing-fix-1.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftildalice.io%2Fwp-content%2Fuploads%2F2026%2F04%2Fstock-moe-router-collapse-auxiliary-load-balancing-fix-1.jpg" alt="An IT professional configuring network cables in a server rack, focusing on Ethernet connections." width="800" height="534"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Continue reading the full article on &lt;a href="https://tildalice.io/moe-router-collapse-auxiliary-load-balancing-fix/" rel="noopener noreferrer"&gt;TildAlice&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>moe</category>
      <category>mixtureofexperts</category>
      <category>routercollapse</category>
      <category>loadbalancing</category>
    </item>
    <item>
      <title>TFLite Inference Fails on Android: 5 ONNX Mobile Fixes</title>
      <dc:creator>TildAlice</dc:creator>
      <pubDate>Sat, 11 Apr 2026 21:04:34 +0000</pubDate>
      <link>https://forem.com/tildalice/tflite-inference-fails-on-android-5-onnx-mobile-fixes-2h1f</link>
      <guid>https://forem.com/tildalice/tflite-inference-fails-on-android-5-onnx-mobile-fixes-2h1f</guid>
      <description>&lt;h2&gt;
  
  
  The Problem Everyone Hits After Successfully Exporting to TFLite
&lt;/h2&gt;

&lt;p&gt;Your TensorFlow model exports to TFLite without errors. The conversion script runs clean. Then you deploy to Android and get &lt;code&gt;IllegalArgumentException: Cannot copy to a TensorFlowLite tensor&lt;/code&gt; or the app just crashes with a cryptic JNI error.&lt;/p&gt;

&lt;p&gt;I've seen this pattern repeat across three production deployments: the model works perfectly in Python, passes TFLite validation, then fails spectacularly on actual devices. The issue isn't your model architecture — it's the mismatch between what TFLite expects and what mobile runtimes actually support.&lt;/p&gt;

&lt;p&gt;ONNX Runtime Mobile solves most of these problems by design, but the migration isn't obvious. Here are the five fixes that actually work, with before/after code and the specific error messages they eliminate.&lt;/p&gt;
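&lt;p&gt;One frequent cause of that tensor-copy exception, worth understanding before any migration, is a memory-layout mismatch: TFLite models typically expect NHWC input, while ONNX exports from PyTorch assume NCHW. A minimal pure-Python sketch of the transpose (an illustration of the mismatch, not the article's specific fix):&lt;/p&gt;

```python
# TFLite convention: NHWC (batch, height, width, channels).
# ONNX-from-PyTorch convention: NCHW (batch, channels, height, width).
# Handing one layout to a runtime expecting the other produces shape errors
# like "Cannot copy to a TensorFlowLite tensor", because the element counts
# per dimension no longer line up.

def nchw_to_nhwc(tensor):
    """tensor: nested lists shaped [N][C][H][W]; returns [N][H][W][C]."""
    n = len(tensor)
    c = len(tensor[0])
    h = len(tensor[0][0])
    w = len(tensor[0][0][0])
    return [[[[tensor[b][ch][y][x] for ch in range(c)]
              for x in range(w)]
             for y in range(h)]
            for b in range(n)]

# A 1x2x2x2 tensor: two channels of a 2x2 image
nchw = [[[[1, 2], [3, 4]],   # channel 0
         [[5, 6], [7, 8]]]]  # channel 1
nhwc = nchw_to_nhwc(nchw)
# nhwc[0][0][0] now holds the per-channel values of the top-left pixel
```

&lt;p&gt;In production you would do this with a vectorized transpose rather than nested loops, but the indexing above is the whole story: same data, different axis order.&lt;/p&gt;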

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftildalice.io%2Fwp-content%2Fuploads%2F2026%2F04%2Fstock-tflite-inference-fails-android-onnx-mobile-fixes-1.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftildalice.io%2Fwp-content%2Fuploads%2F2026%2F04%2Fstock-tflite-inference-fails-android-onnx-mobile-fixes-1.jpg" alt="Close-up of a hand holding a smartphone displaying Android 11 interface indoors on patterned floor." width="800" height="534"&gt;&lt;/a&gt;&lt;/p&gt;
Photo by &lt;a href="https://www.pexels.com/@zaktech90" rel="nofollow noopener noreferrer"&gt;Zain Ali&lt;/a&gt; on &lt;a href="https://www.pexels.com" rel="nofollow noopener noreferrer"&gt;Pexels&lt;/a&gt;



&lt;h2&gt;
  
  
  Fix 1: Dynamic Shape Errors (Cannot Resize Tensor)
&lt;/h2&gt;




&lt;p&gt;&lt;em&gt;Continue reading the full article on &lt;a href="https://tildalice.io/tflite-inference-fails-android-onnx-mobile-fixes/" rel="noopener noreferrer"&gt;TildAlice&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>onnxruntime</category>
      <category>tensorflowlite</category>
      <category>androidml</category>
      <category>edgeai</category>
    </item>
    <item>
      <title>ONNX INT8 vs FP16: 3x Latency Drop on Jetson Orin Nano</title>
      <dc:creator>TildAlice</dc:creator>
      <pubDate>Sat, 11 Apr 2026 18:04:20 +0000</pubDate>
      <link>https://forem.com/tildalice/onnx-int8-vs-fp16-3x-latency-drop-on-jetson-orin-nano-bb1</link>
      <guid>https://forem.com/tildalice/onnx-int8-vs-fp16-3x-latency-drop-on-jetson-orin-nano-bb1</guid>
      <description>&lt;h2&gt;
  
  
  Switching from FP16 to INT8 cut our object detection pipeline from 47ms to 15ms per frame on the Jetson Orin Nano
&lt;/h2&gt;

&lt;p&gt;That's the kind of speedup that transforms a barely-real-time demo into a production-ready edge AI system. But here's the catch: the accuracy drop wasn't uniform across model architectures. ResNet-based models handled quantization gracefully (&amp;lt;2% mAP loss), while MobileNet variants occasionally spiked false positives by 14% on small objects.&lt;/p&gt;

&lt;p&gt;I ran this benchmark because most ONNX quantization guides stop at "it's faster" without showing you &lt;em&gt;where&lt;/em&gt; it breaks. If you're shipping inference on Jetson devices, you need to know the exact tradeoff curve — not just average numbers.&lt;/p&gt;
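&lt;p&gt;Where the breakage comes from is easy to see in miniature. Symmetric per-tensor INT8 quantization maps floats onto a 256-level grid with a single scale; the round-trip rounding error is what surfaces as mAP loss. A minimal sketch of the mechanism (an illustration only, not the TensorRT calibration path the benchmark itself uses):&lt;/p&gt;

```python
# Symmetric per-tensor INT8 quantization: map floats in [-m, m] onto the
# integers -127..127 with one scale factor, then dequantize. The round-trip
# error is the raw material of post-quantization accuracy loss.

def quantize_int8(values):
    """Returns (int8_values, scale) under a symmetric per-tensor scheme."""
    m = max(abs(v) for v in values)
    scale = m / 127.0 if m else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    return [scale * v for v in q]

weights = [0.81, -0.42, 0.057, -0.0031]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# The largest-magnitude value survives almost exactly, while small values
# absorb proportionally more rounding error. Layers with wide dynamic range
# and many near-zero weights, common in depthwise-separable MobileNet
# blocks, therefore tend to quantize worse than ResNet-style convolutions.
```
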

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftildalice.io%2Fwp-content%2Fuploads%2F2026%2F04%2Fstock-onnx-int8-vs-fp16-jetson-orin-nano-latency-benchmark-1.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftildalice.io%2Fwp-content%2Fuploads%2F2026%2F04%2Fstock-onnx-int8-vs-fp16-jetson-orin-nano-latency-benchmark-1.jpg" alt="Macro shot of a computer part showcasing intricate electronic connections." width="800" height="534"&gt;&lt;/a&gt;&lt;/p&gt;
Photo by &lt;a href="https://www.pexels.com/@sejio402" rel="nofollow noopener noreferrer"&gt;Sergei Starostin&lt;/a&gt; on &lt;a href="https://www.pexels.com" rel="nofollow noopener noreferrer"&gt;Pexels&lt;/a&gt;



&lt;h2&gt;
  
  
  The Hardware Baseline: Why Jetson Orin Nano INT8 Performance Matters
&lt;/h2&gt;

&lt;p&gt;The Jetson Orin Nano packs 1024 CUDA cores and 32 Tensor Cores into a 15W power envelope. NVIDIA markets it as an "AI at the edge" platform, but the real question is: which precision mode actually delivers?&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Continue reading the full article on &lt;a href="https://tildalice.io/onnx-int8-vs-fp16-jetson-orin-nano-latency-benchmark/" rel="noopener noreferrer"&gt;TildAlice&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>onnx</category>
      <category>jetson</category>
      <category>int8</category>
      <category>modelquantization</category>
    </item>
  </channel>
</rss>
