<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: YMori</title>
    <description>The latest articles on Forem by YMori (@yasumorishima).</description>
    <link>https://forem.com/yasumorishima</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3761588%2F10324d86-be69-483f-90af-7c8f8eb80cf1.png</url>
      <title>Forem: YMori</title>
      <link>https://forem.com/yasumorishima</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/yasumorishima"/>
    <language>en</language>
    <item>
      <title>transform: translateY(0) Breaks position: fixed — A Hidden Trap in SPA Animations</title>
      <dc:creator>YMori</dc:creator>
      <pubDate>Mon, 06 Apr 2026 09:55:09 +0000</pubDate>
      <link>https://forem.com/yasumorishima/transform-translatey0-breaks-position-fixed-a-hidden-trap-in-spa-animations-2hed</link>
      <guid>https://forem.com/yasumorishima/transform-translatey0-breaks-position-fixed-a-hidden-trap-in-spa-animations-2hed</guid>
      <description>&lt;h2&gt;
  
  
  The Bug
&lt;/h2&gt;

&lt;p&gt;One day I got this bug report on my Next.js site:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Clicking a photo near the bottom of the gallery opens a lightbox, but it's completely black. Scroll up and the image is there.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;A &lt;code&gt;position: fixed; inset: 0&lt;/code&gt; overlay was not covering the viewport — it was stuck at the top of the page. Browser bug? No. This is CSS working exactly as specified.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Reproduce
&lt;/h2&gt;

&lt;p&gt;Two ingredients:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;An ancestor element with &lt;code&gt;transform&lt;/code&gt; set&lt;/strong&gt; (even &lt;code&gt;translateY(0)&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A descendant with &lt;code&gt;position: fixed&lt;/code&gt;&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight css"&gt;&lt;code&gt;&lt;span class="c"&gt;/* Page transition animation */&lt;/span&gt;
&lt;span class="k"&gt;@keyframes&lt;/span&gt; &lt;span class="n"&gt;page-enter&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nt"&gt;from&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nl"&gt;opacity&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nl"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;translateY&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;12px&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="nt"&gt;to&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nl"&gt;opacity&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nl"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;translateY&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c"&gt;/* The culprit */&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nc"&gt;.page-enter&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;animation&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;page-enter&lt;/span&gt; &lt;span class="m"&gt;0.35s&lt;/span&gt; &lt;span class="n"&gt;ease&lt;/span&gt; &lt;span class="nb"&gt;both&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c"&gt;/* both = keeps final values */&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Lightbox (descendant of .page-enter)&lt;/span&gt;
&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt; &lt;span class="na"&gt;className&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"fixed inset-0 z-50 bg-black/90"&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;img&lt;/span&gt; &lt;span class="na"&gt;src&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;photo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt; &lt;span class="p"&gt;/&amp;gt;&lt;/span&gt;
&lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Near the top of the page, everything looks fine. Scroll down and open the lightbox — it renders at the &lt;strong&gt;top of the ancestor element&lt;/strong&gt;, not the viewport.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Happens — The CSS Spec
&lt;/h2&gt;

&lt;p&gt;From &lt;a href="https://developer.mozilla.org/en-US/docs/Web/CSS/position#fixed" rel="noopener noreferrer"&gt;MDN's &lt;code&gt;position: fixed&lt;/code&gt; documentation&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The element is positioned relative to the initial containing block established by the viewport, &lt;strong&gt;except when one of its ancestors has a &lt;code&gt;transform&lt;/code&gt;, &lt;code&gt;perspective&lt;/code&gt;, or &lt;code&gt;filter&lt;/code&gt; property set to something other than &lt;code&gt;none&lt;/code&gt;&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Ancestor's &lt;code&gt;transform&lt;/code&gt;
&lt;/th&gt;
&lt;th&gt;
&lt;code&gt;fixed&lt;/code&gt; is relative to&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;none&lt;/code&gt; or unset&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;viewport&lt;/strong&gt; (expected)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;translateY(0)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;that ancestor&lt;/strong&gt; (broken)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;translateY(12px)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;that ancestor&lt;/strong&gt; (broken)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;translateY(0)&lt;/code&gt; is not the same as no transform.&lt;/strong&gt; It's a transform that moves nothing — but the CSS engine still creates a new containing block.&lt;/p&gt;
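
&lt;p&gt;A minimal sketch of that rule as a predicate, assuming only the three properties named in the MDN quote above (browsers treat a few more properties the same way). The &lt;code&gt;StyleLike&lt;/code&gt; interface and the function name are hypothetical. In a browser, the object returned by &lt;code&gt;getComputedStyle(element)&lt;/code&gt; should fit that shape, and a computed &lt;code&gt;translateY(0)&lt;/code&gt; is reported as an identity matrix rather than &lt;code&gt;none&lt;/code&gt;.&lt;/p&gt;

```typescript
// Hypothetical shape: the computed-style values this check inspects.
interface StyleLike {
  transform: string;
  perspective: string;
  filter: string;
}

// Returns true when an element with these computed values becomes the
// containing block for its position: fixed descendants.
function trapsFixedDescendants(style: StyleLike): boolean {
  return [style.transform, style.perspective, style.filter].some(
    (value) => value !== "none"
  );
}

// translateY(0) computes to an identity matrix, which is still not "none".
const animated: StyleLike = {
  transform: "matrix(1, 0, 0, 1, 0, 0)",
  perspective: "none",
  filter: "none",
};
const plain: StyleLike = { transform: "none", perspective: "none", filter: "none" };

console.log(trapsFixedDescendants(animated)); // true
console.log(trapsFixedDescendants(plain)); // false
```

&lt;p&gt;Running a check like this on each ancestor of an overlay is one way to find which wrapper is capturing it.&lt;/p&gt;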

&lt;h2&gt;
  
  
  The &lt;code&gt;animation-fill-mode: both&lt;/code&gt; Trap
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight css"&gt;&lt;code&gt;&lt;span class="nc"&gt;.page-enter&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;animation&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;page-enter&lt;/span&gt; &lt;span class="m"&gt;0.35s&lt;/span&gt; &lt;span class="n"&gt;ease&lt;/span&gt; &lt;span class="nb"&gt;both&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;both&lt;/code&gt; (&lt;code&gt;forwards&lt;/code&gt; + &lt;code&gt;backwards&lt;/code&gt;) keeps the final keyframe values &lt;strong&gt;after the animation ends&lt;/strong&gt;. So &lt;code&gt;transform: translateY(0)&lt;/code&gt; persists for the lifetime of the element.&lt;/p&gt;

&lt;p&gt;The same applies to JavaScript inline styles:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="c1"&gt;// IntersectionObserver fadeIn component&lt;/span&gt;
&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt; &lt;span class="na"&gt;style&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;visible&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;translateY(0)&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;translateY(16px)&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="c1"&gt;// After visible=true, translateY(0) stays forever&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;children&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Blast Radius
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Every &lt;code&gt;fixed&lt;/code&gt; descendant&lt;/strong&gt; of a &lt;code&gt;transform&lt;/code&gt;-bearing ancestor is affected:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lightboxes / modals&lt;/li&gt;
&lt;li&gt;Toast notifications&lt;/li&gt;
&lt;li&gt;Cookie consent banners&lt;/li&gt;
&lt;li&gt;PWA install prompts&lt;/li&gt;
&lt;li&gt;Progress bars&lt;/li&gt;
&lt;li&gt;Scroll-to-top buttons&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Bottom navs and fixed headers may not visibly break when the ancestor's edges line up with the viewport, but they are technically affected too.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Fix
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Use &lt;code&gt;transform: none&lt;/code&gt; (Most Important)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight css"&gt;&lt;code&gt;&lt;span class="k"&gt;@keyframes&lt;/span&gt; &lt;span class="n"&gt;page-enter&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nt"&gt;from&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nl"&gt;opacity&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nl"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;translateY&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;12px&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="nt"&gt;to&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nl"&gt;opacity&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nl"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;none&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c"&gt;/* Not translateY(0) */&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt; &lt;span class="na"&gt;style&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;visible&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;none&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;translateY(16px)&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;transform: none&lt;/code&gt; means "no transform is applied" — no containing block is created.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Use &lt;code&gt;createPortal&lt;/code&gt; to Escape the DOM Tree (Defensive)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;createPortal&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;react-dom&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;Lightbox&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;createPortal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt; &lt;span class="na"&gt;className&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"fixed inset-0 z-50 bg-black/90"&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="cm"&gt;/* ... */&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;,&lt;/span&gt;
    &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt; &lt;span class="c1"&gt;// Renders as a child of body, outside the transformed page wrapper&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Because the overlay is now rendered as a direct child of &lt;code&gt;body&lt;/code&gt;, transforms on page-level wrappers no longer affect it. This is a good practice for any viewport-covering overlay.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Do Both (Recommended)
&lt;/h3&gt;

&lt;p&gt;Fix the root cause with &lt;code&gt;transform: none&lt;/code&gt;, and add &lt;code&gt;createPortal&lt;/code&gt; as defense-in-depth. If someone later adds a new &lt;code&gt;transform&lt;/code&gt; ancestor, overlays still work.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Don't&lt;/th&gt;
&lt;th&gt;Do&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Use &lt;code&gt;translateY(0)&lt;/code&gt; as animation end value&lt;/td&gt;
&lt;td&gt;Use &lt;code&gt;transform: none&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Render &lt;code&gt;fixed&lt;/code&gt; overlays deep in the DOM tree&lt;/td&gt;
&lt;td&gt;Use &lt;code&gt;createPortal(children, document.body)&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Add animations without checking &lt;code&gt;fixed&lt;/code&gt; elements&lt;/td&gt;
&lt;td&gt;Audit &lt;code&gt;fixed&lt;/code&gt; descendants when adding &lt;code&gt;transform&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;translateY(0)&lt;/code&gt; and &lt;code&gt;none&lt;/code&gt; look identical but behave differently.&lt;/strong&gt; Miss this spec detail and every overlay on your site breaks the moment you add a page transition animation.&lt;/p&gt;

</description>
      <category>css</category>
      <category>nextjs</category>
      <category>react</category>
      <category>webdev</category>
    </item>
    <item>
      <title>NPB 2021 Backtest: Could a Bayesian Model Predict Last-Place-to-Champion?</title>
      <dc:creator>YMori</dc:creator>
      <pubDate>Tue, 24 Mar 2026 02:51:07 +0000</pubDate>
      <link>https://forem.com/yasumorishima/npb-2021-backtest-could-a-bayesian-model-predict-last-place-to-champion-3kl2</link>
      <guid>https://forem.com/yasumorishima/npb-2021-backtest-could-a-bayesian-model-predict-last-place-to-champion-3kl2</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;In a &lt;a href="https://dev.to/yasumorishima/adding-bayesian-ensemble-monte-carlo-to-an-npb-prediction-app-58po"&gt;previous article&lt;/a&gt;, I added Bayesian integration to my NPB prediction system. The 8-year backtest showed "97% probability of beating Marcel." But how did it perform in the &lt;strong&gt;worst year for predictions&lt;/strong&gt;?&lt;/p&gt;

&lt;p&gt;2021 was NPB's biggest upset: both &lt;strong&gt;Yakult (CL)&lt;/strong&gt; and &lt;strong&gt;Orix (PL)&lt;/strong&gt; went from last place to champions. I ran a full backtest with &lt;strong&gt;25 new foreign players individually projected&lt;/strong&gt; using FanGraphs and Baseball Savant data.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/yasumorishima/npb-2021-backtest" rel="noopener noreferrer"&gt;npb-2021-backtest&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Main model&lt;/strong&gt;: &lt;a href="https://github.com/yasumorishima/npb-prediction" rel="noopener noreferrer"&gt;npb-prediction&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Team Standings: Predicted vs Actual
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Central League
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Team&lt;/th&gt;
&lt;th&gt;Actual&lt;/th&gt;
&lt;th&gt;Bayes (no foreign)&lt;/th&gt;
&lt;th&gt;Bayes (with foreign)&lt;/th&gt;
&lt;th&gt;Foreign Effect&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Yakult&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;73W (1st)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;69.5W (4th)&lt;/td&gt;
&lt;td&gt;70.7W (4th)&lt;/td&gt;
&lt;td&gt;+1.2W (Santana, Osuna)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hanshin&lt;/td&gt;
&lt;td&gt;77W (2nd)&lt;/td&gt;
&lt;td&gt;72.8W (2nd)&lt;/td&gt;
&lt;td&gt;72.6W (2nd)&lt;/td&gt;
&lt;td&gt;-0.2W&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Giants&lt;/td&gt;
&lt;td&gt;61W (3rd)&lt;/td&gt;
&lt;td&gt;83.1W (1st)&lt;/td&gt;
&lt;td&gt;84.3W (1st)&lt;/td&gt;
&lt;td&gt;+1.2W (Smoak, Thames)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Pacific League
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Team&lt;/th&gt;
&lt;th&gt;Actual&lt;/th&gt;
&lt;th&gt;Bayes (no foreign)&lt;/th&gt;
&lt;th&gt;Bayes (with foreign)&lt;/th&gt;
&lt;th&gt;Foreign Effect&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Orix&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;70W (1st)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;64.5W (6th)&lt;/td&gt;
&lt;td&gt;62.0W (6th)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;-2.5W (worse)&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SoftBank&lt;/td&gt;
&lt;td&gt;60W (4th)&lt;/td&gt;
&lt;td&gt;77.6W (1st)&lt;/td&gt;
&lt;td&gt;76.2W (1st)&lt;/td&gt;
&lt;td&gt;-1.4W&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;MAE: 10.4 wins → 10.7 wins&lt;/strong&gt; once foreign player projections were added, so they slightly worsened accuracy.&lt;/p&gt;
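
&lt;p&gt;As a rough illustration of how the MAE is computed, here is a sketch over only the five teams listed in the tables above. The league-wide figures (10.4 and 10.7) cover all 12 teams, so this subset is not expected to reproduce them exactly. The &lt;code&gt;TeamRow&lt;/code&gt; type and the function name are hypothetical.&lt;/p&gt;

```typescript
// Hypothetical row type; the numbers are copied from the two tables above.
interface TeamRow {
  team: string;
  actual: number;
  noForeign: number;
  withForeign: number;
}

const rows: TeamRow[] = [
  { team: "Yakult", actual: 73, noForeign: 69.5, withForeign: 70.7 },
  { team: "Hanshin", actual: 77, noForeign: 72.8, withForeign: 72.6 },
  { team: "Giants", actual: 61, noForeign: 83.1, withForeign: 84.3 },
  { team: "Orix", actual: 70, noForeign: 64.5, withForeign: 62.0 },
  { team: "SoftBank", actual: 60, noForeign: 77.6, withForeign: 76.2 },
];

// Mean absolute error: the average of |predicted - actual| over all rows.
function meanAbsoluteError(
  data: TeamRow[],
  predict: (row: TeamRow) => number
): number {
  const total = data.reduce(
    (sum, row) => sum + Math.abs(predict(row) - row.actual),
    0
  );
  return total / data.length;
}

const maeNoForeign = meanAbsoluteError(rows, (row) => row.noForeign);
const maeWithForeign = meanAbsoluteError(rows, (row) => row.withForeign);

console.log(maeNoForeign.toFixed(2), maeWithForeign.toFixed(2)); // 10.58 10.84
```

&lt;p&gt;Even on this subset the error grows slightly once the foreign player projections are included, which matches the direction of the league-wide result.&lt;/p&gt;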

&lt;h2&gt;
  
  
  Foreign Player Predictions vs Actual
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Accurate Predictions (average MLB players)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Player&lt;/th&gt;
&lt;th&gt;Team&lt;/th&gt;
&lt;th&gt;Pred OPS (or ERA)&lt;/th&gt;
&lt;th&gt;Actual OPS (or ERA)&lt;/th&gt;
&lt;th&gt;Diff&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Kevin Cron&lt;/td&gt;
&lt;td&gt;Carp&lt;/td&gt;
&lt;td&gt;.703&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;.701&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;-.002&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Jose Osuna&lt;/td&gt;
&lt;td&gt;Swallows&lt;/td&gt;
&lt;td&gt;.683&lt;/td&gt;
&lt;td&gt;.694&lt;/td&gt;
&lt;td&gt;+.011&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cy Sneed&lt;/td&gt;
&lt;td&gt;Swallows&lt;/td&gt;
&lt;td&gt;ERA 3.53&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;ERA 3.41&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;-0.12&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Major Misses (extreme players)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Player&lt;/th&gt;
&lt;th&gt;Team&lt;/th&gt;
&lt;th&gt;Pred OPS&lt;/th&gt;
&lt;th&gt;Actual OPS&lt;/th&gt;
&lt;th&gt;Diff&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Mike Gerber&lt;/td&gt;
&lt;td&gt;Dragons&lt;/td&gt;
&lt;td&gt;.862&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;.352&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;-.510&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mel Rojas Jr.&lt;/td&gt;
&lt;td&gt;Tigers&lt;/td&gt;
&lt;td&gt;.867&lt;/td&gt;
&lt;td&gt;.663&lt;/td&gt;
&lt;td&gt;-.204&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Domingo Santana&lt;/td&gt;
&lt;td&gt;Swallows&lt;/td&gt;
&lt;td&gt;.713&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;.877&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;+.164&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Gerber&lt;/strong&gt; had an MLB wOBA of .127 (49.3% K rate) — the model over-regressed toward the mean, predicting .862 OPS when the actual was .352.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Santana&lt;/strong&gt; was predicted from 84 PA in 2020 (COVID-shortened). His career .757 OPS would have been more predictive.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Actually Drove the 2021 Standings
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Yakult's Championship Run
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Player&lt;/th&gt;
&lt;th&gt;2020&lt;/th&gt;
&lt;th&gt;2021&lt;/th&gt;
&lt;th&gt;Change&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Tetsuto Yamada&lt;/td&gt;
&lt;td&gt;OPS .766&lt;/td&gt;
&lt;td&gt;OPS .885&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;+.119&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Domingo Santana&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;OPS .877&lt;/td&gt;
&lt;td&gt;New signing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Noboru Shimizu&lt;/td&gt;
&lt;td&gt;ERA 3.54&lt;/td&gt;
&lt;td&gt;ERA 2.39&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;-1.15&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Orix's Championship Run
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Player&lt;/th&gt;
&lt;th&gt;2020&lt;/th&gt;
&lt;th&gt;2021&lt;/th&gt;
&lt;th&gt;Change&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Yutaro Sugimoto&lt;/td&gt;
&lt;td&gt;OPS .695&lt;/td&gt;
&lt;td&gt;OPS .931&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;+.236&lt;/strong&gt; (HR King at 31)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hiroya Miyagi&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;ERA 2.51 (147IP)&lt;/td&gt;
&lt;td&gt;20-year-old, 13 wins&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Yoshinobu Yamamoto&lt;/td&gt;
&lt;td&gt;ERA 2.20&lt;/td&gt;
&lt;td&gt;ERA 1.39&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;-0.81&lt;/strong&gt; (Sawamura Award)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Sugimoto and Miyagi's breakouts were impossible to predict from past data.&lt;/strong&gt; This is a structural change, not a statistical fluctuation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Giants Collapse (Predicted 84.3W → Actual 61W)
&lt;/h3&gt;

&lt;p&gt;Sugano (ERA 1.97→3.19), Sakamoto (OPS .844→.657), Maru (OPS .899→.775) — three stars declining simultaneously. The Bayesian model &lt;strong&gt;trusted their skill metrics&lt;/strong&gt; and predicted even higher than Marcel.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Findings
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Average MLB players predicted well&lt;/strong&gt; (Cron .703 vs .701 actual)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Extreme players over-regressed&lt;/strong&gt; (Gerber .862 vs .352) → need regression limits&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Single-year small samples mislead&lt;/strong&gt; (Santana's 84 PA in 2020) → use career stats&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bad MLB pitchers stay bad in NPB&lt;/strong&gt; (Sparkman: prior ERA 6.02, predicted 3.88, actual 6.88)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;2021 was driven by Japanese player breakouts&lt;/strong&gt;, not foreign players&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Individual foreign player projections improve accuracy for "average" players but carry risk for extreme cases. In 2021, Japanese player breakouts and collapses determined the standings — foreign player predictions had minimal impact (MAE +0.3 wins).&lt;/p&gt;

&lt;p&gt;This is a personal hobby project. There may be oversights in data collection and verification.&lt;/p&gt;

&lt;h2&gt;
  
  
  Data Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://baseball-data.com" rel="noopener noreferrer"&gt;Baseball Data Freak&lt;/a&gt; — NPB player stats&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://npb.jp" rel="noopener noreferrer"&gt;NPB Official&lt;/a&gt; — Official records&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.fangraphs.com" rel="noopener noreferrer"&gt;FanGraphs&lt;/a&gt; — MLB wOBA/K%/BB%&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://baseballsavant.mlb.com" rel="noopener noreferrer"&gt;Baseball Savant&lt;/a&gt; — MLB Statcast&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.baseball-reference.com" rel="noopener noreferrer"&gt;Baseball Reference&lt;/a&gt; — MLB/MiLB stats&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>baseball</category>
      <category>python</category>
      <category>bayesian</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Adding Bayesian Ensemble + Monte Carlo to an NPB Prediction App</title>
      <dc:creator>YMori</dc:creator>
      <pubDate>Mon, 23 Mar 2026 21:48:12 +0000</pubDate>
      <link>https://forem.com/yasumorishima/adding-bayesian-ensemble-monte-carlo-to-an-npb-prediction-app-58po</link>
      <guid>https://forem.com/yasumorishima/adding-bayesian-ensemble-monte-carlo-to-an-npb-prediction-app-58po</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;I've been running a personal NPB (Japanese pro baseball) prediction app:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Dashboard&lt;/strong&gt;: &lt;a href="https://npb-prediction.streamlit.app/" rel="noopener noreferrer"&gt;npb-prediction.streamlit.app&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/yasumorishima/npb-prediction" rel="noopener noreferrer"&gt;npb-prediction&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It used Marcel projections (3-year weighted average) and ML (XGBoost/LightGBM). Decent, but I wanted better accuracy. After adding Bayesian corrections, the predicted standings changed significantly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Terms
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Term&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Marcel&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Predict next year from weighted average of past 3 years&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Bayesian&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Combine prior knowledge with data. Gives uncertainty estimates&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CI&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Credible interval — range where the true value falls with 80%/95% probability&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;OPS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;On-base + Slugging. Overall batting metric&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ERA&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Earned Run Average. Earned runs allowed per 9 innings&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;MAE&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Mean Absolute Error. Average prediction miss. Lower = better&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Problems with the Previous Approach
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Problem 1: All Foreign Players Treated as "Average"
&lt;/h3&gt;

&lt;p&gt;Marcel needs 3 years of NPB data. First-year foreign players have none, so all 24 of them were treated as league-average. Dalbec (Giants, .355 wOBA in MLB) and Hummel (BayStars, .240 wOBA) were calculated identically.&lt;/p&gt;

&lt;h3&gt;
  
  
  Problem 2: Skill Metrics Ignored
&lt;/h3&gt;

&lt;p&gt;Marcel averages past results directly. Two players with OPS .800 might have very different K% and BB% profiles, which affects how stable their performance will be next year.&lt;/p&gt;

&lt;h3&gt;
  
  
  Problem 3: No Uncertainty
&lt;/h3&gt;

&lt;p&gt;"Maki's OPS: .812" gives no sense of how much it might vary. The difference between .750-.870 and .790-.830 matters a lot for team projections.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Changed with Bayesian Integration
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Foreign Players: Average → Individual Predictions
&lt;/h3&gt;

&lt;p&gt;Built a model to convert MLB/KBO stats to NPB projections. For example, a .350 wOBA MLB hitter maps to approximately &lt;code&gt;.350 × 1.235 = .432&lt;/code&gt; NPB-equivalent wOBA.&lt;/p&gt;
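
&lt;p&gt;The conversion step on its own can be sketched as below. It covers only the multiplication from the example above, using the 1.235 factor quoted there. The constant and function names are hypothetical, and any further adjustment the full model applies is not shown.&lt;/p&gt;

```typescript
// Factor taken from the worked example above; the constant name is hypothetical.
const MLB_TO_NPB_WOBA_FACTOR = 1.235;

// Scales a prior-league wOBA to an NPB-equivalent wOBA.
function toNpbEquivalentWoba(priorWoba: number): number {
  return priorWoba * MLB_TO_NPB_WOBA_FACTOR;
}

console.log(toNpbEquivalentWoba(0.35).toFixed(3)); // 0.432
```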

&lt;p&gt;All 24 players' names and prior-league stats were individually web-verified (guessing English names from katakana is surprisingly error-prone).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Foreign hitter examples:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Player&lt;/th&gt;
&lt;th&gt;Team&lt;/th&gt;
&lt;th&gt;Prior wOBA&lt;/th&gt;
&lt;th&gt;NPB Pred OPS&lt;/th&gt;
&lt;th&gt;80% CI&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Sano&lt;/td&gt;
&lt;td&gt;Dragons&lt;/td&gt;
&lt;td&gt;.370&lt;/td&gt;
&lt;td&gt;.760&lt;/td&gt;
&lt;td&gt;.632–.889&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Seymour&lt;/td&gt;
&lt;td&gt;Buffaloes&lt;/td&gt;
&lt;td&gt;.365&lt;/td&gt;
&lt;td&gt;.735&lt;/td&gt;
&lt;td&gt;.607–.863&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dalbec&lt;/td&gt;
&lt;td&gt;Giants&lt;/td&gt;
&lt;td&gt;.355&lt;/td&gt;
&lt;td&gt;.725&lt;/td&gt;
&lt;td&gt;.577–.884&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hummel&lt;/td&gt;
&lt;td&gt;BayStars&lt;/td&gt;
&lt;td&gt;.240&lt;/td&gt;
&lt;td&gt;.694&lt;/td&gt;
&lt;td&gt;.530–.849&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Foreign pitcher examples:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Player&lt;/th&gt;
&lt;th&gt;Team&lt;/th&gt;
&lt;th&gt;Prior ERA&lt;/th&gt;
&lt;th&gt;NPB Pred ERA&lt;/th&gt;
&lt;th&gt;80% CI&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Quijada&lt;/td&gt;
&lt;td&gt;Swallows&lt;/td&gt;
&lt;td&gt;3.26&lt;/td&gt;
&lt;td&gt;2.76&lt;/td&gt;
&lt;td&gt;1.28–4.24&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hjelle&lt;/td&gt;
&lt;td&gt;Buffaloes&lt;/td&gt;
&lt;td&gt;3.90&lt;/td&gt;
&lt;td&gt;3.34&lt;/td&gt;
&lt;td&gt;1.05–5.59&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cox&lt;/td&gt;
&lt;td&gt;BayStars&lt;/td&gt;
&lt;td&gt;8.86&lt;/td&gt;
&lt;td&gt;3.36&lt;/td&gt;
&lt;td&gt;1.82–4.85&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Players with poor prior-league stats get pulled toward league average (the Bayesian regression effect), but their CIs are wider, which signals lower confidence.&lt;/p&gt;
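
&lt;p&gt;One common way to get this kind of pull toward the league average is a normal-normal update, sketched below. This is a generic textbook illustration and not necessarily the model used in the app. The function name, the &lt;code&gt;Posterior&lt;/code&gt; type, and every number in the example call are made up.&lt;/p&gt;

```typescript
// Hypothetical result type for a normal-normal update.
interface Posterior {
  mean: number;
  variance: number;
}

// Combines a league-average prior with one observed stat, weighting each
// by its precision (1 / variance).
function shrinkTowardPrior(
  priorMean: number,
  priorVariance: number,
  observed: number,
  observedVariance: number
): Posterior {
  const priorPrecision = 1 / priorVariance;
  const observedPrecision = 1 / observedVariance;
  const precision = priorPrecision + observedPrecision;
  return {
    mean: (priorMean * priorPrecision + observed * observedPrecision) / precision,
    variance: 1 / precision,
  };
}

// Made-up inputs: league-average ERA 4.0, a noisy 8.0 ERA from a small sample.
const estimate = shrinkTowardPrior(4.0, 1.0, 8.0, 3.0);
console.log(estimate.mean.toFixed(2), estimate.variance.toFixed(2)); // 5.00 0.75
```

&lt;p&gt;In this sketch, the noisier the observation (a larger &lt;code&gt;observedVariance&lt;/code&gt;), the harder the estimate is pulled toward the prior mean.&lt;/p&gt;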

&lt;h3&gt;
  
  
  Japanese Players: K%/BB%/BABIP Corrections
&lt;/h3&gt;

&lt;p&gt;Three models are combined into the final prediction:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Weight&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Marcel&lt;/td&gt;
&lt;td&gt;35%&lt;/td&gt;
&lt;td&gt;Strong baseline, especially for pitcher ERA&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bayesian correction&lt;/td&gt;
&lt;td&gt;40%&lt;/td&gt;
&lt;td&gt;K%/BB%/BABIP/age adjustment on top of Marcel&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ML&lt;/td&gt;
&lt;td&gt;25%&lt;/td&gt;
&lt;td&gt;XGBoost/LightGBM&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
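
&lt;p&gt;As a rough illustration of how the weights in the table could be applied, here is a minimal sketch of the blend. The function name &lt;code&gt;blend&lt;/code&gt;, the dictionary keys, the renormalization when a model has no prediction, and the sample OPS values are all assumptions made for this example, not the project's actual code.&lt;/p&gt;

```python
# Weights from the table above; the keys are hypothetical labels.
WEIGHTS = {"marcel": 0.35, "bayes": 0.40, "ml": 0.25}


def blend(predictions, weights=WEIGHTS):
    """Weighted average of per-model point predictions.

    If a model has no prediction for a player (missing or None), the
    remaining weights are renormalized. That fallback is an assumption.
    """
    used = {m: w for m, w in weights.items() if predictions.get(m) is not None}
    if not used:
        raise ValueError("no model predictions available")
    total = sum(used.values())
    return sum(predictions[m] * w for m, w in used.items()) / total


# Illustrative OPS values, not real projections.
ops = blend({"marcel": 0.720, "bayes": 0.740, "ml": 0.700})
print(round(ops, 3))
```

&lt;p&gt;A point blend like this says nothing about uncertainty on its own; the intervals have to come from sampling each model's predictive distribution.&lt;/p&gt;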

&lt;h2&gt;
  
  
  Did Accuracy Improve?
&lt;/h2&gt;

&lt;p&gt;8-year backtest (2018–2025; predict each season, then compare with the actual results):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Marcel MAE&lt;/th&gt;
&lt;th&gt;Bayesian MAE&lt;/th&gt;
&lt;th&gt;Improvement prob.&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Hitter wOBA&lt;/td&gt;
&lt;td&gt;0.05023&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.04980&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;97.1%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pitcher ERA&lt;/td&gt;
&lt;td&gt;1.23008&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1.22241&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;97.1%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Small improvement, but consistent — &lt;strong&gt;97% probability of beating Marcel across 8 years&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Historical Marcel Accuracy for Context
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Overall (8 years × 12 teams = 96 team-years):&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Wins MAE&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;6.4 wins&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Avg rank error&lt;/td&gt;
&lt;td&gt;1.42 positions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Exact rank rate&lt;/td&gt;
&lt;td&gt;18%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Within 1 rank&lt;/td&gt;
&lt;td&gt;65%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Recent examples of Marcel misses:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Year&lt;/th&gt;
&lt;th&gt;Team&lt;/th&gt;
&lt;th&gt;Actual&lt;/th&gt;
&lt;th&gt;Predicted&lt;/th&gt;
&lt;th&gt;Miss (predicted − actual)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;2025&lt;/td&gt;
&lt;td&gt;Swallows (CL)&lt;/td&gt;
&lt;td&gt;57W (6th)&lt;/td&gt;
&lt;td&gt;72W (4th)&lt;/td&gt;
&lt;td&gt;+15&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2024&lt;/td&gt;
&lt;td&gt;SoftBank (PL)&lt;/td&gt;
&lt;td&gt;91W (1st)&lt;/td&gt;
&lt;td&gt;75W (2nd)&lt;/td&gt;
&lt;td&gt;-16&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2024&lt;/td&gt;
&lt;td&gt;Buffaloes (PL)&lt;/td&gt;
&lt;td&gt;63W (5th)&lt;/td&gt;
&lt;td&gt;78W (1st)&lt;/td&gt;
&lt;td&gt;+15&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Patterns:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Overestimates bottom teams, underestimates top teams (regression to mean)&lt;/li&gt;
&lt;li&gt;Can't predict collapses (2024 Buffaloes: defending champions → 5th place)&lt;/li&gt;
&lt;li&gt;Foreign player impact not captured when all treated as average&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How Did the 2026 Standings Change?
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Central League — Tigers Runaway Disappears, 4-Team Deadlock
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Team&lt;/th&gt;
&lt;th&gt;Marcel&lt;/th&gt;
&lt;th&gt;Bayesian&lt;/th&gt;
&lt;th&gt;Diff (Bayesian − Marcel)&lt;/th&gt;
&lt;th&gt;P(Pennant)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Tigers&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;80.1W (1st)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;71.5W (1st)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;-8.6&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;26.0%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Giants&lt;/td&gt;
&lt;td&gt;70.7W (3rd)&lt;/td&gt;
&lt;td&gt;71.1W (2nd)&lt;/td&gt;
&lt;td&gt;+0.4&lt;/td&gt;
&lt;td&gt;20.2%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dragons&lt;/td&gt;
&lt;td&gt;68.8W (5th)&lt;/td&gt;
&lt;td&gt;71.0W (3rd)&lt;/td&gt;
&lt;td&gt;+2.2&lt;/td&gt;
&lt;td&gt;21.2%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;BayStars&lt;/td&gt;
&lt;td&gt;71.3W (2nd)&lt;/td&gt;
&lt;td&gt;70.7W (4th)&lt;/td&gt;
&lt;td&gt;-0.6&lt;/td&gt;
&lt;td&gt;20.2%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Carp&lt;/td&gt;
&lt;td&gt;70.4W (4th)&lt;/td&gt;
&lt;td&gt;69.1W (5th)&lt;/td&gt;
&lt;td&gt;-1.3&lt;/td&gt;
&lt;td&gt;12.3%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Swallows&lt;/td&gt;
&lt;td&gt;64.3W (6th)&lt;/td&gt;
&lt;td&gt;61.2W (6th)&lt;/td&gt;
&lt;td&gt;-3.1&lt;/td&gt;
&lt;td&gt;0.1%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Tigers dropped from 80.1W to 71.5W (-8.6).&lt;/strong&gt; Skill corrections pulled them down. Giants at 71.1W even after losing Okamoto to MLB. &lt;strong&gt;Four teams within 0.8 wins&lt;/strong&gt; — Tigers 26%, Dragons 21%, Giants 20%, BayStars 20%. Swallows at 61.2W (78% last place) after Murakami's MLB departure.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pacific League — Lions Surge
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Team&lt;/th&gt;
&lt;th&gt;Marcel&lt;/th&gt;
&lt;th&gt;Bayesian&lt;/th&gt;
&lt;th&gt;Diff (Bayesian − Marcel)&lt;/th&gt;
&lt;th&gt;P(Pennant)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Hawks&lt;/td&gt;
&lt;td&gt;80.5W (1st)&lt;/td&gt;
&lt;td&gt;81.3W (1st)&lt;/td&gt;
&lt;td&gt;+0.8&lt;/td&gt;
&lt;td&gt;47.9%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fighters&lt;/td&gt;
&lt;td&gt;76.8W (2nd)&lt;/td&gt;
&lt;td&gt;79.1W (2nd)&lt;/td&gt;
&lt;td&gt;+2.3&lt;/td&gt;
&lt;td&gt;27.2%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Buffaloes&lt;/td&gt;
&lt;td&gt;73.8W (3rd)&lt;/td&gt;
&lt;td&gt;77.5W (3rd)&lt;/td&gt;
&lt;td&gt;+3.7&lt;/td&gt;
&lt;td&gt;17.6%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lions&lt;/td&gt;
&lt;td&gt;68.6W (4th)&lt;/td&gt;
&lt;td&gt;74.9W (4th)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;+6.3&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;7.1%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Eagles&lt;/td&gt;
&lt;td&gt;65.5W (6th)&lt;/td&gt;
&lt;td&gt;66.7W (5th)&lt;/td&gt;
&lt;td&gt;+1.2&lt;/td&gt;
&lt;td&gt;0.1%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Marines&lt;/td&gt;
&lt;td&gt;67.1W (5th)&lt;/td&gt;
&lt;td&gt;64.9W (6th)&lt;/td&gt;
&lt;td&gt;-2.2&lt;/td&gt;
&lt;td&gt;0.1%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Lions +6.3 wins&lt;/strong&gt; — foreign player projections offsetting Imai's MLB departure.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Problem&lt;/th&gt;
&lt;th&gt;Before&lt;/th&gt;
&lt;th&gt;After&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Foreign players&lt;/td&gt;
&lt;td&gt;All league-average&lt;/td&gt;
&lt;td&gt;24 individual projections from prior-league stats&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Skill metrics&lt;/td&gt;
&lt;td&gt;Not used&lt;/td&gt;
&lt;td&gt;K%/BB%/BABIP corrections on Marcel&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Uncertainty&lt;/td&gt;
&lt;td&gt;None (point estimates)&lt;/td&gt;
&lt;td&gt;80%/95% credible intervals on every prediction&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Team standings&lt;/td&gt;
&lt;td&gt;Single number&lt;/td&gt;
&lt;td&gt;10,000 Monte Carlo sims with pennant probabilities&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Accuracy&lt;/td&gt;
&lt;td&gt;Marcel hitter wOBA MAE 0.0502&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;0.0498&lt;/strong&gt; (97% probability of improvement)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The accuracy gain is modest, but "foreign players are no longer invisible," "MLB departures are reflected," and "every prediction comes with uncertainty" meaningfully changed the standings picture. The CL went from "Tigers runaway" to a four-team deadlock.&lt;/p&gt;

&lt;h3&gt;
  
  
  Caveat: Data Limitations
&lt;/h3&gt;

&lt;p&gt;During this work, I discovered that &lt;strong&gt;players who moved to MLB (Murakami, Okamoto)&lt;/strong&gt; were still included in the team simulation — the roster filter only existed in the Streamlit display layer, not in the CSV generation pipeline. Fixed and regenerated, but &lt;strong&gt;there may be other oversights I haven't caught.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is a personal project without professional-grade QA. The data is best treated as automated model output, not authoritative predictions.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Dashboard&lt;/strong&gt;: &lt;a href="https://npb-prediction.streamlit.app/" rel="noopener noreferrer"&gt;npb-prediction.streamlit.app&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/yasumorishima/npb-prediction" rel="noopener noreferrer"&gt;github.com/yasumorishima/npb-prediction&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Data Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://baseball-data.com" rel="noopener noreferrer"&gt;Baseball Data Freak&lt;/a&gt; — NPB player stats&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://npb.jp" rel="noopener noreferrer"&gt;NPB Official&lt;/a&gt; — Official records&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>baseball</category>
      <category>python</category>
      <category>bayesian</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Adding Bayesian Ensemble + Monte Carlo to an NPB Prediction System</title>
      <dc:creator>YMori</dc:creator>
      <pubDate>Mon, 23 Mar 2026 20:55:48 +0000</pubDate>
      <link>https://forem.com/yasumorishima/adding-bayesian-ensemble-monte-carlo-to-an-npb-prediction-system-2fl1</link>
      <guid>https://forem.com/yasumorishima/adding-bayesian-ensemble-monte-carlo-to-an-npb-prediction-system-2fl1</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;In a previous article, I documented my journey adding Bayesian regression (Stan/Ridge) to my NPB (Japanese pro baseball) prediction system.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Previous article&lt;/strong&gt;: &lt;a href="https://dev.to/shogaku/beyond-marcel-adding-bayesian-regression-to-npb-baseball-predictions-a-15-step-journey-37a0"&gt;Beyond Marcel: Adding Bayesian Regression to NPB Predictions&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That work lived in a separate experiment repository (&lt;a href="https://github.com/yasumorishima/npb-bayes-projection" rel="noopener noreferrer"&gt;npb-bayes-projection&lt;/a&gt;). This article covers adding those pieces into the main app — a 7-phase process that touched 19 files and added 4,087 lines.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/yasumorishima/npb-prediction" rel="noopener noreferrer"&gt;npb-prediction&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Live dashboard&lt;/strong&gt;: &lt;a href="https://npb-prediction.streamlit.app/" rel="noopener noreferrer"&gt;npb-prediction.streamlit.app&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Before: Point Estimates Only
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Marcel (3-year weighted avg) → ML (XGBoost/LightGBM)
    ↓                              ↓
  Point estimate               Point estimate
    ↓
Pythagorean Win% → Team standings
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Problems:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No uncertainty quantification&lt;/li&gt;
&lt;li&gt;24 new foreign players treated as league-average (wRAA=0)&lt;/li&gt;
&lt;li&gt;Marcel and ML run independently — no ensemble&lt;/li&gt;
&lt;li&gt;Team standings are a single number with no confidence interval&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  After: Bayesian Ensemble + Monte Carlo
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Layer 1: Marcel (unchanged)
    ↓
Layer 2: Stan Bayesian correction
  - Japanese: Ridge correction via K%/BB%/BABIP/age
  - Foreign: Prior-league stats × league-specific conversion (Stan v2)
    ↓
Layer 3: ML (XGBoost/LightGBM)
    ↓
Layer 4: BMA (Bayesian Model Averaging)
  - Marcel 35% + Stan 40% + ML 25%
  - 80%/95% credible intervals on every prediction
    ↓
Monte Carlo 10,000 draws → Team win distributions
  - P(pennant) / P(Climax Series) / P(last place)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The 7 Phases
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Phase 1: Japanese Player Bayesian Inference
&lt;/h3&gt;

&lt;p&gt;The key design decision: &lt;strong&gt;Stan does not run at inference time.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;cmdstanpy is heavy to install and won't fit on a Raspberry Pi 5 (4GB RAM). Instead, I pre-compute posterior parameters into &lt;code&gt;posteriors.json&lt;/code&gt; during training (in GitHub Actions), then sample with NumPy at runtime.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# posteriors.json structure (hitter example)
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;japanese_hitter&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;beta&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.152&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;0.089&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;0.245&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;0.003&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sigma_residual&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.06215&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;feature_names&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;K_pct&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;BB_pct&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;BABIP&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;age_from_peak&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Runtime sampling (milliseconds, not seconds)
&lt;/span&gt;&lt;span class="n"&gt;z&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;features&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;scaler_mean&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;scaler_std&lt;/span&gt;
&lt;span class="n"&gt;correction&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;beta&lt;/span&gt; &lt;span class="o"&gt;@&lt;/span&gt; &lt;span class="n"&gt;z&lt;/span&gt;
&lt;span class="n"&gt;samples&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;marcel_value&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;correction&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;rng&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;normal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sigma&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ci_80&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;percentile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;samples&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;90&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
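
&lt;p&gt;The snippet above leaves &lt;code&gt;sigma&lt;/code&gt;, &lt;code&gt;rng&lt;/code&gt;, and the scaler undefined, so here is a self-contained sketch of the same idea. It uses only the standard library (&lt;code&gt;random&lt;/code&gt; and &lt;code&gt;statistics&lt;/code&gt;) in place of NumPy so it runs anywhere. The &lt;code&gt;scaler_mean&lt;/code&gt; and &lt;code&gt;scaler_std&lt;/code&gt; keys, their values, the function name, and the sample player are assumptions for illustration.&lt;/p&gt;

```python
import json
import random
import statistics

# beta / sigma_residual / feature_names are copied from the example above.
# scaler_mean and scaler_std are hypothetical keys with made-up values.
POSTERIORS_JSON = """
{
  "japanese_hitter": {
    "beta": [0.152, -0.089, -0.245, -0.003],
    "sigma_residual": 0.06215,
    "feature_names": ["K_pct", "BB_pct", "BABIP", "age_from_peak"],
    "scaler_mean": [0.20, 0.08, 0.300, 0.0],
    "scaler_std": [0.05, 0.03, 0.030, 4.0]
  }
}
"""


def sample_prediction(marcel_value, features, post, n_draws=5000, seed=0):
    """Return (center, (low, high)): corrected prediction and its 80% interval."""
    # Standardize each feature with the stored scaler.
    z = [
        (features[name] - mean) / std
        for name, mean, std in zip(
            post["feature_names"], post["scaler_mean"], post["scaler_std"]
        )
    ]
    # Linear correction on top of Marcel.
    correction = sum(b * zi for b, zi in zip(post["beta"], z))
    center = marcel_value + correction
    # Add residual noise to obtain predictive draws.
    rng = random.Random(seed)
    sigma = post["sigma_residual"]
    samples = [center + rng.gauss(0.0, sigma) for _ in range(n_draws)]
    deciles = statistics.quantiles(samples, n=10)  # cut points at 10%, ..., 90%
    return center, (deciles[0], deciles[-1])


post = json.loads(POSTERIORS_JSON)["japanese_hitter"]
# A hypothetical hitter sitting exactly at the scaler means, so the correction is zero.
features = {"K_pct": 0.20, "BB_pct": 0.08, "BABIP": 0.300, "age_from_peak": 0.0}
center, ci_80 = sample_prediction(0.750, features, post)
print(center, ci_80)
```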



&lt;h3&gt;
  
  
  Phase 2: Foreign Player Stan v2 Predictions
&lt;/h3&gt;

&lt;p&gt;The most labor-intensive phase. I had to web-verify all 24 foreign players individually:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Katakana name → correct English name&lt;/li&gt;
&lt;li&gt;Origin league (MLB / KBO / independent)&lt;/li&gt;
&lt;li&gt;Most recent season stats&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Lesson learned: Never guess English names from katakana.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Over 10 of my initial 28 guesses were wrong:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;NPB Name&lt;/th&gt;
&lt;th&gt;Initial Guess&lt;/th&gt;
&lt;th&gt;Correct&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Dalbec&lt;/td&gt;
&lt;td&gt;Spencer Torkelson&lt;/td&gt;
&lt;td&gt;Bobby Dalbec&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Jerry&lt;/td&gt;
&lt;td&gt;Sean Gerry&lt;/td&gt;
&lt;td&gt;Sean Hjelle&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lucas&lt;/td&gt;
&lt;td&gt;Josh Lucas&lt;/td&gt;
&lt;td&gt;Easton Lucas&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;I also misidentified 4 Japanese draft picks (with katakana names) as foreign players. The rule: &lt;strong&gt;verify every single entry via web search before committing.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 3: Monte Carlo Team Simulation
&lt;/h3&gt;

&lt;p&gt;Player-level uncertainty propagates to team-level through 10,000 independent simulations:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;sim&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10000&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;team&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;teams&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;rs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;sample_hitter_runs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;h&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;h&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;team&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;hitters&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;ra&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;sample_pitcher_runs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;team&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pitchers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;rs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ra&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;apply_park_factor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ra&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;team&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;wins&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;team&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;sim&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;143&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;rs&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="mf"&gt;1.83&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rs&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="mf"&gt;1.83&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;ra&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="mf"&gt;1.83&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Foreign players get 1.5x sigma (wider uncertainty since they have no NPB data).&lt;/p&gt;
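
&lt;p&gt;To make the loop concrete, here is a runnable sketch under simplifying assumptions. The two rosters, their run totals and sigmas, and the helper names are invented for illustration. The park factor step is left out, and the standard library's &lt;code&gt;random&lt;/code&gt; module stands in for NumPy. The 143 games, the 1.83 exponent, and the 1.5x sigma for foreign players come from the text above.&lt;/p&gt;

```python
import random

GAMES = 143                 # games per season, as in the loop above
EXPONENT = 1.83             # Pythagorean exponent, as in the loop above
FOREIGN_SIGMA_SCALE = 1.5   # wider uncertainty for players without NPB data


def pythagorean_wins(rs, ra, games=GAMES, exponent=EXPONENT):
    """Expected wins from runs scored (rs) and runs allowed (ra)."""
    return games * rs**exponent / (rs**exponent + ra**exponent)


def sample_runs(players, rng):
    """Sum one random draw per player; each player is (mean, sigma, is_foreign)."""
    total = 0.0
    for mean, sigma, is_foreign in players:
        scale = FOREIGN_SIGMA_SCALE if is_foreign else 1.0
        total += rng.gauss(mean, sigma * scale)
    return max(total, 1.0)  # keep the total positive for the power formula


def simulate(teams, n_sims=10_000, seed=0):
    """Return (mean wins per team, share of simulations each team finishes first)."""
    rng = random.Random(seed)
    pennants = {name: 0 for name in teams}
    win_sums = {name: 0.0 for name in teams}
    for _ in range(n_sims):
        wins = {}
        for name, roster in teams.items():
            rs = sample_runs(roster["hitters"], rng)
            ra = sample_runs(roster["pitchers"], rng)
            wins[name] = pythagorean_wins(rs, ra)
            win_sums[name] += wins[name]
        pennants[max(wins, key=wins.get)] += 1
    mean_wins = {name: win_sums[name] / n_sims for name in teams}
    p_pennant = {name: pennants[name] / n_sims for name in teams}
    return mean_wins, p_pennant


# Hypothetical two-team league with two "players" per side.
TEAMS = {
    "A": {"hitters": [(300.0, 25.0, False), (250.0, 25.0, True)],
          "pitchers": [(260.0, 20.0, False), (260.0, 20.0, False)]},
    "B": {"hitters": [(270.0, 25.0, False), (250.0, 25.0, False)],
          "pitchers": [(265.0, 20.0, False), (265.0, 20.0, False)]},
}
mean_wins, p_pennant = simulate(TEAMS)
print(mean_wins, p_pennant)
```

&lt;p&gt;Counting how often each team has the most wins is what turns the win distributions into a P(pennant) figure; P(last place) falls out of the same loop by tracking the minimum instead.&lt;/p&gt;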

&lt;h3&gt;
  
  
  Phase 5: API Integration
&lt;/h3&gt;

&lt;p&gt;Three FastAPI endpoints, one extended and two new:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Endpoint&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/predict/hitter/{name}&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Bayesian OPS + 80%/95% CI (added to existing)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/predict/foreign/{name}&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Foreign player Stan v2 projections (new)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/standings/simulation&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Monte Carlo team standings (new)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Phase 6: Streamlit Integration
&lt;/h3&gt;

&lt;p&gt;The largest phase — added ~370 lines to the 1,669-line &lt;code&gt;streamlit_app.py&lt;/code&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Bayesian CI bars&lt;/strong&gt; on existing prediction pages (Plotly overlay bars for 80%/95% intervals)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Team Simulation page&lt;/strong&gt; (new) — fan chart + probability table&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Foreign Players page&lt;/strong&gt; (new) — prior-league stats + NPB projection with CI&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Phase 7: BigQuery Integration
&lt;/h3&gt;

&lt;p&gt;Added 8 tables (25 → 33 total): Bayesian predictions, foreign player data, simulation results, and conversion factors.&lt;/p&gt;

&lt;h2&gt;
  
  
  Technical Decisions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  posteriors.json vs. cmdstanpy at runtime
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;posteriors.json&lt;/th&gt;
&lt;th&gt;cmdstanpy runtime&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Inference speed&lt;/td&gt;
&lt;td&gt;NumPy only (ms)&lt;/td&gt;
&lt;td&gt;Stan call (seconds)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Memory&lt;/td&gt;
&lt;td&gt;Few KB&lt;/td&gt;
&lt;td&gt;Hundreds of MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Updates&lt;/td&gt;
&lt;td&gt;Annual retraining via GitHub Actions&lt;/td&gt;
&lt;td&gt;Fit every time&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For a system running on RPi5 with 4GB RAM, this was the only viable option. With annual data updates, there's no need to re-fit on every request.&lt;/p&gt;

&lt;h3&gt;
  
  
  BMA Weight Rationale
&lt;/h3&gt;

&lt;p&gt;The weights (Marcel 35% + Stan 40% + ML 25%) were chosen using an 8-year leave-one-out cross-validation (LOO-CV):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Stan correction improved Marcel 97.1% of the time (bootstrap)&lt;/li&gt;
&lt;li&gt;ML matched Marcel on hitter OPS but underperformed on pitcher ERA&lt;/li&gt;
&lt;li&gt;The 3-model BMA was more robust than any single model&lt;/li&gt;
&lt;/ul&gt;
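
&lt;p&gt;For readers unfamiliar with the bootstrap figure, this is one way such an "improved X% of the time" number can be computed: resample the paired errors with replacement and count how often the candidate model's MAE comes out lower. It is a sketch, not necessarily the exact procedure used here. The function name and the synthetic error data are assumptions.&lt;/p&gt;

```python
import random


def improvement_probability(baseline_err, candidate_err, n_boot=1000, seed=0):
    """Share of bootstrap resamples where the candidate MAE beats the baseline MAE.

    Both lists hold absolute errors for the same player-seasons, in the same order.
    """
    if len(baseline_err) != len(candidate_err):
        raise ValueError("error lists must be paired and equally long")
    rng = random.Random(seed)
    n = len(baseline_err)
    better = 0
    for _ in range(n_boot):
        # Resample player-seasons with replacement, keeping the pairing intact.
        idx = [rng.randrange(n) for _ in range(n)]
        mae_base = sum(baseline_err[i] for i in idx) / n
        mae_cand = sum(candidate_err[i] for i in idx) / n
        if mae_base > mae_cand:
            better += 1
    return better / n_boot


# Synthetic errors: the candidate is about 5% better on average (made-up data).
data_rng = random.Random(1)
marcel_err = [abs(data_rng.gauss(0.0, 0.06)) for _ in range(300)]
bayes_err = [e * data_rng.uniform(0.8, 1.1) for e in marcel_err]
prob = improvement_probability(marcel_err, bayes_err)
print(prob)
```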

&lt;h3&gt;
  
  
  The Full-Width Space Trap
&lt;/h3&gt;

&lt;p&gt;Marcel CSVs used full-width spaces (U+3000) in player names while sabermetrics CSVs used half-width spaces. This caused 237 of 463 players to fail matching until I normalized with a &lt;code&gt;player_join&lt;/code&gt; column.&lt;/p&gt;
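
&lt;p&gt;A minimal sketch of that kind of normalization is shown here. The function name &lt;code&gt;player_join_key&lt;/code&gt; and the sample name are hypothetical, and the real &lt;code&gt;player_join&lt;/code&gt; column may be built differently (for example by removing spaces entirely).&lt;/p&gt;

```python
import re


def player_join_key(name):
    """Build a join key: full-width spaces (U+3000) become ASCII spaces,
    runs of whitespace collapse to one space, and the ends are trimmed."""
    return re.sub(r"\s+", " ", name.replace("\u3000", " ")).strip()


marcel_name = "山田\u3000太郎"  # full-width space, as in the Marcel CSVs
saber_name = "山田 太郎"        # half-width space, as in the sabermetrics CSVs

print(marcel_name == saber_name)
print(player_join_key(marcel_name) == player_join_key(saber_name))
```

&lt;p&gt;Applying the same function to both CSVs before the join keeps the original display names untouched while making the keys comparable.&lt;/p&gt;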

&lt;h2&gt;
  
  
  Results
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;New files&lt;/td&gt;
&lt;td&gt;12 (2 Python + 10 data)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Modified files&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lines added&lt;/td&gt;
&lt;td&gt;+4,087&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;BigQuery tables&lt;/td&gt;
&lt;td&gt;25 → 33&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Streamlit pages&lt;/td&gt;
&lt;td&gt;7 → 9&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Foreign players individually projected&lt;/td&gt;
&lt;td&gt;0 → 24&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The system moved from point estimates to probability distributions. "The Giants have a 42.6% chance of winning the pennant" is more useful than "The Giants are projected to win 74 games."&lt;/p&gt;

&lt;h2&gt;
  
  
  Takeaways
&lt;/h2&gt;

&lt;p&gt;Moving experiment code into an app has its own challenges, distinct from the experiments themselves:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Data quality matters more than model quality.&lt;/strong&gt; Incorrect foreign player names/stats would have propagated through the entire pipeline&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Design for your runtime constraints.&lt;/strong&gt; posteriors.json lets a 4GB RPi5 do Bayesian inference&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Uncertainty visualization needs thought.&lt;/strong&gt; CI bars, fan charts, and probability tables each communicate different aspects of the same distributions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Phase 4 (automated Stan retraining pipeline) remains for next season. But the prediction system now runs Bayesian ensemble predictions end-to-end, from individual players to team championship probabilities.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Dashboard&lt;/strong&gt;: &lt;a href="https://npb-prediction.streamlit.app/" rel="noopener noreferrer"&gt;npb-prediction.streamlit.app&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/yasumorishima/npb-prediction" rel="noopener noreferrer"&gt;github.com/yasumorishima/npb-prediction&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Data Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://baseball-data.com" rel="noopener noreferrer"&gt;Baseball Data Freak&lt;/a&gt; — NPB player stats&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://npb.jp" rel="noopener noreferrer"&gt;NPB Official&lt;/a&gt; — Official records&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>baseball</category>
      <category>python</category>
      <category>bayesian</category>
      <category>datascience</category>
    </item>
    <item>
      <title>5 Pitfalls of Grafana + BigQuery — When Your Dashboard Shows Nothing</title>
      <dc:creator>YMori</dc:creator>
      <pubDate>Sun, 22 Mar 2026 05:37:11 +0000</pubDate>
      <link>https://forem.com/yasumorishima/5-pitfalls-of-grafana-bigquery-when-your-dashboard-shows-nothing-35nl</link>
      <guid>https://forem.com/yasumorishima/5-pitfalls-of-grafana-bigquery-when-your-dashboard-shows-nothing-35nl</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;I built 7 Grafana dashboards (70+ panels) on Grafana Cloud with BigQuery as the data source. Along the way, I hit multiple issues where queries returned data through the API but panels showed nothing in the UI.&lt;/p&gt;

&lt;p&gt;Here are the 5 pitfalls I encountered and how to fix them. Verified on Grafana 13 + BigQuery datasource plugin.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Non-ASCII Column Aliases Need Backticks
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Symptom
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;Syntax error: Illegal input character&lt;/code&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Cause
&lt;/h3&gt;

&lt;p&gt;If you use non-ASCII characters (e.g., Japanese, Chinese) in column aliases, they must be wrapped in backticks.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Fails&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;team&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="err"&gt;チーム&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;HR&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="err"&gt;本塁打&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;

&lt;span class="c1"&gt;-- Works&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;team&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nv"&gt;`チーム`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;HR&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nv"&gt;`本塁打`&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The same rule applies to mixed ASCII + non-ASCII aliases like &lt;code&gt;K率&lt;/code&gt;, and to every reference to such an alias in &lt;code&gt;GROUP BY&lt;/code&gt; / &lt;code&gt;ORDER BY&lt;/code&gt; clauses.&lt;/p&gt;
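
&lt;p&gt;If the SQL is generated from code, a small helper can apply the backticks consistently. This is a sketch with a hypothetical function name. It does not handle reserved keywords, and it rejects aliases that already contain a backtick rather than trying to escape them.&lt;/p&gt;

```python
def quote_alias(alias):
    """Wrap an alias in backticks unless it is a plain ASCII identifier."""
    if "`" in alias:
        raise ValueError("alias must not contain a backtick")
    if alias.isascii() and alias.isidentifier():
        return alias
    return "`" + alias + "`"


columns = [("team", "チーム"), ("HR", "本塁打"), ("K_pct", "K率"), ("AVG", "avg")]
select_list = ", ".join(
    source + " AS " + quote_alias(alias) for source, alias in columns
)
print("SELECT " + select_list + " FROM ...")
```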

&lt;h2&gt;
  
  
  2. BigQuery Datasource Doesn't Support &lt;code&gt;format: "time_series"&lt;/code&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Symptom
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;error unmarshaling query JSON to the Query Model: invalid format value: time_series&lt;/code&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Fix
&lt;/h3&gt;

&lt;p&gt;Always use &lt;code&gt;format: "table"&lt;/code&gt;. For time series data, return a &lt;code&gt;TIMESTAMP&lt;/code&gt; column named &lt;code&gt;time&lt;/code&gt; — Grafana auto-detects it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;CAST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nb"&gt;TIMESTAMP&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nb"&gt;time&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  3. Historical Data in Timeseries Panels Shows "Data outside time range"
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Symptom
&lt;/h3&gt;

&lt;p&gt;Panel displays "Data outside time range" with a "Zoom to data" button.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cause
&lt;/h3&gt;

&lt;p&gt;Timeseries panels filter by the dashboard time range (e.g., "Last 6 hours"). Historical data from 2015–2025 falls outside this range.&lt;/p&gt;

&lt;h3&gt;
  
  
  Fix
&lt;/h3&gt;

&lt;p&gt;Use &lt;strong&gt;barchart panels&lt;/strong&gt; for historical aggregations. Return the year as a string:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;CAST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;year&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;STRING&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nb"&gt;year&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  4. Extra fieldConfig Properties Can Break Barchart Rendering
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Symptom
&lt;/h3&gt;

&lt;p&gt;Barchart panel is completely blank. No error message. Query returns data when tested directly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cause
&lt;/h3&gt;

&lt;p&gt;In Grafana 13, adding &lt;code&gt;color&lt;/code&gt;, &lt;code&gt;decimals&lt;/code&gt;, &lt;code&gt;unit&lt;/code&gt;, or &lt;code&gt;custom.axisLabel&lt;/code&gt; to &lt;code&gt;fieldConfig.defaults&lt;/code&gt; can silently prevent barchart rendering.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Broken&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;—&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;renders&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;nothing&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"fieldConfig"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"defaults"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"color"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"fixedColor"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"#5470c6"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"mode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"fixed"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"decimals"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"unit"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"none"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Works&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"fieldConfig"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"defaults"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"overrides"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Start with minimal config, verify it renders, then add properties one at a time.&lt;/p&gt;
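
&lt;p&gt;One way to apply that advice when dashboards are generated by script is to build a series of &lt;code&gt;fieldConfig&lt;/code&gt; variants, each adding one more property, and push them to a test dashboard one at a time until the panel goes blank. The sketch below only builds the variants. The helper name &lt;code&gt;incremental_field_configs&lt;/code&gt; and the candidate values are illustrative assumptions, not part of any Grafana API.&lt;/p&gt;

```python
import copy
import json

# Candidate properties to reintroduce, in the order they should be tested.
# The values are illustrative; substitute the ones your dashboard needs.
CANDIDATES = [
    ("unit", "none"),
    ("decimals", 0),
    ("color", {"mode": "fixed", "fixedColor": "#5470c6"}),
]


def incremental_field_configs(candidates):
    """Yield (added_key, fieldConfig) pairs, each adding one more property."""
    config = {"defaults": {}, "overrides": []}
    # Step 0: the minimal config that is known to render.
    yield None, copy.deepcopy(config)
    for key, value in candidates:
        config["defaults"][key] = value
        # Deep copy so earlier variants are not changed by later steps.
        yield key, copy.deepcopy(config)


if __name__ == "__main__":
    for added, field_config in incremental_field_configs(CANDIDATES):
        label = added or "minimal"
        print(f"--- after adding: {label}")
        print(json.dumps({"fieldConfig": field_config}, indent=2))
```

&lt;p&gt;The first variant that renders blank points at the property to leave out or move into an override.&lt;/p&gt;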

&lt;h2&gt;
  
  
  5. Panels Inside Expanded Row's &lt;code&gt;panels&lt;/code&gt; Array Are Invisible
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Symptom
&lt;/h3&gt;

&lt;p&gt;Panels exist in the dashboard JSON but don't appear in the UI.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cause
&lt;/h3&gt;

&lt;p&gt;Grafana row panels have two modes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Collapsed (&lt;code&gt;collapsed: true&lt;/code&gt;)&lt;/strong&gt;: child panels stored in the row's &lt;code&gt;panels&lt;/code&gt; array&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Expanded (&lt;code&gt;collapsed: false&lt;/code&gt;)&lt;/strong&gt;: child panels must be &lt;strong&gt;top-level siblings&lt;/strong&gt; after the row. The row's &lt;code&gt;panels&lt;/code&gt; array must be empty.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If &lt;code&gt;collapsed: false&lt;/code&gt; but the &lt;code&gt;panels&lt;/code&gt; array still contains panels, those panels are invisible.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Broken&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;—&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;panels&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;inside&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;expanded&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;row&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;are&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;hidden&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"row"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"collapsed"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"panels"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"barchart"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Hidden Panel"&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Fixed&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;—&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;panels&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;at&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;top&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;level&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;after&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;row&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"row"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"collapsed"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"panels"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[]}&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"barchart"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Visible Panel"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Also check &lt;code&gt;gridPos.y&lt;/code&gt; — if a panel's Y position is above its row header, it won't appear in the expected section.&lt;/p&gt;
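
&lt;p&gt;When the dashboard JSON is generated by a script, the row rule can be enforced before upload. A minimal Python sketch, assuming the dashboard is already loaded as a dict with a top-level &lt;code&gt;panels&lt;/code&gt; list. &lt;code&gt;hoist_expanded_row_panels&lt;/code&gt; is a hypothetical helper name, and the function does not recompute &lt;code&gt;gridPos&lt;/code&gt;.&lt;/p&gt;

```python
def hoist_expanded_row_panels(dashboard: dict):
    """Return a copy of the dashboard in which children of expanded rows
    are moved to the top level, directly after their row."""
    fixed = []
    for panel in dashboard.get("panels", []):
        is_expanded_row = (
            panel.get("type") == "row" and not panel.get("collapsed", False)
        )
        if is_expanded_row and panel.get("panels"):
            # Keep the row header, but with an empty panels array.
            fixed.append({**panel, "panels": []})
            # Children become top-level siblings that follow the row.
            fixed.extend(panel["panels"])
        else:
            # Collapsed rows and ordinary panels are left untouched.
            fixed.append(panel)
    return {**dashboard, "panels": fixed}


if __name__ == "__main__":
    broken = {
        "panels": [
            {
                "type": "row",
                "title": "History",
                "collapsed": False,
                "panels": [{"type": "barchart", "title": "Hidden Panel"}],
            }
        ]
    }
    repaired = hoist_expanded_row_panels(broken)
    print([p["title"] for p in repaired["panels"]])
```

&lt;p&gt;Collapsed rows keep their children nested, which matches the two modes described above.&lt;/p&gt;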

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Grafana + BigQuery is a powerful combination, but building dashboards via the API exposes issues you'd never encounter through the UI editor. The hardest to debug: "query is correct but panel is blank." Hope this saves you some time.&lt;/p&gt;

</description>
      <category>grafana</category>
      <category>bigquery</category>
      <category>gcp</category>
      <category>datavisualization</category>
    </item>
    <item>
      <title>Moving an NPB Prediction System to BigQuery — BQML and Cloud Run on the Free Tier</title>
      <dc:creator>YMori</dc:creator>
      <pubDate>Sun, 22 Mar 2026 00:34:23 +0000</pubDate>
      <link>https://forem.com/yasumorishima/moving-an-npb-prediction-system-to-bigquery-bqml-and-cloud-run-on-the-free-tier-4lb4</link>
      <guid>https://forem.com/yasumorishima/moving-an-npb-prediction-system-to-bigquery-bqml-and-cloud-run-on-the-free-tier-4lb4</guid>
      <description>&lt;h2&gt;
  
  
  Background
&lt;/h2&gt;

&lt;p&gt;I've been running an NPB (Japanese professional baseball) player performance prediction project for over a year.&lt;/p&gt;

&lt;p&gt;→ Previous articles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/yasunorim/why-marcel-beat-lightgbm-building-an-npb-player-performance-prediction-system-2ln4"&gt;Why Marcel Beat LightGBM: Building an NPB Player Performance Prediction System&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/yasunorim/annual-auto-retraining-for-npb-baseball-predictions-with-github-actions-30ln"&gt;Annual Auto-Retraining for NPB Baseball Predictions with GitHub Actions&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The setup was: GitHub Actions fetches data → trains models → saves CSVs → Streamlit displays results. Data lived in CSVs, the API ran in a Docker container on a Raspberry Pi 5, and analysis was done in local Python.&lt;/p&gt;

&lt;p&gt;I added Google BigQuery to centralize the data, run SQL analysis, compare BQML accuracy against Python ML, and deploy the API to Cloud Run. Everything fits within GCP's free tier.&lt;/p&gt;

&lt;p&gt;→ &lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/yasumorishima/npb-prediction" rel="noopener noreferrer"&gt;https://github.com/yasumorishima/npb-prediction&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why BigQuery
&lt;/h2&gt;

&lt;p&gt;Pain points with the CSV-based setup:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Full re-fetch every run&lt;/strong&gt; — The annual pipeline re-downloads all data from scratch. No incremental updates&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-analysis was tedious&lt;/strong&gt; — JOINing hitter stats with park factors meant writing pandas merge code every time&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Wanted SQL access&lt;/strong&gt; — Quick queries like "wRC+ TOP 10" or "age curve peak" required writing Python each time&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Wanted to try BQML&lt;/strong&gt; — How far can SQL-only ML go compared to Python?&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;GitHub Actions (Annual Pipeline)
  ├── Data fetch (baseball-data.com / npb.jp)
  ├── Marcel projections
  ├── ML projections (XGBoost / LightGBM)
  ├── load_to_bq.py → BigQuery 25 tables
  ├── bqml_train.py → BQML 4 models
  └── Cloud Run deploy (on master merge)

BigQuery (npb dataset)
  ├── Raw data: 15 tables
  ├── Predictions: 4 tables
  ├── Metrics: 6 tables
  ├── BQML: 4 models
  └── Analysis views: 10

Display layer
  ├── Streamlit Cloud (dashboard)
  ├── Cloud Run API (serverless)
  └── Raspberry Pi 5 API (always-on)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Loading Data to BigQuery
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;load_to_bq.py&lt;/code&gt; loads CSV files into BigQuery.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;RAW_TABLE_MAP&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;npb_hitters_2015_2025.csv&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;raw_hitters&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;npb_pitchers_2015_2025.csv&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;raw_pitchers&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;npb_batting_detailed_2015_2025.csv&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;raw_batting_detailed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;npb_sabermetrics_2015_2025.csv&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sabermetrics&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="c1"&gt;# ... 25 tables
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;NPB data has column names like &lt;code&gt;K%&lt;/code&gt;, &lt;code&gt;BB%&lt;/code&gt;, &lt;code&gt;HR/9&lt;/code&gt; which BigQuery doesn't accept. The loader sanitizes them:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;new&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;new&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;%&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;_pct&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;new&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;new&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;_per_&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;new&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[^a-zA-Z0-9_]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;_&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;new&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
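
&lt;p&gt;Wrapped into a function, the three replacements above look roughly like this. It is a sketch rather than the exact loader code: the names &lt;code&gt;sanitize_column&lt;/code&gt; and &lt;code&gt;sanitize_columns&lt;/code&gt; are hypothetical, and the collision check is an addition of this example, included because the final regex maps every non-ASCII character to an underscore.&lt;/p&gt;

```python
import re


def sanitize_column(name: str):
    """Apply the three replacements to a single column name."""
    new = name.replace("%", "_pct")
    new = new.replace("/", "_per_")
    # Anything outside ASCII letters, digits, and underscore becomes "_".
    new = re.sub(r"[^a-zA-Z0-9_]", "_", new)
    return new


def sanitize_columns(names):
    """Sanitize a list of names and fail loudly if two of them collide."""
    cleaned = [sanitize_column(n) for n in names]
    if len(set(cleaned)) != len(cleaned):
        raise ValueError(f"column name collision after sanitizing: {cleaned}")
    return cleaned


if __name__ == "__main__":
    print(sanitize_columns(["K%", "BB%", "HR/9"]))
    # prints ['K_pct', 'BB_pct', 'HR_per_9']
```

&lt;p&gt;With pandas, the result could be assigned back via &lt;code&gt;df.columns = sanitize_columns(list(df.columns))&lt;/code&gt; before loading.&lt;/p&gt;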



&lt;p&gt;All tables use &lt;code&gt;WRITE_TRUNCATE&lt;/code&gt; (full replace) on each run, so schema changes are handled automatically.&lt;/p&gt;

&lt;h2&gt;
  
  
  BQML: ML with SQL Only
&lt;/h2&gt;

&lt;p&gt;BigQuery ML lets you build features with SQL window functions and train models with &lt;code&gt;CREATE MODEL&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Training View (Feature Engineering)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="k"&gt;REPLACE&lt;/span&gt; &lt;span class="k"&gt;VIEW&lt;/span&gt; &lt;span class="nv"&gt;`npb.v_batter_train`&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt;
&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;base&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;player&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;season&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;OPS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;wOBA&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;K_pct&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;BB_pct&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Age&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;PA&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;
  &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="nv"&gt;`npb.raw_hitters`&lt;/span&gt;
  &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;PA&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;
&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="n"&gt;lagged&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;player&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;season&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;LAG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;OPS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;OPS_y1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
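    &lt;span class="n"&gt;LAG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;wOBA&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;wOBA_y1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;LAG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;K_pct&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;K_pct_y1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;LAG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BB_pct&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;BB_pct_y1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;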
    &lt;span class="n"&gt;LAG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;OPS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;OPS_y2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;LAG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;OPS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;LAG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;OPS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;OPS_delta&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;LAG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Age&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;27&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;age_from_peak&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;POW&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LAG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Age&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;27&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;age_sq&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;OPS&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;target_ops&lt;/span&gt;
  &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;base&lt;/span&gt;
  &lt;span class="k"&gt;WINDOW&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;player&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;season&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;lagged&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;OPS_y1&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These are the same lag features, deltas, and age-curve terms I had in Python, reimplemented as SQL window functions.&lt;/p&gt;
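
&lt;p&gt;For readers more at home in Python than in window functions, here is a plain-Python sketch of what the view computes for the OPS and age columns. It mirrors &lt;code&gt;LAG(...) OVER (PARTITION BY player ORDER BY season)&lt;/code&gt;, so "previous" means the previous row for that player, not necessarily the previous calendar year. The function name &lt;code&gt;lag_features&lt;/code&gt; and the sample rows are made up for illustration; this is not the project's Python feature code.&lt;/p&gt;

```python
from collections import defaultdict

PEAK_AGE = 27  # same constant as in the view


def lag_features(rows):
    """rows: dicts with player, season, OPS, Age. Returns training rows."""
    by_player = defaultdict(list)
    for row in rows:
        by_player[row["player"]].append(row)  # PARTITION BY player

    out = []
    for player, seasons in by_player.items():
        seasons.sort(key=lambda r: r["season"])  # ORDER BY season
        for i, cur in enumerate(seasons):
            if i == 0:
                continue  # OPS_y1 would be NULL; the view filters these out
            prev1 = seasons[i - 1]
            prev2 = seasons[i - 2] if i != 1 else None
            ops_y2 = prev2["OPS"] if prev2 else None
            out.append({
                "player": player,
                "season": cur["season"],
                "OPS_y1": prev1["OPS"],
                "OPS_y2": ops_y2,
                # NULL arithmetic in SQL yields NULL, hence None here.
                "OPS_delta": None if ops_y2 is None else prev1["OPS"] - ops_y2,
                "age_from_peak": prev1["Age"] - PEAK_AGE,
                "age_sq": (prev1["Age"] - PEAK_AGE) ** 2,
                "target_ops": cur["OPS"],
            })
    return out


if __name__ == "__main__":
    # Hypothetical player-seasons, not real NPB data.
    sample = [
        {"player": "A", "season": 2023, "OPS": 0.700, "Age": 25},
        {"player": "A", "season": 2024, "OPS": 0.750, "Age": 26},
        {"player": "A", "season": 2025, "OPS": 0.800, "Age": 27},
    ]
    for features in lag_features(sample):
        print(features)
```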

&lt;h3&gt;
  
  
  Model Training
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="k"&gt;REPLACE&lt;/span&gt; &lt;span class="n"&gt;MODEL&lt;/span&gt; &lt;span class="nv"&gt;`npb.bqml_batter_ops`&lt;/span&gt;
&lt;span class="k"&gt;OPTIONS&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;model_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'BOOSTED_TREE_REGRESSOR'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;input_label_cols&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'target_ops'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="n"&gt;max_iterations&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;learn_rate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;05&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;early_stop&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;TRUE&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;OPS_y1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;wOBA_y1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;K_pct_y1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;BB_pct_y1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;age_from_peak&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;age_sq&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;OPS_delta&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="nv"&gt;`npb.v_batter_train`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;4 models total:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Target&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;bqml_batter_ops&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Next-year OPS&lt;/td&gt;
&lt;td&gt;Boosted Tree&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;bqml_batter_ops_linear&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Next-year OPS&lt;/td&gt;
&lt;td&gt;Linear Regression&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;bqml_pitcher_era&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Next-year ERA&lt;/td&gt;
&lt;td&gt;Boosted Tree&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;bqml_pitcher_era_linear&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Next-year ERA&lt;/td&gt;
&lt;td&gt;Linear Regression&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  BQML vs Python ML Accuracy
&lt;/h2&gt;

&lt;p&gt;Same data, same evaluation period, compared by mean absolute error (MAE).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Batter OPS MAE (lower is better)&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;MAE&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;BQML Boosted Tree&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;.0642&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Python (XGBoost)&lt;/td&gt;
&lt;td&gt;.063&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Python (LightGBM)&lt;/td&gt;
&lt;td&gt;.066&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Marcel&lt;/td&gt;
&lt;td&gt;.063&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Pitcher ERA MAE (lower is better)&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;MAE&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;BQML Boosted Tree&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;.909&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Python (XGBoost)&lt;/td&gt;
&lt;td&gt;.93&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Python (LightGBM)&lt;/td&gt;
&lt;td&gt;.92&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Marcel&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;.78&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;BQML performed comparably to Python ML. For pitcher ERA, both fall short of Marcel (0.78) — an ongoing challenge for ML approaches.&lt;/p&gt;

&lt;p&gt;BQML uses more features (park factors, DIPS metrics, Marcel weighted averages), which may contribute to its Boosted Tree performance.&lt;/p&gt;
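
&lt;p&gt;For reference, the MAE in these tables is the average of |actual - predicted| over the evaluated player-seasons. A minimal Python sketch of the metric follows. The numbers in the example are made up and are not taken from the project's evaluation set.&lt;/p&gt;

```python
def mean_absolute_error(actual, predicted):
    """Mean of |actual - predicted| over paired observations."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("need at least one observation")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    # Hypothetical OPS values for three hitters (not real NPB data).
    actual_ops = [0.800, 0.700, 0.650]
    predicted_ops = [0.750, 0.720, 0.700]
    print(round(mean_absolute_error(actual_ops, predicted_ops), 4))
```

&lt;p&gt;Because MAE is in the same unit as the target, an OPS MAE of .064 means the projections miss by about 64 points of OPS on average.&lt;/p&gt;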

&lt;h2&gt;
  
  
  Analysis Views
&lt;/h2&gt;

&lt;p&gt;I created 10 views for my own analysis, including:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;View&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;v_batter_trend&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Player OPS/wOBA trends by season&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;v_pitcher_trend&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Player ERA/WHIP trends + FIP approximation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;v_team_pythagorean&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Team win% vs Pythagorean expectation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;v_sabermetrics_leaders&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;wRC+ leaderboard by season&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;v_marcel_accuracy&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Marcel historical accuracy validation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;v_age_curve&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;NPB-wide age curve (OPS × age)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;v_park_effects&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Park factor impact analysis&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;v_data_coverage&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Season-by-season data coverage&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;v_data_quality&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Per-table NULL/missing value summary&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For example, checking "2025 wRC+ TOP 10" or "age curve peak" now takes a short SQL query instead of pandas code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Example query from my environment&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;player&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;team&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;season&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;wRC_plus&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;wOBA&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;OPS&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="nv"&gt;`npb.v_sabermetrics_leaders`&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;season&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2025&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;wrc_rank&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Cloud Run Deployment
&lt;/h2&gt;

&lt;p&gt;I deployed the existing FastAPI app to Cloud Run.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; python:3.12-slim&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; requirements.txt .&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--no-cache-dir&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; . .&lt;/span&gt;
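&lt;span class="c"&gt;# Exec-form CMD is not run through a shell, so ${PORT} would stay unexpanded; use sh -c instead&lt;/span&gt;
&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; ["sh", "-c", "exec uvicorn api:app --host 0.0.0.0 --port ${PORT:-8080}"]&lt;/span&gt;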
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Merging to master triggers automatic deployment via Artifact Registry.&lt;/p&gt;

&lt;p&gt;The same API runs on both the Raspberry Pi 5 Docker container and Cloud Run.&lt;/p&gt;

&lt;h2&gt;
  
  
  Free Tier Usage
&lt;/h2&gt;

&lt;p&gt;Everything runs within GCP's free tier.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Resource&lt;/th&gt;
&lt;th&gt;Free Tier&lt;/th&gt;
&lt;th&gt;Usage&lt;/th&gt;
&lt;th&gt;% Used&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Storage&lt;/td&gt;
&lt;td&gt;10 GB/mo&lt;/td&gt;
&lt;td&gt;~5 MB&lt;/td&gt;
&lt;td&gt;0.05%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Queries&lt;/td&gt;
&lt;td&gt;1 TB/mo&lt;/td&gt;
&lt;td&gt;~22 GB&lt;/td&gt;
&lt;td&gt;2.2%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cloud Run&lt;/td&gt;
&lt;td&gt;2M requests/mo&lt;/td&gt;
&lt;td&gt;minimal&lt;/td&gt;
&lt;td&gt;≈0%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A daily job reports BigQuery usage, along with the projected month-end total at the current pace, to Discord.&lt;/p&gt;
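
&lt;p&gt;The month-end projection is simple arithmetic. A minimal sketch, assuming a linear extrapolation from month-to-date usage (the function name and sample numbers are made up, and the real monitoring job may work differently):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import calendar
from datetime import date


def project_month_end(used_gb, today):
    """Linearly extrapolate month-to-date usage to the end of the month."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    return used_gb * days_in_month / today.day


# Invented example: 11 GB scanned by March 15 projects to about 22.7 GB
projected = project_month_end(11.0, date(2026, 3, 15))
FREE_TIER_GB = 1024  # 1 TB/mo query allowance from the table above
print(f"{projected:.1f} GB projected ({projected / FREE_TIER_GB:.1%} of the free tier)")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A linear pace is noisy early in the month, when a single heavy query can dominate the estimate.&lt;/p&gt;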

&lt;h2&gt;
  
  
  GitHub Actions Pipeline
&lt;/h2&gt;

&lt;p&gt;The annual pipeline (&lt;code&gt;annual_update.yml&lt;/code&gt;) now includes BigQuery loading, BQML training, and Cloud Run deployment.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Step 1: fetch_npb_data.py       → Scrape hitter/pitcher stats
Step 2: fetch_npb_detailed.py   → Detailed batting stats (for wOBA)
Step 3: pythagorean.py          → Standings + Pythagorean win%
Step 4: sabermetrics.py         → wOBA/wRC+/wRAA calculation
Step 5: marcel_projection.py    → Marcel projections
Step 6: ml_projection.py        → ML projections + model save
Step 7: git commit &amp;amp; push       → Auto-commit data/
Step 8: load_to_bq.py           → Load all data to BigQuery  ← NEW
Step 9: bqml_train.py           → BQML train &amp;amp; evaluate      ← NEW
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;BQML steps use &lt;code&gt;continue-on-error: true&lt;/code&gt;, so BigQuery issues don't break the Python ML pipeline.&lt;/p&gt;

&lt;h2&gt;
  
  
  Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;BQML accuracy was comparable to Python. Writing features as SQL window functions takes getting used to, but views make them reusable&lt;/li&gt;
&lt;li&gt;Analysis views are quietly useful. SQL replaces pandas for routine queries&lt;/li&gt;
&lt;li&gt;At ~40,000 rows, free tier usage is negligible&lt;/li&gt;
&lt;li&gt;Having the API on both Cloud Run and RPi5 means one can go down without losing service&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Related Articles
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/yasunorim/why-marcel-beat-lightgbm-building-an-npb-player-performance-prediction-system-2ln4"&gt;Why Marcel Beat LightGBM: Building an NPB Player Performance Prediction System&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/yasunorim/annual-auto-retraining-for-npb-baseball-predictions-with-github-actions-30ln"&gt;Annual Auto-Retraining for NPB Baseball Predictions with GitHub Actions&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>bigquery</category>
      <category>gcp</category>
      <category>python</category>
      <category>baseball</category>
    </item>
    <item>
      <title>Monitoring the Strait of Hormuz Blockade with Open AIS Data and a Raspberry Pi</title>
      <dc:creator>YMori</dc:creator>
      <pubDate>Sun, 15 Mar 2026 23:57:21 +0000</pubDate>
      <link>https://forem.com/yasumorishima/monitoring-the-strait-of-hormuz-blockade-with-open-ais-data-and-a-raspberry-pi-45jp</link>
      <guid>https://forem.com/yasumorishima/monitoring-the-strait-of-hormuz-blockade-with-open-ais-data-and-a-raspberry-pi-45jp</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Data scope disclaimer&lt;/strong&gt;: All data in this article comes from &lt;a href="https://aisstream.io/" rel="noopener noreferrer"&gt;aisstream.io&lt;/a&gt;'s &lt;strong&gt;terrestrial AIS receivers&lt;/strong&gt;. Coverage in open water (mid-strait) is limited; satellite AIS would provide a more complete picture. All figures are from &lt;strong&gt;mid-March 2026&lt;/strong&gt; and the situation is evolving daily.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  What This Is
&lt;/h2&gt;

&lt;p&gt;In March 2026, shipping through the Strait of Hormuz — through which roughly 20% of the world's oil passes — was reported to be severely restricted. I built a monitoring system to observe this using free AIS (Automatic Identification System) data and a Raspberry Pi 5.&lt;/p&gt;

&lt;p&gt;This post covers the system architecture, the analytics pipeline, and what the data shows within the limitations of terrestrial AIS coverage.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Repository&lt;/strong&gt;: &lt;a href="https://github.com/yasumorishima/hormuz-ship-tracker" rel="noopener noreferrer"&gt;yasumorishima/hormuz-ship-tracker&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb50c5uerjb0do6z2goxb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb50c5uerjb0do6z2goxb.png" alt="Persian Gulf vessel distribution (mid-March 2026) — traffic concentrated around UAE coast, strait center nearly empty"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Auto-generated snapshot (every 6 hours). Shows gate line positions, transit IN/OUT stats, and vessel type distribution. Note the concentration around UAE ports and the near-empty strait center.&lt;/em&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  AIS Data
&lt;/h2&gt;

&lt;p&gt;AIS is a maritime safety system in which vessels automatically broadcast their position, speed, course, name, and type over VHF radio. Under SOLAS it is mandatory for ships of 300 gross tonnage and above on international voyages, and for all passenger ships.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://aisstream.io/" rel="noopener noreferrer"&gt;aisstream.io&lt;/a&gt; aggregates terrestrial AIS receiver data worldwide and streams it via a free WebSocket API. This is the data source for this project.&lt;/p&gt;
&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aisstream.io (WebSocket)
  → Collector (AIS receiver + land filter + SQLite)
  → Analytics Engine (gate-line transit detection + vessel classification)
  → FastAPI + Leaflet.js + Chart.js (dashboard)
  → matplotlib (6-hourly snapshot → GitHub auto-push)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Two Docker containers run 24/7 on a Raspberry Pi 5: the main collector/API and a snapshot cron job.&lt;/p&gt;
&lt;h2&gt;
  
  
  What the Data Shows
&lt;/h2&gt;
&lt;h3&gt;
  
  
  67% Anchored Ratio (mid-March 2026)
&lt;/h3&gt;

&lt;p&gt;Of ~290 monitored vessels, about 67% were stationary (speed &amp;lt; 0.5 knots). In a typical port area, this ratio is usually around 30–40%. The elevated value is notable.&lt;/p&gt;
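
&lt;p&gt;The ratio can be computed from each vessel's latest speed. A minimal sketch, assuming the latest speeds are available as a dict keyed by MMSI (a hypothetical shape; the MMSIs and speeds below are invented):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;STATIONARY_KNOTS = 0.5  # threshold quoted above


def anchored_ratio(latest_speeds):
    """Share of vessels whose latest speed over ground is below the threshold."""
    if not latest_speeds:
        return 0.0
    stationary = sum(1 for knots in latest_speeds.values() if knots &amp;lt; STATIONARY_KNOTS)
    return stationary / len(latest_speeds)


# Invented sample: two of three vessels are stationary
sample = {"351000001": 0.0, "538000002": 0.3, "470000003": 11.2}
print(f"{anchored_ratio(sample):.0%}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
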
&lt;h3&gt;
  
  
  35 Vessels Waiting 6+ Hours (mid-March 2026)
&lt;/h3&gt;

&lt;p&gt;Vessels that haven't moved for over 6 hours are counted as the "waiting fleet." About 35 vessels met this criterion, with 11 stuck for over 24 hours.&lt;/p&gt;
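
&lt;p&gt;One way to measure how long a vessel has been waiting is to find the start of its current run of stationary reports. The sketch below assumes a time-sorted list of (timestamp, speed) reports and reuses the 0.5-knot threshold; the repository may define "not moved" differently, for example by position:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from datetime import datetime, timedelta

STATIONARY_KNOTS = 0.5
WAIT_THRESHOLD = timedelta(hours=6)


def stationary_duration(track, now):
    """Time since the start of the vessel's current stationary run.

    track is a list of (timestamp, speed_knots) tuples sorted by time.
    Returns a zero timedelta if the latest report shows movement.
    """
    stopped_since = None
    for timestamp, knots in track:
        if knots &amp;lt; STATIONARY_KNOTS:
            if stopped_since is None:
                stopped_since = timestamp
        else:
            stopped_since = None  # any movement resets the clock
    return now - stopped_since if stopped_since is not None else timedelta(0)


# Invented track: under way at 00:00, stationary from 01:00 onward
t0 = datetime(2026, 3, 15, 0, 0)
track = [(t0, 8.0), (t0 + timedelta(hours=1), 0.2), (t0 + timedelta(hours=5), 0.0)]
waited = stationary_duration(track, now=t0 + timedelta(hours=8))
print(waited, waited &amp;gt; WAIT_THRESHOLD)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
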

&lt;p&gt;Waiting fleet flags (estimated from MMSI MID):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Flag&lt;/th&gt;
&lt;th&gt;Count&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Panama&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Marshall Islands&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;UAE&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kuwait&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Others&lt;/td&gt;
&lt;td&gt;1 each&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Panama and Marshall Islands are open registries — commonly used by large commercial ships and tankers. Seven tankers were among the waiting fleet.&lt;/p&gt;
&lt;h3&gt;
  
  
  Near-Zero Strait Transits on Terrestrial AIS (mid-March 2026)
&lt;/h3&gt;

&lt;p&gt;A virtual gate line across the narrowest point of the Strait of Hormuz detects vessel crossings automatically. Only 1 transit was detected in 24 hours.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Important caveat&lt;/strong&gt;: this only reflects what aisstream.io's &lt;strong&gt;terrestrial AIS receivers&lt;/strong&gt; can capture. Coverage in mid-strait open water is limited. News reports indicate some vessels (Turkish, Indian, Saudi-flagged) have been allowed limited passage — these may not appear in terrestrial AIS data. &lt;strong&gt;"No data" does not equal "no ships."&lt;/strong&gt; This caveat applies to all figures in this article.&lt;/p&gt;
&lt;h3&gt;
  
  
  Traffic Concentrated Around UAE Coast (mid-March 2026)
&lt;/h3&gt;

&lt;p&gt;Most data clusters around Dubai, Jebel Ali, and Fujairah. Three gate lines capture port approach traffic:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Gate&lt;/th&gt;
&lt;th&gt;Inbound&lt;/th&gt;
&lt;th&gt;Outbound&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Dubai / Jebel Ali Approach&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fujairah Approach&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Strait of Hormuz&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Dubai inbound significantly exceeds outbound. Fujairah shows only outbound traffic — likely vessels departing after bunkering (refueling).&lt;/p&gt;
&lt;h2&gt;
  
  
  Technical Implementation
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Gate-Line Transit Detection
&lt;/h3&gt;

&lt;p&gt;Virtual gate lines (line segments) are defined at the strait and port approaches. For each vessel, consecutive position reports are checked for intersection with each gate using computational geometry:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;segments_intersect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p4&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# cross_product(a, b, c) is not shown; it is assumed to be the usual 2D orientation test,&lt;/span&gt;
    &lt;span class="c1"&gt;# whose sign says which side of the line through a and b the point c lies on.&lt;/span&gt;
    &lt;span class="c1"&gt;# The segments properly cross only if each segment's endpoints straddle the other segment.&lt;/span&gt;
    &lt;span class="n"&gt;d1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;cross_product&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;d2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;cross_product&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;d3&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;cross_product&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;d4&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;cross_product&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;if &lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;d1&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;d2&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d1&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;d2&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; \
       &lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;d3&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;d4&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d3&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;d4&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Direction (INBOUND/OUTBOUND) is determined by the sign of the cross product relative to the gate vector. Same-vessel crossings within 6 hours are deduplicated.&lt;/p&gt;
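
&lt;p&gt;The direction test and the 6-hour deduplication can be sketched as follows. The &lt;code&gt;cross_product&lt;/code&gt; helper is a standard 2D orientation test written out here for completeness; the per-gate dedup key and the choice of which sign counts as INBOUND are assumptions, not taken from the repository:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from datetime import datetime, timedelta

DEDUP_WINDOW = timedelta(hours=6)


def cross_product(a, b, c):
    """2D cross product of (b - a) and (c - a); the sign gives the side of line a-b that c is on."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def crossing_direction(gate_start, gate_end, prev_pos):
    """Label a crossing by the side of the gate the vessel came from.

    Which sign means INBOUND depends on how the gate endpoints are ordered;
    the choice here is arbitrary.
    """
    return "INBOUND" if cross_product(gate_start, gate_end, prev_pos) &amp;gt; 0 else "OUTBOUND"


def is_duplicate(last_crossing, mmsi, gate, when):
    """True if this vessel crossed this gate within the window; otherwise record the crossing."""
    key = (mmsi, gate)
    previous = last_crossing.get(key)
    if previous is not None and when - previous &amp;lt; DEDUP_WINDOW:
        return True
    last_crossing[key] = when
    return False
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Keeping the timestamp of the last accepted crossing, rather than the last seen one, means a vessel drifting back and forth over a gate is counted at most once per window.&lt;/p&gt;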

&lt;h3&gt;
  
  
  Data-Driven Situation Assessment
&lt;/h3&gt;

&lt;p&gt;All dashboard text is auto-generated from data patterns. The system classifies the situation level based on strait transits, anchored ratio, and waiting fleet size:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;strait_transits&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;anchored_pct&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;40&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;level&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;critical&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Strait Transit Suspended&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;strait_transits&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;level&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;elevated&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Limited Strait Transit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;level&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;normal&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Monitoring Active&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When conditions normalize, the UI automatically shifts to normal mode — no hardcoded crisis messaging.&lt;/p&gt;

&lt;h3&gt;
  
  
  MMSI → Flag Mapping
&lt;/h3&gt;

&lt;p&gt;Since aisstream.io's metadata doesn't reliably include country codes, flags are derived from the first 3 digits of the 9-digit MMSI number (Maritime Identification Digits). The system maps 100+ MIDs to countries.&lt;/p&gt;
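
&lt;p&gt;A minimal sketch of the lookup, with a handful of MIDs from the ITU table (the real mapping is much larger, and the function name is hypothetical):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Small illustrative subset of ITU Maritime Identification Digits (MIDs)
MID_TO_FLAG = {
    "351": "Panama",
    "538": "Marshall Islands",
    "470": "UAE",
    "447": "Kuwait",
    "636": "Liberia",
    "422": "Iran",
}


def flag_from_mmsi(mmsi):
    """Return the flag state for a 9-digit ship MMSI, or "Unknown"."""
    digits = str(mmsi)
    if len(digits) != 9 or not digits.isdigit():
        return "Unknown"
    return MID_TO_FLAG.get(digits[:3], "Unknown")


print(flag_from_mmsi("538123456"))  # invented MMSI with the Marshall Islands MID
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This only applies to ordinary ship stations. Other MMSI formats (coast stations, aids to navigation, and so on) put the MID in a different position, so they fall through to "Unknown" here.&lt;/p&gt;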

&lt;h3&gt;
  
  
  Destination Normalization
&lt;/h3&gt;

&lt;p&gt;AIS destination fields are free-text and wildly inconsistent (DUBAI, AE DXB, AEDXB, DMC DUBAI, etc.). Over 40 variants are mapped to canonical port names.&lt;/p&gt;
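
&lt;p&gt;A minimal sketch of the lookup using the variants listed above. The alias table, the canonical spelling, and the function name are illustrative, not the project's actual table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import re

# Variants quoted above; the canonical name "Dubai" is chosen for illustration
DESTINATION_ALIASES = {
    "DUBAI": "Dubai",
    "AE DXB": "Dubai",
    "AEDXB": "Dubai",
    "DMC DUBAI": "Dubai",
}


def normalize_destination(raw):
    """Map a free-text AIS destination to a canonical port name, or None if unrecognized."""
    if not raw:
        return None
    cleaned = re.sub(r"\s+", " ", raw).strip().upper()
    return DESTINATION_ALIASES.get(cleaned)


print(normalize_destination("  ae   dxb "))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Returning &lt;code&gt;None&lt;/code&gt; for unknown strings keeps unmapped variants visible, so they can be added to the table later instead of being silently mislabeled.&lt;/p&gt;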

&lt;h2&gt;
  
  
  4-Day Data Analysis Update (March 18)
&lt;/h2&gt;

&lt;p&gt;After 4 days of continuous collection (43,000+ position records, 384 unique vessels), several new insights emerged.&lt;/p&gt;

&lt;h3&gt;
  
  
  Traffic Density Heatmap
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8vjes67l2xoum38yc7gr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8vjes67l2xoum38yc7gr.png" alt="Traffic Density Heatmap"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Left: Full Gulf hexbin density. Right: Zoomed strait with AIS dead zone. Bottom: Port area, flag state, and vessel type breakdowns.&lt;/em&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Clean positions&lt;/td&gt;
&lt;td&gt;36,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Anomalous (filtered)&lt;/td&gt;
&lt;td&gt;7,300 (17%)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Unique vessels&lt;/td&gt;
&lt;td&gt;384&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Strait crossings confirmed&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dubai / Jebel Ali gate crossings&lt;/td&gt;
&lt;td&gt;61&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Timelapse — 24 Hours of Vessel Movement
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx3ro914woqzxiop2dfjy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx3ro914woqzxiop2dfjy.gif" alt="Vessel Movement Timelapse"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;24-hour vessel movement animation. Positions are linearly interpolated between data points, with land-crossing prevention.&lt;/em&gt;&lt;/p&gt;
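
&lt;p&gt;The interpolation step can be sketched like this, assuming positions are (lat, lon) tuples. The land-crossing check is left out because it needs coastline polygons, and the function name is hypothetical:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from datetime import datetime, timedelta


def interpolate_position(t0, pos0, t1, pos1, t):
    """Linearly interpolate (lat, lon) at time t between two reports.

    Ignores great-circle geometry, which is a small error over short gaps,
    and does not check whether the straight line crosses land.
    """
    span = (t1 - t0).total_seconds()
    if span == 0:
        return pos0
    frac = (t - t0).total_seconds() / span
    frac = min(1.0, max(0.0, frac))  # clamp so we never extrapolate
    lat = pos0[0] + frac * (pos1[0] - pos0[0])
    lon = pos0[1] + frac * (pos1[1] - pos0[1])
    return (lat, lon)


# Invented reports one hour apart; query the midpoint
start = datetime(2026, 3, 17, 0, 0)
end = start + timedelta(hours=1)
print(interpolate_position(start, (25.0, 55.0), end, (26.0, 56.0), start + timedelta(minutes=30)))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
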

&lt;h3&gt;
  
  
  AIS Data Quality: What the Anomalies Actually Are
&lt;/h3&gt;

&lt;p&gt;About 17% of positions contained anomalous data. Two distinct patterns were identified:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Anomaly&lt;/th&gt;
&lt;th&gt;Count&lt;/th&gt;
&lt;th&gt;Cause&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Speed = 102.3 kn&lt;/td&gt;
&lt;td&gt;~3,200&lt;/td&gt;
&lt;td&gt;AIS protocol "not available" sentinel (10-bit 0x3FF)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Speed 40–99 kn&lt;/td&gt;
&lt;td&gt;~4,100&lt;/td&gt;
&lt;td&gt;Coastal receiver decode errors&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;One cluster at around 48 kn, within the 40–99 kn group, was particularly interesting: on 2026-03-16 at 07:00 UTC, 4 vessels appeared simultaneously at the same coordinates in the strait with identical speeds. This was a single receiver malfunction; no ships were actually there. Before filtering, these anomalies had produced 41 false transit detections, all of which were eliminated by dropping positions with speed &amp;gt;= 40 kn.&lt;/p&gt;
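
&lt;p&gt;Both anomaly classes can be caught with a small speed check. A sketch, assuming each position is a dict with a &lt;code&gt;speed&lt;/code&gt; key in knots (a hypothetical field name, as are the function names):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;SOG_NOT_AVAILABLE_RAW = 1023  # 10-bit 0x3FF; AIS speed is encoded in units of 0.1 knot
MAX_PLAUSIBLE_KNOTS = 40.0    # filter threshold quoted above


def classify_speed(knots):
    """Return "ok", "not_available", or "implausible" for a decoded speed in knots."""
    if round(knots * 10) == SOG_NOT_AVAILABLE_RAW:
        return "not_available"
    if knots &amp;gt;= MAX_PLAUSIBLE_KNOTS:
        return "implausible"
    return "ok"


def clean_positions(positions):
    """Keep only reports whose speed is usable for transit detection."""
    return [p for p in positions if classify_speed(p["speed"]) == "ok"]


# Invented reports: one normal, one phantom, one sentinel
reports = [{"speed": 12.5}, {"speed": 48.0}, {"speed": 102.3}]
print(clean_positions(reports))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Separating the sentinel from the implausible speeds is optional for filtering, but it keeps the two causes distinguishable in data quality reports.&lt;/p&gt;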

&lt;p&gt;The dashboard now shows anomalous vessels with red dashed markers and a "DATA QUALITY WARNING" popup.&lt;/p&gt;

&lt;h3&gt;
  
  
  Browser-Based Replay
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;/replay&lt;/code&gt; endpoint provides a Leaflet.js animated replay with play/pause, speed control (0.25x–16x), timeline scrubbing, and keyboard shortcuts.&lt;/p&gt;

&lt;h2&gt;
  
  
  Limitations
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Terrestrial AIS coverage&lt;/strong&gt;: Free aisstream.io data comes from shore-based receivers. Open-water coverage (mid-strait) is limited&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AIS speed 102.3 knots&lt;/strong&gt;: The "not available" sentinel value (0x3FF). Must be filtered&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Speed 40–99 kn receiver glitches&lt;/strong&gt;: Coastal receiver decode errors produce phantom positions. Transit detection filters speed &amp;gt;= 40 kn&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Collection period&lt;/strong&gt;: Ongoing collection. Longer-term trend analysis requires further accumulation&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;Using aisstream.io's free API and a Raspberry Pi 5, this system continuously collects and analyzes vessel traffic across the entire Persian Gulf. After 4 days, 43,000+ positions have been collected, with heatmap visualization, timelapse animation, and data quality analysis fully implemented.&lt;/p&gt;

&lt;p&gt;Statistics are auto-updated every 6 hours.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/yasumorishima/hormuz-ship-tracker/blob/master/docs/STATS.md" rel="noopener noreferrer"&gt;Live Statistics (auto-updated)&lt;/a&gt;&lt;/strong&gt; / &lt;strong&gt;&lt;a href="https://github.com/yasumorishima/hormuz-ship-tracker" rel="noopener noreferrer"&gt;Repository&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Data source: &lt;a href="https://aisstream.io/" rel="noopener noreferrer"&gt;aisstream.io&lt;/a&gt; / Land polygons: &lt;a href="https://www.naturalearthdata.com/" rel="noopener noreferrer"&gt;Natural Earth&lt;/a&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>raspberrypi</category>
      <category>docker</category>
      <category>maritime</category>
    </item>
    <item>
      <title>I Built a WBC Quarterfinal Scouting App with MLB Statcast Data</title>
      <dc:creator>YMori</dc:creator>
      <pubDate>Fri, 13 Mar 2026 16:58:40 +0000</pubDate>
      <link>https://forem.com/yasumorishima/i-built-a-wbc-quarterfinal-scouting-app-with-mlb-statcast-data-2k49</link>
      <guid>https://forem.com/yasumorishima/i-built-a-wbc-quarterfinal-scouting-app-with-mlb-statcast-data-2k49</guid>
      <description>&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;A Streamlit scouting dashboard for the WBC 2026 Quarterfinal: Japan vs Venezuela.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;App&lt;/strong&gt;: &lt;a href="https://wbc-qf-jpn-ven.streamlit.app/" rel="noopener noreferrer"&gt;https://wbc-qf-jpn-ven.streamlit.app/&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/yasumorishima/wbc-scouting" rel="noopener noreferrer"&gt;https://github.com/yasumorishima/wbc-scouting&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For the pool round, I built 30 team-level dashboards (20 teams). But quarterfinals are head-to-head matchups — you want to know "which pitch type is effective against this batter?" and "which zone has the highest opponent BA against this pitcher?" in one place.&lt;/p&gt;
&lt;h2&gt;
  
  
  5-Tab Structure
&lt;/h2&gt;
&lt;h3&gt;
  
  
  🎯 Tab 1: Matchup Preview
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5vdm9hs313p3q42cq5d5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5vdm9hs313p3q42cq5d5.png" alt="Predicted Lineup Table"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This tab shows a table of Venezuela's predicted starting lineup (9 batters), an alert for Machado (an NPB player with no Statcast data), and a table of bench/pinch-hit candidates.&lt;/p&gt;

&lt;p&gt;Each batter expands into a full scouting report:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;6 key metrics (AVG/OBP/SLG/OPS/K%/BB%) with MLB average comparison&lt;/li&gt;
&lt;li&gt;Radar chart (5-axis, MLB average line overlay)&lt;/li&gt;
&lt;li&gt;Zone heatmaps (3x3, 5x5) — BA and xwOBA by zone, split by vs LHP/RHP&lt;/li&gt;
&lt;li&gt;Spray charts — split by vs LHP/RHP&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb37i7l3zkyz08nyufm40.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb37i7l3zkyz08nyufm40.png" alt="Spray Charts (vs LHP / vs RHP)"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Platoon splits (OPS/AVG/K%/BB% side by side)&lt;/li&gt;
&lt;li&gt;Pitching plan — overall + vs LHP + vs RHP. Auto-generated from pitch type whiff rates, zone-level BA, count-split OPS, and platoon data&lt;/li&gt;
&lt;li&gt;Defensive positioning — auto-generated from spray angle, ground ball rate, and exit velocity, split by pitcher handedness&lt;/li&gt;
&lt;li&gt;Pitch type performance table (BA, SLG, Whiff%, Chase%)&lt;/li&gt;
&lt;li&gt;Count-based performance (color-coded: green=hitter ahead, red=behind, amber=even)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At the bottom, there's a full analysis section for the starting pitcher (Ranger Suárez, LHP) with hitting approach (as LHB/RHB), arsenal table, movement chart, location heatmaps, platoon splits, and pitch selection by count — all in collapsible expanders.&lt;/p&gt;
&lt;h3&gt;
  
  
  📋 Tab 2: Game Plan
&lt;/h3&gt;

&lt;p&gt;Statcast data organized by game phase:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5pj7v6hvxee1uzuxfv26.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5pj7v6hvxee1uzuxfv26.png" alt="Team Weakness Analysis"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Team weakness detection&lt;/strong&gt; — batters with K% ≥ 22.4% (MLB avg), BB% &amp;lt; 8.3%, or platoon OPS gap ≥ 80 pts, auto-extracted with player names and values&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Innings 1-3 vs Suárez (starter)&lt;/strong&gt; — Batting: SP's K%/BB%/Whiff%/velocity and pitch mix. Pitching: per-batter AVG/K%/BB% grouped by lineup position (#1-3, #4-6, #7-9)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Innings 4-5 (2nd time through or bullpen transition)&lt;/strong&gt; — Batting: bridge reliever stats. Pitching: MLB league-wide trend (opp OPS rises 15-20% on 2nd time through) plus batter classification by K% and BB%&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Innings 6+ (high-leverage)&lt;/strong&gt; — Batting: closer/setup K%/Whiff%/Chase%/velocity with pitcher type classification. Pitching: platoon matchup data for batters with significant splits, full per-batter stat line&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pinch-hit candidates&lt;/strong&gt; — bench player AVG/OPS/K%&lt;/li&gt;
&lt;/ul&gt;
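
&lt;p&gt;The three weakness thresholds in the first bullet translate into a short filter. A sketch, assuming each batter is a dict with hypothetical keys &lt;code&gt;k_pct&lt;/code&gt;, &lt;code&gt;bb_pct&lt;/code&gt;, &lt;code&gt;ops_vs_lhp&lt;/code&gt;, and &lt;code&gt;ops_vs_rhp&lt;/code&gt;; the sample stat line is invented and the app's own implementation may differ:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;MLB_AVG_K_PCT = 22.4     # thresholds quoted in the list above
MLB_AVG_BB_PCT = 8.3
PLATOON_GAP_OPS = 0.080  # 80 points of OPS


def find_weaknesses(batter):
    """Return weakness labels for one batter's stat line."""
    flags = []
    if batter["k_pct"] &amp;gt;= MLB_AVG_K_PCT:
        flags.append(f"high K% ({batter['k_pct']:.1f})")
    if batter["bb_pct"] &amp;lt; MLB_AVG_BB_PCT:
        flags.append(f"low BB% ({batter['bb_pct']:.1f})")
    gap = abs(batter["ops_vs_lhp"] - batter["ops_vs_rhp"])
    if gap &amp;gt;= PLATOON_GAP_OPS:
        flags.append(f"platoon gap ({gap * 1000:.0f} pts)")
    return flags


# Invented stat line for illustration
batter_a = {"name": "Batter A", "k_pct": 28.0, "bb_pct": 6.0, "ops_vs_lhp": 0.850, "ops_vs_rhp": 0.700}
print(batter_a["name"], find_weaknesses(batter_a))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
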

&lt;p&gt;Every piece of text is driven by MLB Statcast numbers only. No coaching instructions — just data.&lt;/p&gt;
&lt;h3&gt;
  
  
  ⚔️ Tab 3: Lineup Scouting
&lt;/h3&gt;

&lt;p&gt;Team batting radar chart at the top (AVG/OBP/SLG/K%/BB%, 5-axis, MLB average line overlay). Below that, a full roster table and a dropdown selector for individual player analysis (metrics, scouting summary, pitching plan, defensive positioning, radar chart, zone heatmaps, spray charts, etc.).&lt;/p&gt;
&lt;h3&gt;
  
  
  🎱 Tab 4: Starting Pitcher Analysis
&lt;/h3&gt;

&lt;p&gt;Ranger Suárez's pitching data. Metric cards (avg velocity, avg spin, whiff%, chase%, put away%, opp avg, etc.) and scouting summary, plus collapsible expanders for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hitting approach (as LHB / as RHB)&lt;/li&gt;
&lt;li&gt;Arsenal table (velocity mph/km/h, break, whiff%, put away%) + movement chart&lt;/li&gt;
&lt;li&gt;Pitch location heatmap + platoon splits&lt;/li&gt;
&lt;li&gt;Pitch selection by count (donut charts) + count-based performance&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0ew8r4dl1ux5glvlyou8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0ew8r4dl1ux5glvlyou8.png" alt="Pitch Selection by Count"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  🔥 Tab 5: Bullpen Scouting
&lt;/h3&gt;

&lt;p&gt;Bullpen overview (all relievers' ERA, K%, velocity in one info box), then a dropdown selector for individual reliever analysis. Same structure as Tab 4 (metric cards, scouting summary, hitting approach, arsenal, heatmaps, count analysis).&lt;/p&gt;
&lt;h2&gt;
  
  
  Technical Highlights
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Dynamic text generation from raw Statcast data
&lt;/h3&gt;

&lt;p&gt;Six generator functions compute per-player analysis from pitch-by-pitch data:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Function&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;generate_player_summary()&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Batter scouting summary (strengths/weaknesses)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;generate_pitcher_summary()&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Pitcher scouting summary&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;generate_pitching_plan()&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;How to pitch to a batter (pitch types, zones, counts, platoon)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;generate_hitting_plan()&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;How to hit a pitcher (hittable pitches, zones, counts)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;generate_defensive_positioning()&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Infield/outfield shift recommendation from spray data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;generate_sp_pitch_analysis()&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Starting pitcher's pitch-by-pitch analysis&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Each function calculates stats from raw Statcast data and outputs only the items that cross a statistical threshold (for example, an opponent BA of .250 or higher):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Example: identify the pitch type with highest opponent BA
&lt;/span&gt;&lt;span class="n"&gt;hittable&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;pt_stats&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ba&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ba&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;reverse&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;hittable&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;hittable&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ba&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.250&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;h&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;hittable&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;- **Highest opp BA pitch:** &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;h&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;label&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; (BA .&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;h&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ba&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;03&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  MLB average as baseline for every stat
&lt;/h3&gt;

&lt;p&gt;A raw number like "SLG .476" means little without context, so every stat is shown with the MLB average alongside it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;K% 28.3% (MLB avg 22.4%)
BB% 6.1% (MLB avg 8.3%)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
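
&lt;p&gt;As a rough illustration of how such a line could be assembled, here is a minimal sketch. The helper name &lt;code&gt;format_with_baseline&lt;/code&gt; and its signature are assumptions made for this example, not the dashboard's actual code.&lt;/p&gt;

```python
# Hypothetical helper: the name, arguments, and one-decimal formatting
# are assumptions for illustration only.
def format_with_baseline(label, value, mlb_avg):
    """Render a percentage stat next to its MLB-average baseline."""
    return f"{label} {value:.1f}% (MLB avg {mlb_avg:.1f}%)"


print(format_with_baseline("K%", 28.3, 22.4))
print(format_with_baseline("BB%", 6.1, 8.3))
```

&lt;p&gt;Keeping the baseline in the same string as the value means the comparison travels with the number wherever the text is reused.&lt;/p&gt;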



&lt;h3&gt;
  
  
  Handedness-aware zone names
&lt;/h3&gt;

&lt;p&gt;"Inside" and "outside" flip depending on batter handedness. &lt;code&gt;_zone_names_for_bats()&lt;/code&gt; automatically adjusts zone labels so "inside high" is always correct relative to the batter's stance.&lt;/p&gt;
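
&lt;p&gt;The idea can be sketched roughly as follows. This is not the real &lt;code&gt;_zone_names_for_bats()&lt;/code&gt;; the function below is a hypothetical stand-in, and it assumes zones 1-9 form a 3x3 grid viewed from the catcher's side, with zones 1, 4, and 7 nearest a right-handed batter.&lt;/p&gt;

```python
# Assumption: zones 1-9 are a 3x3 grid seen from behind the plate,
# numbered left to right and top to bottom, with the left column
# (zones 1, 4, 7) closest to a right-handed batter.
ROWS = ("high", "middle", "low")


def zone_names_for_bats(bats):
    """Map each zone number to a label relative to the batter ("R" or "L")."""
    columns = ("inside", "middle", "outside")
    if bats == "L":
        # A left-handed batter stands on the other side, so the columns flip.
        columns = columns[::-1]
    names = {}
    for zone in range(1, 10):
        row, col = divmod(zone - 1, 3)
        names[zone] = f"{columns[col]} {ROWS[row]}"
    return names


print(zone_names_for_bats("R")[1])
print(zone_names_for_bats("L")[1])
```

&lt;p&gt;Only the column order changes with handedness; the high/middle/low rows stay the same for every batter.&lt;/p&gt;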

&lt;h3&gt;
  
  
  Glossary built into every section
&lt;/h3&gt;

&lt;p&gt;Every stat has a &lt;strong&gt;?&lt;/strong&gt; tooltip (Streamlit's &lt;code&gt;help&lt;/code&gt; parameter) showing its definition and MLB average. Count displays include a reading guide ("Balls-Strikes" format) with color legend (🟢 hitter ahead, 🔴 hitter behind, 🟡 even).&lt;/p&gt;

&lt;h2&gt;
  
  
  Data Source
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://baseballsavant.mlb.com/" rel="noopener noreferrer"&gt;Baseball Savant&lt;/a&gt; Statcast data (2024-2025 MLB regular season)&lt;/li&gt;
&lt;li&gt;Retrieved via &lt;a href="https://github.com/jldbc/pybaseball" rel="noopener noreferrer"&gt;pybaseball&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Related
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/shogaku/i-built-a-wbc-2026-scouting-dashboard-with-mlb-statcast-data-3k3j"&gt;I Built a WBC 2026 Scouting Dashboard with MLB Statcast Data&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>baseball</category>
      <category>python</category>
      <category>streamlit</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Cross-Repo README Sync with GitHub Actions — Push vs Pull Pattern</title>
      <dc:creator>YMori</dc:creator>
      <pubDate>Tue, 10 Mar 2026 10:57:17 +0000</pubDate>
      <link>https://forem.com/yasumorishima/cross-repo-readme-sync-with-github-actions-push-vs-pull-pattern-2cm3</link>
      <guid>https://forem.com/yasumorishima/cross-repo-readme-sync-with-github-actions-push-vs-pull-pattern-2cm3</guid>
      <description>&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;When you manage multiple GitHub repositories, you often want to display stats from one repo in another — for example, showing contribution counts in your profile README.&lt;/p&gt;

&lt;p&gt;Manually updating these numbers is error-prone. Lists get out of sync, numbers become stale, and you forget to update after changes.&lt;/p&gt;

&lt;p&gt;This article covers how to build &lt;strong&gt;cross-repo README sync&lt;/strong&gt; with GitHub Actions, and a key architectural decision that saves you from permission headaches.&lt;/p&gt;

&lt;h2&gt;
  
  
  Two Approaches: Push vs Pull
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Push: Source repo writes to target
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Source repo → (PAT) → Update target repo's README
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Requires a Personal Access Token (PAT)&lt;/li&gt;
&lt;li&gt;Fine-grained PATs can unexpectedly return 403 even with correct permissions&lt;/li&gt;
&lt;li&gt;PAT management overhead (rotation, scope, etc.)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Pull: Target repo reads from source
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Target repo → (GITHUB_TOKEN) → Read source repo's README via API
            → (GITHUB_TOKEN) → Update own README
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;No PAT needed: &lt;code&gt;GITHUB_TOKEN&lt;/code&gt; can write to its own repo once the workflow grants &lt;code&gt;contents: write&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Public repo data is readable without any token&lt;/li&gt;
&lt;li&gt;Just add a workflow to the target repo&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Verdict: Pull wins.&lt;/strong&gt; When the source repo is public, it eliminates PAT management entirely.&lt;/p&gt;

&lt;h2&gt;
  
  
  Implementation
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. HTML Comment Markers
&lt;/h3&gt;

&lt;p&gt;Mark the auto-updated sections in your target README:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Stats&lt;/span&gt;

&lt;span class="c"&gt;&amp;lt;!-- STATS_START --&amp;gt;&lt;/span&gt;(10 PRs / 5 Merged)&lt;span class="c"&gt;&amp;lt;!-- STATS_END --&amp;gt;&lt;/span&gt; across 3 repositories.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Only the content between markers gets replaced — everything else stays untouched.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Python Sync Script
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;base64&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;subprocess&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pathlib&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Path&lt;/span&gt;

&lt;span class="n"&gt;README&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;__file__&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;resolve&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;parent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parent&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;README.md&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cmd&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;subprocess&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cmd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;capture_output&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;120&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stdout&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fetch_source_readme&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;owner&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;repo&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Fetch README via the gh CLI (for a public repo, the workflow GH_TOKEN is enough).&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gh&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;api&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;repos/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;owner&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;repo&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/contents/README.md&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--jq&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;base64&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;b64decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;replace_marker&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;marker&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;replacement&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Replace content between HTML comment markers.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;pattern&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;rf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;(&amp;lt;!-- &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;marker&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;_START --&amp;gt;).*?(&amp;lt;!-- &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;marker&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;_END --&amp;gt;)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
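    &lt;span class="c1"&gt;# A callable keeps backslashes or leading digits in replacement from being read as regex escapes&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pattern&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;group&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;replacement&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;group&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;flags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DOTALL&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;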


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;parse_stats&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;source_text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Extract stats from a markdown summary table.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;\| \*\*Total\*\* \|.*?\| \*\*(\d+)\*\* \| \*\*(\d+)\*\*&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;source_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;total&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;group&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;merged&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;group&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;))}&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;source&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;fetch_source_readme&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your-org&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your-source-repo&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Failed to fetch source README&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stderr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;stats&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;parse_stats&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Failed to parse stats&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stderr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;readme&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;README&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;encoding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;readme&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;replace_marker&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;readme&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;STATS&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;(&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;total&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; PRs / &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;merged&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; Merged)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;README&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;readme&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;encoding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Updated: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;total&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; PRs / &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;merged&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; Merged&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
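
&lt;p&gt;To make the table format that &lt;code&gt;parse_stats&lt;/code&gt; expects more concrete, here is a small standalone check that reuses the same regular expression. The sample table and its column names are hypothetical; the real source README may be laid out differently.&lt;/p&gt;

```python
import re

# Same pattern as parse_stats in the script above.
TOTAL_ROW = re.compile(r"\| \*\*Total\*\* \|.*?\| \*\*(\d+)\*\* \| \*\*(\d+)\*\*")

# Hypothetical source README table; the columns are illustrative only.
SAMPLE = """\
| Repository | Since | PRs | Merged |
|---|---|---|---|
| repo-a | 2025 | 7 | 4 |
| repo-b | 2026 | 3 | 1 |
| **Total** | 2 repos | **10** | **5** |
"""


def parse_total_row(text):
    """Return the bolded totals from the Total row, or {} if no row matches."""
    m = TOTAL_ROW.search(text)
    if m is None:
        return {}
    return {"total": int(m.group(1)), "merged": int(m.group(2))}


print(parse_total_row(SAMPLE))
# A Total row with no column between the label and the counts does not match.
print(parse_total_row("| **Total** | **10** | **5** |"))
```

&lt;p&gt;The pattern needs at least one cell between the Total label and the two bolded counts, so a change to the source table layout shows up as a parse failure rather than as wrong numbers.&lt;/p&gt;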



&lt;h3&gt;
  
  
  3. Workflow
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Sync README Stats&lt;/span&gt;

&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;schedule&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Run after the source repo's update schedule&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;cron&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;30&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;9&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;*&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;*&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;1'&lt;/span&gt;
  &lt;span class="na"&gt;workflow_dispatch&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

&lt;span class="na"&gt;permissions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;contents&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;write&lt;/span&gt;

&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;sync&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/setup-python@v5&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;python-version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;3.12'&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Sync stats from source repo&lt;/span&gt;
        &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;GH_TOKEN&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ github.token }}&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;python scripts/sync_stats.py&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Commit and push if changed&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
          &lt;span class="s"&gt;git config user.name "github-actions[bot]"&lt;/span&gt;
          &lt;span class="s"&gt;git config user.email "github-actions[bot]@users.noreply.github.com"&lt;/span&gt;
          &lt;span class="s"&gt;git add README.md&lt;/span&gt;
          &lt;span class="s"&gt;if ! git diff --cached --quiet; then&lt;/span&gt;
            &lt;span class="s"&gt;git commit -m "docs: sync stats $(date -u +%Y-%m-%d)"&lt;/span&gt;
            &lt;span class="s"&gt;git push&lt;/span&gt;
          &lt;span class="s"&gt;fi&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Common Pitfalls
&lt;/h2&gt;

&lt;h3&gt;
  
  
  PAT 403 Errors
&lt;/h3&gt;

&lt;p&gt;With the push approach, fine-grained PATs can return 403 even when configured with "All repositories" and "Contents: Read and write":&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;remote: Permission to user/repo.git denied to user.
fatal: unable to access &lt;span class="s1"&gt;'...'&lt;/span&gt;: The requested URL returned error: 403
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Writing through the GitHub Contents API (a &lt;code&gt;-X PUT&lt;/code&gt; request) also returns 403 in this situation. Rather than debugging token permissions, switching to the pull approach is the more reliable fix.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cron Timing
&lt;/h3&gt;

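&lt;p&gt;If your source repo updates at 09:00 UTC on Mondays, schedule the sync workflow for &lt;strong&gt;09:30 or later&lt;/strong&gt;. Cron expressions in GitHub Actions are evaluated in UTC, and scheduled runs can start late under load, so leave some margin:&lt;br&gt;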
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Bad: same time as source → may fetch stale data&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;cron&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;0&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;9&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;*&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;*&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;1'&lt;/span&gt;

&lt;span class="c1"&gt;# Good: after source update completes&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;cron&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;30&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;9&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;*&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;*&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;1'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Marker Design
&lt;/h3&gt;

&lt;p&gt;Use unique marker names per section to avoid collisions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="c"&gt;&amp;lt;!-- PROJECT_STATS_START --&amp;gt;&lt;/span&gt;...&lt;span class="c"&gt;&amp;lt;!-- PROJECT_STATS_END --&amp;gt;&lt;/span&gt;
&lt;span class="c"&gt;&amp;lt;!-- BADGE_COUNT_START --&amp;gt;&lt;/span&gt;...&lt;span class="c"&gt;&amp;lt;!-- BADGE_COUNT_END --&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;replace_marker&lt;/code&gt; function only touches content between markers, so the rest of your README is safe.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Principle&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Use pull, not push&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Place the workflow in the target repo, use GITHUB_TOKEN&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;HTML comment markers&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Isolate auto-updated sections from manual content&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Stagger cron schedules&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Run sync after the source has finished updating&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Single Source of Truth&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;One canonical data source, everything else pulls from it&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This pattern works for any cross-repo data sync — contribution stats, package versions, badge counts, or anything else you want to keep consistent across repositories.&lt;/p&gt;

</description>
      <category>githubactions</category>
      <category>github</category>
      <category>python</category>
      <category>automation</category>
    </item>
    <item>
      <title>Optimizing Marcel Projection Weights for NPB — Grid Search + Bootstrap Validation</title>
      <dc:creator>YMori</dc:creator>
      <pubDate>Sat, 07 Mar 2026 23:43:19 +0000</pubDate>
      <link>https://forem.com/yasumorishima/optimizing-marcel-projection-weights-for-npb-grid-search-bootstrap-validation-3pkm</link>
      <guid>https://forem.com/yasumorishima/optimizing-marcel-projection-weights-for-npb-grid-search-bootstrap-validation-3pkm</guid>
      <description>&lt;h2&gt;
  
  
  Background
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://www.tangotiger.net/marcel/" rel="noopener noreferrer"&gt;Marcel projection system&lt;/a&gt; is a simple but effective player performance forecasting method created by Tom Tango. It uses a weighted average of the last 3 seasons plus regression to the mean.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/yasumorishima/npb-marcel-weight-study" rel="noopener noreferrer"&gt;https://github.com/yasumorishima/npb-marcel-weight-study&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I've been using Marcel's default parameters in &lt;a href="https://github.com/yasumorishima/npb-prediction" rel="noopener noreferrer"&gt;npb-prediction&lt;/a&gt; (&lt;a href="https://dev.to/yasumorishima/npb-prediction-marcel-vs-ml"&gt;blog post&lt;/a&gt;), but they were originally calibrated for &lt;strong&gt;MLB data&lt;/strong&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Parameter&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;th&gt;Original (Tango's values)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;w0 / w1 / w2&lt;/td&gt;
&lt;td&gt;Weights for N-1 / N-2 / N-3 seasons&lt;/td&gt;
&lt;td&gt;5 / 4 / 3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;REG_PA&lt;/td&gt;
&lt;td&gt;Regression strength (hitters)&lt;/td&gt;
&lt;td&gt;1200&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;REG_IP&lt;/td&gt;
&lt;td&gt;Regression strength (pitchers)&lt;/td&gt;
&lt;td&gt;600&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Are these optimal for NPB (Nippon Professional Baseball)? I ran a comprehensive grid search to find out.&lt;/p&gt;

&lt;h2&gt;
  
  
  Study Design
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Grid Search
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Target&lt;/th&gt;
&lt;th&gt;Search Space&lt;/th&gt;
&lt;th&gt;Combinations&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Hitters&lt;/td&gt;
&lt;td&gt;w0(3-8) × w1(1-5) × w2(1-4) × REG_PA(6 values)&lt;/td&gt;
&lt;td&gt;720&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pitchers&lt;/td&gt;
&lt;td&gt;w0(3-8) × w1(1-5) × w2(1-4) × REG_IP(5 values)&lt;/td&gt;
&lt;td&gt;600&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Evaluation
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Cross-validation: 2019–2025 (7 years)&lt;/li&gt;
&lt;li&gt;Two scenarios: &lt;strong&gt;with 2020&lt;/strong&gt; (COVID-shortened season) / &lt;strong&gt;without 2020&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Metric: MAE (Mean Absolute Error)&lt;/li&gt;
&lt;li&gt;Data: 3,780 hitter rows / 3,773 pitcher rows (2015–2025)&lt;/li&gt;
&lt;li&gt;Runtime: ~4.5 hours on GitHub Actions&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Results: Hitters
&lt;/h2&gt;

&lt;h3&gt;
  
  
  OPS MAE — Top 5 (with 2020)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Weights&lt;/th&gt;
&lt;th&gt;REG_PA&lt;/th&gt;
&lt;th&gt;OPS MAE&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;8/4/3&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;2000&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;.06142&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7/3/3&lt;/td&gt;
&lt;td&gt;2000&lt;/td&gt;
&lt;td&gt;.06142&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7/5/1&lt;/td&gt;
&lt;td&gt;2000&lt;/td&gt;
&lt;td&gt;.06143&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8/5/1&lt;/td&gt;
&lt;td&gt;2000&lt;/td&gt;
&lt;td&gt;.06145&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4/3/1&lt;/td&gt;
&lt;td&gt;1200&lt;/td&gt;
&lt;td&gt;.06146&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Previous (5/4/3, REG_PA=1200): &lt;strong&gt;.06227 — ranked 224th out of 720&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Improvement: .06227 → .06142 = &lt;strong&gt;1.37% MAE reduction&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Optimal Weights Differ by Metric
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Best Weights&lt;/th&gt;
&lt;th&gt;REG_PA&lt;/th&gt;
&lt;th&gt;MAE&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;AVG&lt;/td&gt;
&lt;td&gt;3/2/4&lt;/td&gt;
&lt;td&gt;1500&lt;/td&gt;
&lt;td&gt;.02160&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OBP&lt;/td&gt;
&lt;td&gt;7/3/3&lt;/td&gt;
&lt;td&gt;1500&lt;/td&gt;
&lt;td&gt;.02449&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SLG&lt;/td&gt;
&lt;td&gt;4/3/1&lt;/td&gt;
&lt;td&gt;1000&lt;/td&gt;
&lt;td&gt;.04200&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OPS&lt;/td&gt;
&lt;td&gt;8/4/3&lt;/td&gt;
&lt;td&gt;2000&lt;/td&gt;
&lt;td&gt;.06142&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;AVG favors the N-3 season (stability), while SLG minimizes it (recency). The optimal parameters align with each metric's characteristics.&lt;/p&gt;

&lt;h2&gt;
  
  
  Results: Pitchers
&lt;/h2&gt;

&lt;h3&gt;
  
  
  ERA MAE — Top 5 (with 2020)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Weights&lt;/th&gt;
&lt;th&gt;REG_IP&lt;/th&gt;
&lt;th&gt;ERA MAE&lt;/th&gt;
&lt;th&gt;WHIP MAE&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;4/5/2&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;800&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;.68171&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;.13065&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3/4/1&lt;/td&gt;
&lt;td&gt;800&lt;/td&gt;
&lt;td&gt;.68172&lt;/td&gt;
&lt;td&gt;.13103&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3/4/1&lt;/td&gt;
&lt;td&gt;600&lt;/td&gt;
&lt;td&gt;.68228&lt;/td&gt;
&lt;td&gt;.13068&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3/4/2&lt;/td&gt;
&lt;td&gt;800&lt;/td&gt;
&lt;td&gt;.68304&lt;/td&gt;
&lt;td&gt;.13099&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3/3/2&lt;/td&gt;
&lt;td&gt;800&lt;/td&gt;
&lt;td&gt;.68312&lt;/td&gt;
&lt;td&gt;.13118&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Previous (5/4/3, REG_IP=600): &lt;strong&gt;.69105 — ranked 75th out of 600&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Improvement over previous: 1.35% (with 2020) / 1.53% (without 2020)&lt;/p&gt;

&lt;h2&gt;
  
  
  Bootstrap Validation
&lt;/h2&gt;

&lt;p&gt;I ran 300 bootstrap resamples to test whether the improvement is statistically significant.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hitter OPS (optimal 8/4/3 reg=2000 vs previous 5/4/3 reg=1200):&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Statistic&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Mean improvement&lt;/td&gt;
&lt;td&gt;0.00084&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;95% CI&lt;/td&gt;
&lt;td&gt;[0.00022, 0.00147]&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;best &amp;gt; default&lt;/td&gt;
&lt;td&gt;99.7%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;p-value&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.003&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The lower bound of the 95% CI is above zero — &lt;strong&gt;statistically significant&lt;/strong&gt; (p &amp;lt; 0.01).&lt;/p&gt;
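&lt;p&gt;For readers who want to see the mechanics, here is a minimal sketch of a paired bootstrap on per-player absolute errors. It is an illustration under assumptions, not the code from the repository: the function name &lt;code&gt;paired_bootstrap&lt;/code&gt;, the resampling unit (players) and the one-sided p-value defined as the share of resamples with no gain are all choices made for this example.&lt;/p&gt;

```python
import random
from statistics import mean


def paired_bootstrap(err_default, err_best, n_resamples=300, seed=0):
    """Paired bootstrap of the MAE difference (default minus best).

    err_default / err_best are per-player absolute errors for the same
    players in the same order, so each resample keeps the pairing intact.
    """
    rng = random.Random(seed)
    n = len(err_default)
    diffs = []
    for _ in range(n_resamples):
        # Sample player indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        mae_default = mean(err_default[i] for i in idx)
        mae_best = mean(err_best[i] for i in idx)
        diffs.append(mae_default - mae_best)  # positive means "best" wins
    diffs.sort()
    # Approximate 2.5th / 97.5th percentiles of the resampled differences.
    lo = diffs[int(0.025 * n_resamples)]
    hi = diffs[int(0.975 * n_resamples) - 1]
    win_rate = sum(1 for d in diffs if d > 0) / n_resamples
    return {
        "mean_improvement": mean(diffs),
        "ci95": (lo, hi),
        "win_rate": win_rate,
        "p_value": 1.0 - win_rate,  # share of resamples with no gain
    }
```

&lt;p&gt;Under this definition, 300 resamples with a single non-improving resample would give a win rate of 99.7% and a p-value of about 0.003, which matches the shape of the table above.&lt;/p&gt;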

&lt;h2&gt;
  
  
  Key Findings: NPB vs MLB
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Hitters: Strong N-1 Bias + Stronger Regression
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Previous&lt;/th&gt;
&lt;th&gt;NPB Optimal&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;N-1 (most recent) weight&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;8&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;N-3 weight&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1–3&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Regression (REG_PA)&lt;/td&gt;
&lt;td&gt;1200&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;2000&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The simultaneous increase in both w0 and REG_PA seems contradictory but is actually coherent:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;w0=8&lt;/strong&gt;: Emphasize the N-1 season in the weighted average&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;REG_PA=2000&lt;/strong&gt;: Pull extreme performances back to the mean more aggressively&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In NPB data, this "trust trends but don't trust extremes" combination proved optimal.&lt;/p&gt;
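&lt;p&gt;To make the interaction between the season weights and REG_PA concrete, here is a minimal sketch of a Marcel-style rate projection. It assumes the common formulation in which REG_PA plate appearances of league-average performance are added to the weighted totals. The function name &lt;code&gt;marcel_rate&lt;/code&gt; and its arguments are hypothetical, and the implementation in npb-prediction may differ (for example in how weights are normalized or how age is handled).&lt;/p&gt;

```python
def marcel_rate(seasons, weights, reg_pa, league_rate):
    """Project a rate stat from up to three past seasons.

    seasons: list of (rate, pa) tuples ordered N-1, N-2, N-3.
    weights: season weights in the same order, e.g. (5, 4, 3).
    reg_pa:  plate appearances of league-average performance mixed in.
    """
    weighted_pa = sum(w * pa for w, (_, pa) in zip(weights, seasons))
    weighted_stat = sum(w * rate * pa for w, (rate, pa) in zip(weights, seasons))
    # Regression to the mean: add reg_pa "phantom" PA at the league rate.
    return (weighted_stat + reg_pa * league_rate) / (weighted_pa + reg_pa)


# Example: a hitter with 500 PA in each of the last three seasons.
seasons = [(0.850, 500), (0.750, 500), (0.700, 500)]
previous = marcel_rate(seasons, (5, 4, 3), 1200, 0.700)
npb_optimal = marcel_rate(seasons, (8, 4, 3), 2000, 0.700)
```

&lt;p&gt;In this formulation the share of the projection that comes from the player's own record is weighted PA / (weighted PA + REG_PA). For the example hitter that is 6000 / 7200 (about 0.83) with 5/4/3 and REG_PA=1200, and 7500 / 9500 (about 0.79) with 8/4/3 and REG_PA=2000: the most recent season dominates the player's own average, while the total pull toward the league mean is slightly stronger.&lt;/p&gt;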

&lt;h3&gt;
  
  
  Pitchers: N-2 Season is Most Predictive
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Previous&lt;/th&gt;
&lt;th&gt;NPB Optimal&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;N-1 (most recent) weight&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;3–4&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;N-2 weight&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;4–5&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;N-3 weight&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1–2&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Regression (REG_IP)&lt;/td&gt;
&lt;td&gt;600&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;800&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The most striking finding: &lt;strong&gt;w1 (N-2 season) is larger than w0 (N-1 season)&lt;/strong&gt;. This contradicts the conventional assumption that the most recent season is always most important.&lt;/p&gt;

&lt;p&gt;Weighting the N-2 season more heavily helps smooth out single-season fluctuations in a pitcher's results.&lt;/p&gt;

&lt;h2&gt;
  
  
  Recommended Parameters
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Target&lt;/th&gt;
&lt;th&gt;Weights&lt;/th&gt;
&lt;th&gt;Regression&lt;/th&gt;
&lt;th&gt;Evidence&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Hitters&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;8/4/3&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;REG_PA=&lt;strong&gt;2000&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;Bootstrap p=0.003&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pitchers&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;4/5/2&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;REG_IP=&lt;strong&gt;800&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;Optimal for both ERA and WHIP across scenarios&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;These parameters will be applied to &lt;a href="https://github.com/yasumorishima/npb-prediction" rel="noopener noreferrer"&gt;npb-prediction&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Reproducibility
&lt;/h2&gt;

&lt;p&gt;Code and all result CSVs are available at &lt;a href="https://github.com/yasumorishima/npb-marcel-weight-study" rel="noopener noreferrer"&gt;npb-marcel-weight-study&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;The conventional Marcel weights (5/4/3) are &lt;strong&gt;not optimal for NPB&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Hitters: strong N-1 weight (w0=8) + stronger regression (REG_PA=2000)&lt;/li&gt;
&lt;li&gt;Pitchers: N-2 season is more predictive than N-1 (most recent)&lt;/li&gt;
&lt;li&gt;Bootstrap test confirms the hitter OPS improvement is significant at &lt;strong&gt;p=0.003&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Marcel is simple, but there's room for improvement when you calibrate parameters to your league.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Data sources&lt;/strong&gt;: &lt;a href="https://baseball-data.com" rel="noopener noreferrer"&gt;baseball-data.com&lt;/a&gt; / &lt;a href="https://npb.jp" rel="noopener noreferrer"&gt;npb.jp&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/yasumorishima/npb-marcel-weight-study" rel="noopener noreferrer"&gt;https://github.com/yasumorishima/npb-marcel-weight-study&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>baseball</category>
      <category>python</category>
      <category>statistics</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Can Statcast Data Improve MLB Player Performance Predictions? — Beating Marcel with LightGBM</title>
      <dc:creator>YMori</dc:creator>
      <pubDate>Fri, 06 Mar 2026 22:53:28 +0000</pubDate>
      <link>https://forem.com/yasumorishima/can-statcast-data-improve-mlb-player-performance-predictions-beating-marcel-with-lightgbm-1lb5</link>
      <guid>https://forem.com/yasumorishima/can-statcast-data-improve-mlb-player-performance-predictions-beating-marcel-with-lightgbm-1lb5</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;This article is a continuation of my NPB Bayesian prediction series. Along the way, I reached a conclusion:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;"Without tracking data like Statcast, we can't break through the next wall."&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In my NPB project, I added Bayesian regression (Stan/Ridge) on top of Marcel projections. At the player level there was consistent improvement (p=0.06), but at the team level the gains disappeared. The reason: Marcel's 3-year weighted average is already accurate for high-PA regulars, leaving no margin for improvement using only aggregate stats like K%/BB%/BABIP.&lt;/p&gt;

&lt;p&gt;MLB has &lt;strong&gt;Statcast&lt;/strong&gt;. This article tests whether Statcast tracking features can beat Marcel.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/yasumorishima/baseball-mlops" rel="noopener noreferrer"&gt;https://github.com/yasumorishima/baseball-mlops&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Streamlit&lt;/strong&gt;: &lt;a href="https://baseball-mlops.streamlit.app/" rel="noopener noreferrer"&gt;https://baseball-mlops.streamlit.app/&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  What is Marcel?
&lt;/h2&gt;

&lt;p&gt;Marcel is a simple projection system created by Tom Tango: a weighted average of the past 3 years (weights 5:4:3) + regression to the mean + age adjustment. Despite its simplicity, it's remarkably accurate, especially for regular players with large sample sizes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Data &amp;amp; Features
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Source&lt;/strong&gt;: pybaseball (FanGraphs + Baseball Savant)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Target&lt;/strong&gt;: MLB batters (PA≥100) / pitchers (IP≥30)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Period&lt;/strong&gt;: 2015-2024 (training), 2025 (evaluation)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Batter Features (38)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;Features&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Statcast&lt;/td&gt;
&lt;td&gt;EV, Barrel%, xwOBA, Sprint Speed, Launch Angle, EV95%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;FanGraphs&lt;/td&gt;
&lt;td&gt;HardHit%, Contact%, O-Swing%, SwStr%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1-year lag delta&lt;/td&gt;
&lt;td&gt;wOBA change, xwOBA change, K% change, BB% change, Barrel% change&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;2-year trend (v7)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;2-year wOBA direction (rising/falling)&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Engineered (v7)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;age_from_peak&lt;/strong&gt; (distance from peak age 29), &lt;strong&gt;park_factor&lt;/strong&gt;, &lt;strong&gt;team_changed&lt;/strong&gt;, pa_rate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Interaction&lt;/td&gt;
&lt;td&gt;age × (xwOBA − wOBA) — luck sensitivity by age&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Stacking&lt;/td&gt;
&lt;td&gt;lgb_delta (LightGBM OOF residual)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Pitcher Features (35)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;Features&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Statcast&lt;/td&gt;
&lt;td&gt;K%, BB%, Whiff%, CSW%, SwStr%, Barrel%, EV&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Stuff&lt;/td&gt;
&lt;td&gt;Stuff+, Location+, Pitching+, Velo, Spin Rate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1-year lag delta&lt;/td&gt;
&lt;td&gt;xFIP change, K% change, BB% change, K-BB% change&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;2-year trend (v7)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;2-year xFIP direction&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Engineered (v7)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;age_from_peak&lt;/strong&gt;, &lt;strong&gt;park_factor&lt;/strong&gt;, &lt;strong&gt;team_changed&lt;/strong&gt;, ip_rate, FIP-ERA gap&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Interaction&lt;/td&gt;
&lt;td&gt;age × K-BB%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Stacking&lt;/td&gt;
&lt;td&gt;lgb_delta&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The park factor work from the NPB series was carried over into baseball-mlops as a &lt;code&gt;park_factor&lt;/code&gt; feature — the same methodology, now applied to MLB stadiums.&lt;/p&gt;

&lt;h2&gt;
  
  
  Model
&lt;/h2&gt;

&lt;p&gt;Three models, combined into an ensemble:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Marcel&lt;/strong&gt; (baseline): 3-year weighted avg + regression to mean + age adjustment&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LightGBM&lt;/strong&gt;: Optuna 1000-trial hyperparameter optimization (time-series expanding-window CV)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bayes correction (ElasticNet)&lt;/strong&gt;: Predicts Marcel residuals using Statcast features, adds 80% CI

&lt;ul&gt;
&lt;li&gt;Recency Decay: samples weighted by 0.85/year (recent seasons count more)&lt;/li&gt;
&lt;li&gt;LightGBM OOF predictions used as stacking feature&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ensemble&lt;/strong&gt;: Marcel×31% + LightGBM×33% + Bayes×36% (auto-weighted by inverse MAE)&lt;/li&gt;
&lt;/ol&gt;
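&lt;p&gt;The two weighting schemes in the list above can be sketched in a few lines. This is an illustration under assumptions, not the code from baseball-mlops: the function names are hypothetical, the ensemble weights are assumed to be plain inverse MAE normalized to sum to 1, and the recency decay is assumed to be 0.85 raised to the number of years before the latest training season.&lt;/p&gt;

```python
def inverse_mae_weights(maes):
    """Map {model: validation MAE} to ensemble weights that sum to 1."""
    inv = {name: 1.0 / mae for name, mae in maes.items()}
    total = sum(inv.values())
    return {name: value / total for name, value in inv.items()}


def recency_weights(seasons, latest, decay=0.85):
    """Sample weight per row: decay ** (years before the latest season)."""
    return [decay ** (latest - season) for season in seasons]


def ensemble_predict(predictions, weights):
    """Weighted sum of per-model predictions for a single player."""
    return sum(weights[name] * value for name, value in predictions.items())
```

&lt;p&gt;With inverse-MAE weighting, models with similar validation errors end up with similar weights, which fits the roughly even 31% / 33% / 36% split reported above. Lists like the output of &lt;code&gt;recency_weights&lt;/code&gt; are typically passed as per-row sample weights when fitting the model.&lt;/p&gt;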

&lt;h3&gt;
  
  
  Backtest Design
&lt;/h3&gt;

&lt;p&gt;2025 is a &lt;strong&gt;strict holdout&lt;/strong&gt; — never seen by Optuna or CV:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;2015-2019: Initial training
2020-2024: Time-series expanding-window CV (Optuna tuning)
2025:      Strict holdout (no leakage)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
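&lt;p&gt;The expanding-window scheme above can be sketched as a small generator. The function name &lt;code&gt;expanding_window_splits&lt;/code&gt; is hypothetical and the actual split code in the repository may look different. The point is only that each validation year is predicted from strictly earlier seasons, and that 2025 never appears in any fold.&lt;/p&gt;

```python
def expanding_window_splits(first_year, first_val_year, last_val_year):
    """Yield (train_years, val_year) pairs; training always precedes validation."""
    for val_year in range(first_val_year, last_val_year + 1):
        yield list(range(first_year, val_year)), val_year


for train_years, val_year in expanding_window_splits(2015, 2020, 2024):
    print(f"train {train_years[0]}-{train_years[-1]} -> validate {val_year}")
```

&lt;p&gt;This produces five folds, from "train 2015-2019, validate 2020" up to "train 2015-2023, validate 2024". Hyperparameter tuning only ever sees these folds, so the 2025 holdout stays untouched.&lt;/p&gt;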



&lt;h2&gt;
  
  
  Results
&lt;/h2&gt;

&lt;h3&gt;
  
  
  2025 Strict Holdout
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Marcel MAE&lt;/th&gt;
&lt;th&gt;ML MAE&lt;/th&gt;
&lt;th&gt;Improvement&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Batter wOBA&lt;/td&gt;
&lt;td&gt;0.0331&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.0291&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;+12.1%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pitcher xFIP&lt;/td&gt;
&lt;td&gt;0.5038&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.4837&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;+4.0%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;CV results (batter 0.0281 / pitcher 0.521) are consistent with holdout — no overfitting detected.&lt;/p&gt;

&lt;h3&gt;
  
  
  Year-by-Year Backtest
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Year&lt;/th&gt;
&lt;th&gt;Batter ML&lt;/th&gt;
&lt;th&gt;Batter Marcel&lt;/th&gt;
&lt;th&gt;Batter improvement&lt;/th&gt;
&lt;th&gt;Pitcher ML&lt;/th&gt;
&lt;th&gt;Pitcher Marcel&lt;/th&gt;
&lt;th&gt;Pitcher improvement&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;2020&lt;/td&gt;
&lt;td&gt;0.0359&lt;/td&gt;
&lt;td&gt;0.0371&lt;/td&gt;
&lt;td&gt;✓ +3.2%&lt;/td&gt;
&lt;td&gt;0.595&lt;/td&gt;
&lt;td&gt;0.618&lt;/td&gt;
&lt;td&gt;✓ +3.7%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2021&lt;/td&gt;
&lt;td&gt;0.0293&lt;/td&gt;
&lt;td&gt;0.0317&lt;/td&gt;
&lt;td&gt;✓ +7.6%&lt;/td&gt;
&lt;td&gt;0.542&lt;/td&gt;
&lt;td&gt;0.553&lt;/td&gt;
&lt;td&gt;✓ +1.9%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2022&lt;/td&gt;
&lt;td&gt;0.0296&lt;/td&gt;
&lt;td&gt;0.0330&lt;/td&gt;
&lt;td&gt;✓ +10.3%&lt;/td&gt;
&lt;td&gt;0.578&lt;/td&gt;
&lt;td&gt;0.569&lt;/td&gt;
&lt;td&gt;✗ -1.5%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2023&lt;/td&gt;
&lt;td&gt;0.0277&lt;/td&gt;
&lt;td&gt;0.0303&lt;/td&gt;
&lt;td&gt;✓ +8.7%&lt;/td&gt;
&lt;td&gt;0.535&lt;/td&gt;
&lt;td&gt;0.559&lt;/td&gt;
&lt;td&gt;✓ +4.3%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2024&lt;/td&gt;
&lt;td&gt;0.0280&lt;/td&gt;
&lt;td&gt;0.0333&lt;/td&gt;
&lt;td&gt;✓ +16.0%&lt;/td&gt;
&lt;td&gt;0.509&lt;/td&gt;
&lt;td&gt;0.522&lt;/td&gt;
&lt;td&gt;✓ +2.5%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;2025&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.0291&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.0331&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;✓ +12.1%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.484&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.504&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;✓ +4.0%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Batters: 6/6 wins. Pitchers: 5/6 wins (the 2022 loss is likely due to limited recent training data: the two preceding seasons were 2020 and 2021, and 2020 was COVID-shortened).&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Does Statcast Help?
&lt;/h2&gt;

&lt;p&gt;The Bayes (ElasticNet) model predicts Marcel's residuals using Statcast features. Coefficients that are larger in absolute value indicate more information that Marcel is missing.&lt;/p&gt;

&lt;h3&gt;
  
  
  Batters
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Coef&lt;/th&gt;
&lt;th&gt;Interpretation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Max EV&lt;/td&gt;
&lt;td&gt;+0.0046&lt;/td&gt;
&lt;td&gt;Peak hitting power — Marcel can't see this&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Contact%&lt;/td&gt;
&lt;td&gt;+0.0040&lt;/td&gt;
&lt;td&gt;Finer skill signal than K% alone&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;BB%&lt;/td&gt;
&lt;td&gt;+0.0038&lt;/td&gt;
&lt;td&gt;Additional plate discipline information&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;xwOBA&lt;/td&gt;
&lt;td&gt;+0.0037&lt;/td&gt;
&lt;td&gt;Luck-removed true hitting ability&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Pitchers
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Coef&lt;/th&gt;
&lt;th&gt;Interpretation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Pitching+&lt;/td&gt;
&lt;td&gt;-0.0892&lt;/td&gt;
&lt;td&gt;Overall stuff quality → lower future xFIP&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;K%&lt;/td&gt;
&lt;td&gt;-0.0631&lt;/td&gt;
&lt;td&gt;High strikeout rate outperforms Marcel forecast&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SwStr%&lt;/td&gt;
&lt;td&gt;-0.0346&lt;/td&gt;
&lt;td&gt;Swing-and-miss ability&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Stuff+&lt;/td&gt;
&lt;td&gt;-0.0279&lt;/td&gt;
&lt;td&gt;Velocity + movement + spin combined&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Marcel's ERA/xFIP carries luck components. &lt;strong&gt;Stuff metrics built on tracking data (Stuff+/Pitching+) reflect skill stripped of luck, which is why they add predictive signal.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  MLOps Pipeline
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Every Monday JST 11:00 (GitHub Actions cron)
  ↓
fetch_statcast.py (pybaseball → Statcast CSV)
  ↓
train.py (LightGBM + Optuna 1000 trials + Bayes correction)
  ↓
W&amp;amp;B Model Registry (MAE comparison → auto-promote "production" tag)
  ↓
FastAPI (polls W&amp;amp;B every 6h → auto-loads latest model)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The FastAPI server polls W&amp;amp;B every 6 hours and automatically loads the new model when the &lt;code&gt;production&lt;/code&gt; tag is updated — &lt;strong&gt;no container restart needed&lt;/strong&gt;.&lt;/p&gt;
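&lt;p&gt;The hot-reload idea can be sketched without any framework. The example below is a simplified illustration, not the code from baseball-mlops: &lt;code&gt;ModelHolder&lt;/code&gt; and &lt;code&gt;fetch_latest&lt;/code&gt; are hypothetical names, and &lt;code&gt;fetch_latest&lt;/code&gt; stands in for whatever call retrieves the artifact currently tagged &lt;code&gt;production&lt;/code&gt; from the registry.&lt;/p&gt;

```python
import threading


class ModelHolder:
    """Holds the serving model and swaps it when a newer version appears."""

    def __init__(self, fetch_latest):
        # fetch_latest() is a hypothetical callable returning (version, model).
        self._fetch_latest = fetch_latest
        self._lock = threading.Lock()
        self.version = None
        self.model = None

    def refresh(self):
        """Fetch the latest model; return True if it replaced the current one."""
        version, model = self._fetch_latest()
        with self._lock:
            if version == self.version:
                return False
            self.version, self.model = version, model
            return True

    def poll_forever(self, interval_s=6 * 60 * 60, stop=None):
        """Refresh every interval_s seconds until the stop event is set."""
        stop = stop or threading.Event()
        while not stop.is_set():
            self.refresh()
            stop.wait(interval_s)
```

&lt;p&gt;In a server, &lt;code&gt;poll_forever&lt;/code&gt; would typically run in a daemon thread started at application startup, while request handlers read &lt;code&gt;holder.model&lt;/code&gt;. Because the swap happens in-process, the container keeps running while the model changes.&lt;/p&gt;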

&lt;h2&gt;
  
  
  Looking Ahead: NPB Hawk-Eye
&lt;/h2&gt;

&lt;p&gt;NPB installed Hawk-Eye tracking in all 12 stadiums in 2024. Once data becomes publicly available (expected 2026+), this pipeline can be transplanted directly.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;baseball-mlops&lt;/th&gt;
&lt;th&gt;NPB Hawk-Eye version&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;pybaseball&lt;/td&gt;
&lt;td&gt;NPB Hawk-Eye API&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;EV / Barrel% / xwOBA&lt;/td&gt;
&lt;td&gt;Equivalent metrics&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MLB Marcel&lt;/td&gt;
&lt;td&gt;NPB Marcel&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LightGBM + Bayes&lt;/td&gt;
&lt;td&gt;Same architecture&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;NPB Bayesian project&lt;/th&gt;
&lt;th&gt;baseball-mlops (MLB)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Data&lt;/td&gt;
&lt;td&gt;K%/BB%/BABIP (aggregate stats)&lt;/td&gt;
&lt;td&gt;Statcast (tracking)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Marcel improvement&lt;/td&gt;
&lt;td&gt;Marginal (p=0.06)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;+12.1% (batters) / +4.0% (pitchers)&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Year-by-year wins&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;Batters 6/6, Pitchers 5/6&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The reason Statcast works: Marcel's 3-year weighted average can't see &lt;strong&gt;contact quality or pitch stuff&lt;/strong&gt;. Exit velocity, barrel rate, and Stuff+ directly measure those dimensions that aggregate stats miss.&lt;/p&gt;

&lt;p&gt;Data: &lt;a href="https://baseballsavant.mlb.com/" rel="noopener noreferrer"&gt;Baseball Savant&lt;/a&gt; / &lt;a href="https://www.fangraphs.com/" rel="noopener noreferrer"&gt;FanGraphs&lt;/a&gt; via pybaseball&lt;/p&gt;

</description>
      <category>baseball</category>
      <category>python</category>
      <category>machinelearning</category>
      <category>mlops</category>
    </item>
    <item>
      <title>Did Adding Stadium Correction Improve My NPB Baseball Predictions? — A Full Backtest Comparison</title>
      <dc:creator>YMori</dc:creator>
      <pubDate>Thu, 05 Mar 2026 08:34:08 +0000</pubDate>
      <link>https://forem.com/yasumorishima/did-adding-stadium-correction-improve-my-npb-baseball-predictions-a-full-backtest-comparison-3672</link>
      <guid>https://forem.com/yasumorishima/did-adding-stadium-correction-improve-my-npb-baseball-predictions-a-full-backtest-comparison-3672</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;This is a follow-up to my NPB (Nippon Professional Baseball) standings prediction series. I added &lt;strong&gt;park factor correction&lt;/strong&gt; to the existing Marcel+Stan Bayesian system and ran a full backtest (2018–2025, 96 team-seasons) to measure the impact.&lt;/p&gt;

&lt;p&gt;Previous articles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/yasumorishima/beyond-marcel-adding-bayesian-regression-to-npb-baseball-predictions-a-15-step-journey-1b4f"&gt;Beyond Marcel: Adding Bayesian Regression to NPB Baseball Predictions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/yasumorishima/npb-park-factors"&gt;I Calculated NPB Park Factors for 10 Years — Stadium Renovations Revealed&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/yasumorishima/npb-bayes-projection" rel="noopener noreferrer"&gt;npb-bayes-projection&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Terms (for first-time readers)
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Term&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Marcel method&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Predicts next year's stats using a weighted 3-year average (weights: 5:4:3, recent years weighted higher)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Bayesian prediction (Stan)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Estimates probability distributions from data, capturing uncertainty in predictions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Park factor&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Measures how much a stadium inflates or suppresses scoring. 1.0 = neutral; &amp;gt;1.0 = hitter-friendly; &amp;lt;1.0 = pitcher-friendly&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Pythagorean win%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Estimates win% from runs scored (RS) and allowed (RA): &lt;code&gt;RS^1.83 / (RS^1.83 + RA^1.83)&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;MAE&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Mean Absolute Error — average prediction miss. Lower is better&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;80% CI&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;80% prediction interval: the range expected to contain the actual value 80% of the time&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
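
&lt;p&gt;To make the Marcel and Pythagorean rows concrete, here is a minimal sketch. It is deliberately simplified: it only applies the 5:4:3 weighting and leaves out the regression toward the mean and age adjustment that a full Marcel projection includes. The function names and the run totals are made up for illustration and are not taken from the repository.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def marcel_weighted_average(last3, weights=(5, 4, 3)):
    """Weighted mean of three seasons, most recent season first."""
    total = sum(w * x for w, x in zip(weights, last3))
    return total / sum(weights)


def pythagorean_win_pct(rs, ra, exp=1.83):
    """Expected win% from runs scored (rs) and runs allowed (ra)."""
    return rs**exp / (rs**exp + ra**exp)


# Hypothetical team: 600, 560, 520 runs scored over the last three seasons
projected_rs = marcel_weighted_average([600, 560, 520])
print(round(projected_rs, 1))

# Hypothetical 550 runs allowed
print(round(pythagorean_win_pct(projected_rs, 550), 3))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A team that scores exactly as many runs as it allows comes out at 0.500, whatever the exponent.&lt;/p&gt;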

&lt;h2&gt;
  
  
  Why Park Factor Correction?
&lt;/h2&gt;

&lt;p&gt;Marcel predicts player stats from their past 3 years. The problem: &lt;strong&gt;those stats embed the home stadium's environment&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Vantelin Dome (Chunichi Dragons)&lt;/strong&gt;: PF_5yr = 0.844 → heavily pitcher-friendly&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ES CON Field (Nippon Ham)&lt;/strong&gt;: PF_5yr = 1.147 → hitter-friendly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A Vantelin pitcher's ERA looks better partly because of the park. Projecting team runs allowed (RA) from those raw stats therefore gives a lower RA than the same pitching would produce in a neutral stadium.&lt;/p&gt;

&lt;h3&gt;
  
  
  The correction formula
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# (PF + 1.0) / 2.0 = average of home and away
# Players play half games at home, half away
&lt;/span&gt;&lt;span class="n"&gt;pf_factor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;PF_5yr&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mf"&gt;2.0&lt;/span&gt;
&lt;span class="n"&gt;rs_adjusted&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rs_raw&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;pf_factor&lt;/span&gt;   &lt;span class="c1"&gt;# normalize runs scored to neutral park
&lt;/span&gt;&lt;span class="n"&gt;ra_adjusted&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ra_raw&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;pf_factor&lt;/span&gt;   &lt;span class="c1"&gt;# normalize runs allowed to neutral park
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
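
&lt;p&gt;Here is the same formula wrapped in a small function and applied to the two parks mentioned above. This is only a sketch: &lt;code&gt;park_adjust&lt;/code&gt; is a name I made up for this example, the 500-run totals are hypothetical, and road games are treated as neutral (1.0), as in the formula.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def park_adjust(runs_raw, pf_5yr):
    """Normalize a run total to a neutral park.

    Assumes half of the games are at home (park factor pf_5yr)
    and half on the road (treated as neutral, 1.0).
    """
    pf_factor = (pf_5yr + 1.0) / 2.0
    return runs_raw / pf_factor


# PF_5yr values quoted above; the 500-run totals are hypothetical
vantelin = park_adjust(500, 0.844)  # factor 0.922, so the total goes up
es_con = park_adjust(500, 1.147)    # factor 1.0735, so the total goes down
print(round(vantelin, 1), round(es_con, 1))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because only half the games are at home, a PF of 0.844 turns into a divisor of 0.922, not 0.844.&lt;/p&gt;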



&lt;h2&gt;
  
  
  Results (2018–2025, 96 team-seasons)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Win MAE didn't change — here's why
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;No correction&lt;/th&gt;
&lt;th&gt;5yr avg PF&lt;/th&gt;
&lt;th&gt;Change&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Win MAE&lt;/td&gt;
&lt;td&gt;6.41&lt;/td&gt;
&lt;td&gt;6.41&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;±0.00&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Win Bias&lt;/td&gt;
&lt;td&gt;+2.69&lt;/td&gt;
&lt;td&gt;+2.70&lt;/td&gt;
&lt;td&gt;+0.01&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;80% CI coverage&lt;/td&gt;
&lt;td&gt;86.5%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;87.5%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;+1.0%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;MAE didn't move at all. The reason is &lt;strong&gt;structural&lt;/strong&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;After correction: RS_adj = RS / factor,  RA_adj = RA / factor
Pythagorean: RS_adj^exp / (RS_adj^exp + RA_adj^exp)
           = RS^exp / (RS^exp + RA^exp)   (factor^exp cancels)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Dividing both RS and RA by the same factor &lt;strong&gt;preserves the RS/RA ratio&lt;/strong&gt;, and Pythagorean win% depends only on that ratio, so win predictions barely change.&lt;/p&gt;
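
&lt;p&gt;A quick numerical check of that cancellation, as a sketch. The run totals are hypothetical; the factor is the one for PF_5yr = 0.844.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import math


def pythagorean_win_pct(rs, ra, exp=1.83):
    """Expected win% from runs scored (rs) and runs allowed (ra)."""
    return rs**exp / (rs**exp + ra**exp)


rs, ra = 550.0, 500.0         # hypothetical season totals
factor = (0.844 + 1.0) / 2.0  # pf_factor for PF_5yr = 0.844

raw = pythagorean_win_pct(rs, ra)
adjusted = pythagorean_win_pct(rs / factor, ra / factor)

# The two values agree up to floating-point rounding
print(math.isclose(raw, adjusted))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The same holds for any positive factor, which is why the choice of park factor cannot move the Pythagorean step itself.&lt;/p&gt;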

&lt;p&gt;The &lt;strong&gt;80% CI coverage rose from 86.5% to 87.5%&lt;/strong&gt;, which corresponds to one more team-season out of 96 landing inside its interval (83 to 84). Removing the park bias makes the prediction distribution slightly more reliable, even when the point estimate stays the same.&lt;/p&gt;

&lt;h3&gt;
  
  
  RS and RA accuracy improved significantly
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;No correction&lt;/th&gt;
&lt;th&gt;5yr avg PF&lt;/th&gt;
&lt;th&gt;Change&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;RS MAE (runs scored)&lt;/td&gt;
&lt;td&gt;101.1&lt;/td&gt;
&lt;td&gt;74.8&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;-26.3&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RA MAE (runs allowed)&lt;/td&gt;
&lt;td&gt;97.5&lt;/td&gt;
&lt;td&gt;73.0&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;-24.5&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The absolute-value accuracy of run predictions improved substantially. This doesn't directly affect win predictions, but it matters for &lt;strong&gt;player valuation and roster construction analysis&lt;/strong&gt;.&lt;/p&gt;
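
&lt;p&gt;For reference, the three metrics in the tables above can be computed roughly like this. It is a minimal sketch, not the backtest code from the repository. The predicted and actual wins are the four Nippon Ham rows from the table later in this article, the sign convention for bias (predicted minus actual) is assumed to match that table's Error column, and the interval half-width of 8 wins is purely hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def mae(pred, actual):
    """Mean absolute error."""
    return sum(abs(p - a) for p, a in zip(pred, actual)) / len(pred)


def bias(pred, actual):
    """Mean signed error; positive means over-prediction."""
    return sum(p - a for p, a in zip(pred, actual)) / len(pred)


def coverage(lower, upper, actual):
    """Share of actual values inside their [lower, upper] interval."""
    hits = 0
    for lo, hi, a in zip(lower, upper, actual):
        # a is inside the interval exactly when clamping leaves it unchanged
        if min(max(a, lo), hi) == a:
            hits += 1
    return hits / len(actual)


predicted = [68.2, 65.4, 69.0, 73.4]  # Nippon Ham, 2022 to 2025
actual = [59, 60, 75, 83]

# Hypothetical intervals: predicted wins plus or minus 8
lower = [p - 8 for p in predicted]
upper = [p + 8 for p in predicted]

print(round(mae(predicted, actual), 2))
print(round(bias(predicted, actual), 2))
print(coverage(lower, upper, actual))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Errors of opposite sign cancel in the bias but not in the MAE, which is why the two can tell different stories for the same team.&lt;/p&gt;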

&lt;h3&gt;
  
  
  Year-by-year breakdown
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Year&lt;/th&gt;
&lt;th&gt;Win MAE (no PF)&lt;/th&gt;
&lt;th&gt;Win MAE (5yr PF)&lt;/th&gt;
&lt;th&gt;Change&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;2018&lt;/td&gt;
&lt;td&gt;6.18&lt;/td&gt;
&lt;td&gt;6.18&lt;/td&gt;
&lt;td&gt;±0.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2019&lt;/td&gt;
&lt;td&gt;3.90&lt;/td&gt;
&lt;td&gt;3.90&lt;/td&gt;
&lt;td&gt;±0.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2020&lt;/td&gt;
&lt;td&gt;6.27&lt;/td&gt;
&lt;td&gt;6.28&lt;/td&gt;
&lt;td&gt;+0.01&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2021&lt;/td&gt;
&lt;td&gt;10.33&lt;/td&gt;
&lt;td&gt;10.33&lt;/td&gt;
&lt;td&gt;±0.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2022&lt;/td&gt;
&lt;td&gt;5.13&lt;/td&gt;
&lt;td&gt;5.12&lt;/td&gt;
&lt;td&gt;-0.01&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2023&lt;/td&gt;
&lt;td&gt;6.88&lt;/td&gt;
&lt;td&gt;6.89&lt;/td&gt;
&lt;td&gt;+0.01&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2024&lt;/td&gt;
&lt;td&gt;6.69&lt;/td&gt;
&lt;td&gt;6.71&lt;/td&gt;
&lt;td&gt;+0.02&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2025&lt;/td&gt;
&lt;td&gt;5.90&lt;/td&gt;
&lt;td&gt;5.90&lt;/td&gt;
&lt;td&gt;±0.00&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The 2021 spike (MAE = 10.33) reflects Yakult and Orix going from last place to champions — an exceptional event unrelated to park factors.&lt;/p&gt;

&lt;h2&gt;
  
  
  Single-Year PF vs. 5-Year Average: Which Is Better?
&lt;/h2&gt;

&lt;p&gt;I tested two variants of park factor:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Single-year PF&lt;/strong&gt;: calculated from one season only — higher noise&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PF_5yr&lt;/strong&gt;: 5-year rolling average with renovation breakpoints — smoother&lt;/li&gt;
&lt;/ul&gt;
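
&lt;p&gt;As a sketch of what "rolling average with renovation breakpoints" can look like: average the last five single-year values, but drop any season from before a move or renovation. The function name, the unweighted mean, and the breakpoint handling are all assumptions made for this illustration, so the output will not necessarily reproduce the PF_5yr values reported in this article. The single-year inputs are the Nippon Ham values from the table later in the article.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def rolling_pf(single_year_pf, season, window=5, breakpoint_year=None):
    """Unweighted mean of the last `window` single-year PFs up to `season`.

    Seasons before `breakpoint_year` (a move or renovation) are dropped.
    """
    start = season - window + 1
    if breakpoint_year is not None:
        start = max(start, breakpoint_year)
    values = [
        single_year_pf[year]
        for year in range(start, season + 1)
        if year in single_year_pf
    ]
    return sum(values) / len(values)


# Single-year PFs for Nippon Ham; ES CON Field opened in 2023
nippon_ham = {2022: 0.949, 2023: 0.969, 2024: 1.212, 2025: 1.271}

for year in (2023, 2024, 2025):
    print(year, round(rolling_pf(nippon_ham, year, breakpoint_year=2023), 3))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With a breakpoint, the first season in a new park has only one value to average, so the smoothed and single-year figures coincide by construction.&lt;/p&gt;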

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;No PF&lt;/th&gt;
&lt;th&gt;Single-year PF&lt;/th&gt;
&lt;th&gt;5-year avg PF&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Win MAE&lt;/td&gt;
&lt;td&gt;6.41&lt;/td&gt;
&lt;td&gt;6.41&lt;/td&gt;
&lt;td&gt;6.41&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RS MAE&lt;/td&gt;
&lt;td&gt;101.1&lt;/td&gt;
&lt;td&gt;74.8&lt;/td&gt;
&lt;td&gt;74.8&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RA MAE&lt;/td&gt;
&lt;td&gt;97.5&lt;/td&gt;
&lt;td&gt;73.0&lt;/td&gt;
&lt;td&gt;73.0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;80% CI coverage&lt;/td&gt;
&lt;td&gt;86.5%&lt;/td&gt;
&lt;td&gt;86.5%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;87.5%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;RS/RA accuracy improved equally with both. The &lt;strong&gt;only difference is CI coverage&lt;/strong&gt; — single-year PF is too noisy to improve the prediction interval. The 5-year average's smoothing is what improves reliability.&lt;/p&gt;

&lt;h2&gt;
  
  
  Focus: Nippon Ham and ES CON Field Opening (2023)
&lt;/h2&gt;

&lt;p&gt;In 2023, Nippon Ham moved from Sapporo Dome to ES CON Field — a brand-new ballpark.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Year&lt;/th&gt;
&lt;th&gt;Single-year PF&lt;/th&gt;
&lt;th&gt;5yr avg PF&lt;/th&gt;
&lt;th&gt;Predicted W&lt;/th&gt;
&lt;th&gt;Actual W&lt;/th&gt;
&lt;th&gt;Error&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;2022 (last at Sapporo)&lt;/td&gt;
&lt;td&gt;0.949&lt;/td&gt;
&lt;td&gt;0.967&lt;/td&gt;
&lt;td&gt;68.2&lt;/td&gt;
&lt;td&gt;59&lt;/td&gt;
&lt;td&gt;+9.2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;2023 (ES CON opens)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.969&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.969&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;65.4&lt;/td&gt;
&lt;td&gt;60&lt;/td&gt;
&lt;td&gt;+5.4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2024&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1.212&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1.089&lt;/td&gt;
&lt;td&gt;69.0&lt;/td&gt;
&lt;td&gt;75&lt;/td&gt;
&lt;td&gt;-6.0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2025&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1.271&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1.147&lt;/td&gt;
&lt;td&gt;73.4&lt;/td&gt;
&lt;td&gt;83&lt;/td&gt;
&lt;td&gt;-9.6&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Opening year (2023)&lt;/strong&gt;: single-year and 5-year PF happen to match (0.969). The 5-year average was still dominated by Sapporo Dome data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2024–2025&lt;/strong&gt;: the gap widens. ES CON is clearly hitter-friendly (single-year PF &amp;gt; 1.2 in both seasons), but the 5-year average (1.089, then 1.147) lags behind because it still averages in earlier, lower values. Win predictions don't change between methods, which confirms the structural argument above.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Finding&lt;/th&gt;
&lt;th&gt;Result&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Win MAE&lt;/td&gt;
&lt;td&gt;No change (structurally cannot change)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RS/RA MAE&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;-26.3 / -24.5 runs&lt;/strong&gt; (lower error)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;80% CI coverage&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;+1.0%&lt;/strong&gt; (5-year average only)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Single-year vs. 5-year PF&lt;/td&gt;
&lt;td&gt;Same accuracy; 5-year wins on CI reliability&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The unchanging win MAE isn't a failure. It follows from the math: the Pythagorean formula depends only on the RS/RA ratio, and that ratio is unchanged when both are scaled by the same factor.&lt;/p&gt;

&lt;p&gt;Park factor correction improves &lt;strong&gt;prediction interval reliability&lt;/strong&gt; and &lt;strong&gt;absolute run accuracy&lt;/strong&gt;, which matters for player analysis even when the win total doesn't shift.&lt;/p&gt;

&lt;p&gt;As ES CON and renovated stadiums like Vantelin Dome (2026: HR wing) and Rakuten Mobile Park (2026: fence moved in) accumulate data, the gap between single-year and 5-year PF will grow. That's when the choice of smoothing method will matter more.&lt;/p&gt;

&lt;h2&gt;
  
  
  Related Articles
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/yasumorishima/beyond-marcel-adding-bayesian-regression-to-npb-baseball-predictions-a-15-step-journey-1b4f"&gt;Beyond Marcel: Adding Bayesian Regression to NPB Baseball Predictions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/yasumorishima/npb-park-factors"&gt;I Calculated NPB Park Factors for 10 Years — Stadium Renovations Revealed&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/yasumorishima/npb-bayes-park-factors"&gt;I Added Park Factor Correction to My NPB Bayesian Prediction Model&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>baseball</category>
      <category>python</category>
      <category>bayesian</category>
      <category>datascience</category>
    </item>
  </channel>
</rss>
