<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Cara Jung</title>
    <description>The latest articles on Forem by Cara Jung (@carasjung).</description>
    <link>https://forem.com/carasjung</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3448967%2F4a49be6a-32ba-4490-aa57-3617e659db09.png</url>
      <title>Forem: Cara Jung</title>
      <link>https://forem.com/carasjung</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/carasjung"/>
    <language>en</language>
    <item>
      <title>Building a Unified Korean Entertainment Database from 10 APIs and Web Scrapers</title>
      <dc:creator>Cara Jung</dc:creator>
      <pubDate>Sat, 09 May 2026 20:56:22 +0000</pubDate>
      <link>https://forem.com/carasjung/building-a-unified-korean-entertainment-database-from-10-apis-and-web-scrapers-3n91</link>
      <guid>https://forem.com/carasjung/building-a-unified-korean-entertainment-database-from-10-apis-and-web-scrapers-3n91</guid>
      <description>&lt;p&gt;Korean entertainment has become a global phenomenon with shows such as Squid Game breaking records and K-dramas topping global charts. And yet, the data infrastructure behind it is fragmented.&lt;/p&gt;

&lt;p&gt;Getting complete data on a single Korean show or film — cast, ratings (Korean and international), episode viewership numbers, where to stream it, what awards it won, its OST albums — requires hopscotching different websites. &lt;/p&gt;

&lt;p&gt;The issue is that dominant platforms like NAVER and Melon lack English-first APIs. As Session Zero points out in this &lt;a href="https://dev.to/sessionzero_ai/i-built-an-mcp-server-for-korean-data-heres-why-e92"&gt;article&lt;/a&gt;, Korean data is heavily underserved in MCP ecosystems because when Western developer tools and AI systems are built, Korean platforms are often invisible by default.&lt;/p&gt;

&lt;p&gt;The data exists. But it’s trapped behind language barriers, undocumented endpoints, JavaScript-rendered pages, and closed ecosystems. So while AI agents can easily retrieve structured information about Hollywood movies, Spotify charts, or IMDb ratings, asking the same systems about Korean dramas, OSTs, or Korean audience sentiment often returns incomplete results or nothing at all. &lt;/p&gt;

&lt;p&gt;So I decided to build a unified database to fix it.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Data Landscape
&lt;/h2&gt;

&lt;p&gt;Korean entertainment data splits along two axes: &lt;strong&gt;language&lt;/strong&gt; (Korean vs. English sources) and &lt;strong&gt;type&lt;/strong&gt; (official vs. community vs. streaming).&lt;/p&gt;

&lt;h3&gt;
  
  
  English-language sources
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;TMDB&lt;/strong&gt; is the closest thing to a comprehensive English-language database for Korean content. It has structured data on tens of thousands of Korean films and shows, a stable API, and community ratings. But it lacks Korea-specific data: no verified Korean audience scores, no Nielsen viewership ratings, no Korean box office data, no OST information.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MyDramaList&lt;/strong&gt; fills a critical gap that TMDB misses entirely: community tags. MDL users have tagged Korean dramas with labels such as "Bromance", "Time Travel", "CEO Male Lead", and "Found Family." No official database captures that taxonomy. MDL also tracks airing status more accurately than TMDB for Korean dramas.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;HanCinema&lt;/strong&gt; has the deepest historical coverage of Korean content in English, including films from the 1950s through 1990s that TMDB barely covers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;JustWatch&lt;/strong&gt; is the most reliable real-time source for streaming availability. TMDB's streaming data lags reality by weeks. JustWatch checks 364 services daily.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Wikipedia&lt;/strong&gt; has rich content for major Korean films and shows including detailed plot summaries, production history, cultural reception sections that no English entertainment database captures.&lt;/p&gt;

&lt;h3&gt;
  
  
  Korean-language sources
&lt;/h3&gt;

&lt;p&gt;Here's where things get interesting and painful.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;NAVER&lt;/strong&gt; is Korea's dominant search engine and entertainment portal. Search for any Korean film on NAVER and you'll get a rich information card with two ratings that don't exist anywhere else:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;실관람객 평점&lt;/strong&gt; (Verified ticket buyer rating): Only people who purchased cinema tickets through affiliated platforms can rate. This is Korea's equivalent of a verified purchase review.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;네티즌 평점&lt;/strong&gt; (Netizen rating): Korean general public rating.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These ratings often diverge significantly from international scores. Parasite has a 9.08 verified buyer rating on NAVER versus 8.5 on TMDB. The Korean audience who saw it in theaters rated it exceptionally highly.&lt;/p&gt;

&lt;p&gt;NAVER also has per-episode Nielsen Korea viewership ratings for TV dramas, which is the official broadcast ratings that Korean media reports on weekly. No other English-language source has this data structured and queryable.&lt;/p&gt;

&lt;p&gt;The critical catch: &lt;strong&gt;NAVER has no public API for any of this&lt;/strong&gt;. Their entertainment data is rendered dynamically in JavaScript, served through their search interface, and entirely undocumented. Every data point requires a browser.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;KOBIS&lt;/strong&gt; (Korean Film Council) is the exception. It has an official government API that provides authoritative weekly and daily box office rankings. It's the only Korean government data source with a proper REST API.&lt;/p&gt;




&lt;h2&gt;
  
  
  Building the Scrapers
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Playwright Problem
&lt;/h3&gt;

&lt;p&gt;Most of the Korean data sources render content through JavaScript. Static HTML requests return empty shells. This meant nearly every scraper needed a real browser.&lt;/p&gt;

&lt;p&gt;To address this, I used &lt;strong&gt;Playwright&lt;/strong&gt; with Chromium headless across all JS-rendered sources. The setup is consistent:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;playwright.sync_api&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sync_playwright&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_get_page_html&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;wait_selector&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;body&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;sync_playwright&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;browser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chromium&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;launch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;headless&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;user_agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;locale&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ko-KR&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;page&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new_page&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;goto&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;wait_until&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;domcontentloaded&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;wait_for_selector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;wait_selector&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# let lazy content settle
&lt;/span&gt;        &lt;span class="n"&gt;html&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;content&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;html&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;locale="ko-KR"&lt;/code&gt; matters for NAVER since it ensures Korean content is served rather than any region-specific variant.&lt;/p&gt;

&lt;h3&gt;
  
  
  The NAVER Genre Problem
&lt;/h3&gt;

&lt;p&gt;One of the more unexpected parsing challenges came from NAVER's movie information card. Genre, country, and runtime appeared concatenated: &lt;code&gt;공포대한민국95분&lt;/code&gt; (Horror South Korea 95min). They were in a single &lt;code&gt;dd&lt;/code&gt; tag separated by invisible &lt;code&gt;span.cm_bar_info&lt;/code&gt; elements.&lt;/p&gt;

&lt;p&gt;The fix was to replace the separator spans with pipe characters before splitting:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;first_dd&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;info_groups&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;select_one&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dd&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;first_dd&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;span&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;first_dd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;select&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;span&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace_with&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;|&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;segments&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;first_dd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_text&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;|&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()]&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;genre&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;segments&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;segments&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;country&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;segments&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;segments&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Extracting Nielsen Ratings from SVG
&lt;/h3&gt;

&lt;p&gt;The trickiest scraping problem was NAVER's episode viewership chart. The data is rendered as an interactive SVG chart where viewership percentages, episode numbers, and air dates are all inside SVG text elements.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_parse_episode_chart&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;soup&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;BeautifulSoup&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="c1"&gt;# Rating values from bb-text elements inside the SVG
&lt;/span&gt;    &lt;span class="n"&gt;rating_texts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;soup&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;select&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;g.bb-texts-rank text.bb-text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;ratings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;rating_texts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;val&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;strip&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;f&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;ratings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;pass&lt;/span&gt;

    &lt;span class="c1"&gt;# Episode numbers and dates from x-axis ticks
&lt;/span&gt;    &lt;span class="n"&gt;x_ticks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;soup&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;select&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;g.bb-axis-x g.tick&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;ep_labels&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;tick&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;x_ticks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;tspans&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tick&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;select&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tspan&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tspans&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;ep_num&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_parse_episode_num&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tspans&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;get_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;strip&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
            &lt;span class="n"&gt;date_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tspans&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;get_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;strip&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;ep_num&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;date_text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;ep_labels&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;episode&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ep_num&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;date_text&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;episode&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ep&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;episode&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;air_date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ep&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rating&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ratings&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ep&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ep_labels&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ratings&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives us per-episode Nielsen ratings:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Ep 1 (12.02.): 6.3%
Ep 8 (12.24.): 12.3%
Ep 16 (01.21.): 20.5%
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For Goblin. No English-language API has this data.&lt;/p&gt;

&lt;h3&gt;
  
  
  The JustWatch Shadow DOM Problem
&lt;/h3&gt;

&lt;p&gt;JustWatch uses Web Components with Shadow DOM for their streaming offer cards. The score and provider data that appears in the browser is inside &lt;code&gt;&amp;lt;slot&amp;gt;&lt;/code&gt; elements that don't render in server-side HTML:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;div&lt;/span&gt; &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;"score-wrap"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;div&lt;/span&gt; &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;"critics-score-wrap"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;slot&lt;/span&gt; &lt;span class="na"&gt;name=&lt;/span&gt;&lt;span class="s"&gt;"critics-score"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&amp;lt;/slot&amp;gt;&lt;/span&gt;  &lt;span class="c"&gt;&amp;lt;!-- empty in scraped HTML --&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;/div&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/div&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;However, the streaming offers themselves (provider names, prices, monetization types) render in the regular DOM inside &lt;code&gt;div.buybox-selector a.offer&lt;/code&gt; elements. The key insight was that the offers were accessible even though the score slots weren't.&lt;/p&gt;

&lt;p&gt;Extracting the actual streaming URLs from JustWatch's redirect links required parsing the &lt;code&gt;r=&lt;/code&gt; parameter:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_extract_redirect_url&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;href&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;parsed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;urlparse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;href&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;parse_qs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;parsed&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;])[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;unquote&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="n"&gt;href&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Awards Parsing: Five Ceremonies, Three Formats
&lt;/h3&gt;

&lt;p&gt;Korean drama and film awards span five major ceremonies, each with slightly different HTML structure. I scraped all of them from AsianWiki plus the official Baeksang Awards site.&lt;/p&gt;

&lt;p&gt;The unexpected challenge was that award categories use different winner formats depending on ceremony type:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Drama awards:&lt;/strong&gt; &lt;code&gt;Person ("Show Title")&lt;/code&gt; → &lt;code&gt;links[0]&lt;/code&gt; is person, &lt;code&gt;links[1]&lt;/code&gt; is show&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Film awards:&lt;/strong&gt; &lt;code&gt;"Film Title"&lt;/code&gt; → title-only, no person/show split&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Blue Dragon Series:&lt;/strong&gt; &lt;code&gt;"Title" (Platform)&lt;/code&gt; → title plus streaming platform&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The format detection:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;value_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;item_text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bold_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;lstrip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;value_text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'"'&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Film/series format: title only
&lt;/span&gt;    &lt;span class="n"&gt;current_category&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;winner&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;links&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;get_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;strip&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;current_category&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;winner_show&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Drama format: person + show
&lt;/span&gt;    &lt;span class="n"&gt;current_category&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;winner&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;links&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;get_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;strip&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;links&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;current_category&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;winner_show&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;links&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;get_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;strip&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And the search function deduplicates winners who also appear in the nominees list. AsianWiki includes the winner in the nominees list, so a naive search returns double entries:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Skip if this is the same person/title as winner (dedup)
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;nom_name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;winner_name&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;winner_matches&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;continue&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  The Wikipedia Section Problem
&lt;/h2&gt;

&lt;p&gt;Wikipedia articles don't have standardized section names. "Plot" might be called "Synopsis", "Series overview", "Story", or "Episodes" depending on who wrote the article and when.&lt;/p&gt;

&lt;p&gt;We built a section alias system:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;SECTION_ALIASES&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Plot&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Plot&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Synopsis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Story&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Series overview&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Premise&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Episodes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Cast&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Cast&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Cast and characters&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Characters&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Main cast&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Ratings&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Ratings&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Viewership ratings&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Television ratings&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Viewership&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Reception&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Reception&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Critical response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Critical reception&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The smart lookup tries each alias in order until it finds content. Crash Landing on You uses "Episodes" for its episode list and "Viewership" for its ratings section — both non-standard names that the alias system handles automatically.&lt;/p&gt;




&lt;h2&gt;
  
  
  Cross-Source ID Management
&lt;/h2&gt;

&lt;p&gt;One of the harder database design decisions was how to link data across sources. A single show like Crash Landing on You has:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;TMDB ID: &lt;code&gt;94796&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;MDL ID: &lt;code&gt;70&lt;/code&gt; (from slug &lt;code&gt;70-crash-landing-on-you&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Naver show OS ID: &lt;code&gt;3522952&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;JustWatch slug: &lt;code&gt;crash-landing-on-you&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Wikipedia title: &lt;code&gt;Crash Landing on You&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The database stores all of these as columns on the &lt;code&gt;tv_shows&lt;/code&gt; table. No single ID links all sources — the TMDB ID is the primary key because TMDB has the broadest coverage and most stable IDs.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;tv_shows&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt;           &lt;span class="n"&gt;uuid&lt;/span&gt; &lt;span class="k"&gt;primary&lt;/span&gt; &lt;span class="k"&gt;key&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="n"&gt;uuid_generate_v4&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
  &lt;span class="n"&gt;tmdb_id&lt;/span&gt;      &lt;span class="nb"&gt;text&lt;/span&gt; &lt;span class="k"&gt;unique&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;mdl_id&lt;/span&gt;       &lt;span class="nb"&gt;text&lt;/span&gt; &lt;span class="k"&gt;unique&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;mdl_slug&lt;/span&gt;     &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;naver_show_id&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;justwatch_slug&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;wikipedia_title&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="c1"&gt;-- ... ratings, metadata, etc.&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The pipeline links sources progressively: TMDB runs first to seed core titles, then MDL enriches with ratings and tags, then NAVER TV adds episode ratings using the Korean title stored by TMDB.&lt;/p&gt;




&lt;h2&gt;
  
  
  Rating Field Naming Convention
&lt;/h2&gt;

&lt;p&gt;With eight different rating sources covering different audiences and methodologies, naming discipline was essential:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="err"&gt;tmdb_rating&lt;/span&gt;              &lt;span class="c"&gt;# Global community (0-10)
&lt;/span&gt;&lt;span class="err"&gt;mdl_rating&lt;/span&gt;               &lt;span class="c"&gt;# International K-drama fans (0-10)
&lt;/span&gt;&lt;span class="err"&gt;naver_audience_rating&lt;/span&gt;    &lt;span class="c"&gt;# Korean verified ticket buyers (0-10)
&lt;/span&gt;&lt;span class="err"&gt;naver_netizen_rating&lt;/span&gt;     &lt;span class="c"&gt;# Korean general public (0-10)
&lt;/span&gt;&lt;span class="err"&gt;naver_latest_rating&lt;/span&gt;      &lt;span class="c"&gt;# Nielsen Korea latest episode (%)
&lt;/span&gt;&lt;span class="err"&gt;naver_highest_rating&lt;/span&gt;     &lt;span class="c"&gt;# Nielsen Korea peak episode (%)
&lt;/span&gt;&lt;span class="err"&gt;rt_tomatometer&lt;/span&gt;           &lt;span class="c"&gt;# Western professional critics (0-100)
&lt;/span&gt;&lt;span class="err"&gt;rt_audience_score&lt;/span&gt;        &lt;span class="c"&gt;# Western RT users (0-100)
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These are never stored as a generic "rating" field. The naming makes the source and audience type explicit at the schema level, preventing any ambiguity in downstream queries or API responses.&lt;/p&gt;




&lt;h2&gt;
  
  
  What This Unlocks
&lt;/h2&gt;

&lt;p&gt;The unified database enables queries that weren't previously possible:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Find dramas where Korean audiences loved it but Western critics were lukewarm:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;title_english&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;naver_audience_rating&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rt_tomatometer&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;tv_shows&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;naver_audience_rating&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;
&lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;rt_tomatometer&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Find all content with OST albums by IU:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;ts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;title_english&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;oa&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;album_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;oa&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vibe_url&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;ost_albums&lt;/span&gt; &lt;span class="n"&gt;oa&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;tv_shows&lt;/span&gt; &lt;span class="n"&gt;ts&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;oa&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;show_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;oa&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;artist&lt;/span&gt; &lt;span class="k"&gt;ILIKE&lt;/span&gt; &lt;span class="s1"&gt;'%IU%'&lt;/span&gt; &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="n"&gt;oa&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;artist&lt;/span&gt; &lt;span class="k"&gt;ILIKE&lt;/span&gt; &lt;span class="s1"&gt;'%아이유%'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Find award winners available on Netflix US:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;ts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;title_english&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;category&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;year&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;awards&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;tv_shows&lt;/span&gt; &lt;span class="n"&gt;ts&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;show_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;streaming_availability&lt;/span&gt; &lt;span class="n"&gt;sa&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;sa&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;show_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;sa&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;provider&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'Netflix'&lt;/span&gt;
&lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;sa&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;region&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'us'&lt;/span&gt;
&lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;sa&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;monetization_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'Subscription'&lt;/span&gt;
&lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;won&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Find the drama with the biggest episode-to-episode rating jump:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;show_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;episode_number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;nielsen_rating&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;nielsen_rating&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;LAG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;nielsen_rating&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;show_id&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;episode_number&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;jump&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;episodes&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;jump&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt; &lt;span class="n"&gt;NULLS&lt;/span&gt; &lt;span class="k"&gt;LAST&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;None of these queries are possible against any single existing source.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;The goal of this project is to build an MCP server for Korean entertainment data so that makes Korean movies and TV shows are accessible to AI agents and developer tooling in a structured, searchable way.&lt;/p&gt;

&lt;p&gt;Instead of forcing developers to manually scrape different sites just to answer a basic query, the MCP server will expose unified tools that support natural language requests like “Find me a political thriller Korean audiences loved that’s available on Netflix and maintained strong episode ratings throughout its run.”&lt;/p&gt;

&lt;p&gt;Under the hood, that means reconciling fragmented metadata across 10,000+ titles from APIs, scrapers, streaming providers, audience ratings, box office systems, and community-driven sources.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Part 2 will dive into the pipeline architecture itself: the automated sync system, GitHub Actions orchestration, incremental updates, scraper failures, rate limits, deduplication headaches, and the very questionable debugging decisions made at 2AM.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>api</category>
      <category>webscraping</category>
      <category>korea</category>
      <category>ai</category>
    </item>
    <item>
      <title>Pick Your Auth: An Interactive Guide</title>
      <dc:creator>Cara Jung</dc:creator>
      <pubDate>Mon, 13 Apr 2026 16:00:00 +0000</pubDate>
      <link>https://forem.com/carasjung/pick-your-auth-an-interactive-guide-44</link>
      <guid>https://forem.com/carasjung/pick-your-auth-an-interactive-guide-44</guid>
      <description>&lt;p&gt;Most auth tutorials focus on how authentication works such as how to drop in a component, spin up a dev server, and get a login screen running. There's no shortage of guides that tell you which method to use for your use case. What's missing is the hands-on part: actually experiencing each flow the way your users do, so you can feel the friction, see the session it produces, and make an informed decision from the ground up.&lt;/p&gt;

&lt;p&gt;Magic link or passkey? Social login or OTP? The answer changes depending on whether you're building a consumer app, a fintech product, a B2B SaaS, or an internal tool. The choice is a product decision that affects activation, security posture, compliance, and long-term maintainability.&lt;/p&gt;

&lt;p&gt;To tackle this dilemma, I built an interactive demo called &lt;strong&gt;Auth Decision Kit&lt;/strong&gt; that lets you try three Descope auth flows live: magic link, social login, and passkey. This demo focuses on how each approach fits different product contexts and the tradeoffs you need to consider.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Demo: &lt;a href="https://auth-decision-kit.vercel.app/" rel="noopener noreferrer"&gt;auth-decision-kit.vercel.app&lt;/a&gt;&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;GitHub: &lt;a href="https://github.com/carasjung/auth-decision-kit" rel="noopener noreferrer"&gt;github.com/carasjung/auth-decision-kit&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Each method has five tabs:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;01 Auth Flow&lt;/strong&gt;&lt;br&gt;
Authenticate for real using a live Descope integration. See the actual UX users experience.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;02 Session Inspector&lt;/strong&gt;&lt;br&gt;
After authenticating, inspect every claim in your JWT payload. Each field is annotated with what it means, why it matters, and when you'd use it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;03 Decision Matrix&lt;/strong&gt;&lt;br&gt;
Green / yellow / red ratings across six product contexts: B2B, consumer app, developer tool, internal tool, fintech, SaaS, and mobile-first.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;04 Failure Simulator&lt;/strong&gt; &lt;br&gt;
Trigger each failure mode and see the Descope error code and the correct handling code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;05 Code&lt;/strong&gt; &lt;br&gt;
Copy-ready implementation snippets for Next.js&lt;/p&gt;


&lt;h2&gt;
  
  
  The Session Inspector: JWT Breakdown
&lt;/h2&gt;

&lt;p&gt;One of the most useful things I learned building this is how different the JWT payload looks depending on which auth method you used and why they matter for your backend logic.&lt;/p&gt;

&lt;p&gt;After a &lt;strong&gt;magic link&lt;/strong&gt; auth, your session contains &lt;code&gt;authenticationMethod: "magiclink"&lt;/code&gt; and &lt;code&gt;verifiedEmail: true&lt;/code&gt;. The email verification is implicit, clicking the link is proof of inbox access. This is a meaningful signal for risk scoring and it also shows that magic link is a single factor (access to your inbox). For products that require two factors like healthcare and fintech, magic link on its own won’t satisfy.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwdgj5efcxw28gvywwfou.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwdgj5efcxw28gvywwfou.png" alt="Magic link session" width="800" height="814"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After &lt;strong&gt;social login&lt;/strong&gt;, you get the provider's access token nested under &lt;code&gt;oauth.google.accessToken&lt;/code&gt; (or whichever provider). You also get &lt;code&gt;externalIds.google&lt;/code&gt;, a stable provider-specific user ID that won't change even if the user changes their email address on Google's side. That's the field you want for account linking. Since you get free profile data, this is effective for consumer and developer tools.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyaisvibzux3vf9xzg3lz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyaisvibzux3vf9xzg3lz.png" alt="Social login session" width="800" height="1211"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After a &lt;strong&gt;passkey&lt;/strong&gt; auth, the &lt;code&gt;amr&lt;/code&gt; (Authentication Methods References) claim contains &lt;code&gt;"hwk"&lt;/code&gt; (hardware key) and &lt;code&gt;"user"&lt;/code&gt;. This is the claim compliance teams care about. It's proof that a hardware-bound credential was used, not just a password or a link. Passkey is also the only method here where the private key never leaves the user’s device. Even a full Descope breach couldn’t expose user credentials. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6hsm5v3asdr9722ij2gy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6hsm5v3asdr9722ij2gy.png" alt="Passkey session" width="800" height="892"&gt;&lt;/a&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  The Decision Matrix
&lt;/h2&gt;

&lt;p&gt;Here's a condensed version of what I found after thinking through six product contexts for each method:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4s1by4v9a4f8ms740j2f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4s1by4v9a4f8ms740j2f.png" alt="Decision matrix" width="800" height="280"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Magic link&lt;/strong&gt; is the sweet spot for B2B SaaS and early-stage products. Zero password management, implicit email verification, and simple implementation. However, it falls apart on mobile (context switch to email app kills conversion) and in high-security contexts where email as a sole factor isn't enough.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Social login&lt;/strong&gt; is the fastest path to activation for consumer and developer tools. GitHub login in particular gives you free org and repo data via the OAuth token, which is useful for developer-focused products. Avoid it for fintech and banking where regulations often require you to own the identity directly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Passkey&lt;/strong&gt; is genuinely the best option for mobile-first and high-security context. Phishing-proof by design, the private key never leaves the device. The catch: users still need education on what a passkey is and you need a fallback for older browsers and lost devices.&lt;/p&gt;

&lt;p&gt;Most products should offer at least two methods where one can be the default while the other an alternative. For instance, using magic link as the default and passkey as the upgrade path once users are comfortable.&lt;/p&gt;


&lt;h2&gt;
  
  
  The Failure Simulator
&lt;/h2&gt;

&lt;p&gt;Auth flows break in predictable ways. Understanding those failure points from day one lets you design a seamless recovery experience so users can continue without friction and avoid escalating to support.&lt;/p&gt;

&lt;p&gt;The failure simulator surfaces these scenarios using real Descope error codes and responses. While it doesn’t make live network calls, it replays actual API error outputs so you can explore failure cases without having to intentionally break a real session.&lt;/p&gt;

&lt;p&gt;Magic links expire (Descope's default is 2 minutes). When they do, the &lt;code&gt;onError&lt;/code&gt; callback fires with error code &lt;code&gt;E011303&lt;/code&gt;. Your UI should catch this and offer to send a new link, not show a generic error message.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu90m6m8cyp4cr8woo4ap.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu90m6m8cyp4cr8woo4ap.png" alt="Error code for expired magic link" width="800" height="710"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Social login gets cancelled. Users click "Continue with Google," see the permissions screen, and hit Cancel. That fires &lt;code&gt;E062503&lt;/code&gt;. The right response is to return the user silently to the login screen and treat a cancellation as a choice, not an error.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fobkmy2bwzlh81m8690e4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fobkmy2bwzlh81m8690e4.png" alt="User denied session error for social login" width="800" height="779"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Passkeys on new devices fire &lt;code&gt;E083002&lt;/code&gt; (WebAuthn NotAllowedError). The recovery flow is: fall back to magic link or OTP to verify identity, then offer to enroll a passkey on the new device. This is also why you should never make passkey the only auth method since you always need a fallback for device loss.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fersspagsktjwj64cobgb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fersspagsktjwj64cobgb.png" alt="Passkey failure error" width="800" height="873"&gt;&lt;/a&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  Stack and Setup
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Next.js 15&lt;/strong&gt; with App Router&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Descope Next.js SDK&lt;/strong&gt; (&lt;code&gt;@descope/nextjs-sdk&lt;/code&gt;) for auth flows and session management&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Framer Motion&lt;/strong&gt; for tab transitions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tailwind CSS&lt;/strong&gt; for layout&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The entire setup is about 800 lines of TypeScript across nine files. All core data (steps, session highlights, decision matrix scores, failure scenarios, and code snippets) lives in a single &lt;code&gt;lib/auth.ts&lt;/code&gt; file. Adding a new auth method requires only a single entry point, keeping the system easy to extend.&lt;/p&gt;

&lt;p&gt;To run it yourself:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/carasjung/auth-decision-kit
&lt;span class="nb"&gt;cd &lt;/span&gt;auth-decision-kit
npm &lt;span class="nb"&gt;install
cp&lt;/span&gt; .env.local.example .env.local
&lt;span class="c"&gt;# add your NEXT_PUBLIC_DESCOPE_PROJECT_ID&lt;/span&gt;
npm run dev
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You'll need a free Descope account. Once you’ve created your account, grab your Project ID from &lt;a href="https://app.descope.com/settings/project" rel="noopener noreferrer"&gt;app.descope.com/settings/project&lt;/a&gt; and configure a &lt;code&gt;sign-up-or-in&lt;/code&gt; flow with whichever methods you want to test.&lt;/p&gt;




&lt;h2&gt;
  
  
  From Demo to Decision
&lt;/h2&gt;

&lt;p&gt;There are plenty of great auth demos that show how things work. This one focuses on how to choose between them.&lt;/p&gt;

&lt;p&gt;Auth is infrastructure and like many infrastructure decisions, the cost of getting it wrong rarely shows up immediately. It appears later through conversion drop-offs, security tradeoffs, compliance constraints, and migrations.&lt;/p&gt;

&lt;p&gt;While modern tools make it easier to support multiple methods and evolve your approach over time, the decision of what to use and when still requires good judgement upfront. This project is designed to help make that choice more intentional.&lt;/p&gt;

&lt;p&gt;The live demo is at &lt;strong&gt;&lt;a href="https://auth-decision-kit.vercel.app/" rel="noopener noreferrer"&gt;auth-decision-kit.vercel.app&lt;/a&gt;&lt;/strong&gt; and the full source is on &lt;strong&gt;&lt;a href="https://github.com/carasjung/auth-decision-kit" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/strong&gt;. &lt;/p&gt;

</description>
      <category>authentication</category>
      <category>nextjs</category>
      <category>webdev</category>
      <category>security</category>
    </item>
    <item>
      <title>What Predicts a Hit? I Trained 3 ML Models to Find Out</title>
      <dc:creator>Cara Jung</dc:creator>
      <pubDate>Mon, 06 Apr 2026 07:00:00 +0000</pubDate>
      <link>https://forem.com/carasjung/what-predicts-a-hit-i-trained-3-ml-models-to-find-out-31mj</link>
      <guid>https://forem.com/carasjung/what-predicts-a-hit-i-trained-3-ml-models-to-find-out-31mj</guid>
      <description>&lt;p&gt;In many entertainment adaptation decisions, content selections are still instinct-driven. Maybe a producer was vibing with a story or overheard their Gen Alpha nephew mentioning a GOAT title. This subjective approach has often led to expensive missteps and wasted resources for studios when the feature or show turns into a flop. &lt;/p&gt;

&lt;p&gt;As someone who has worked in the breeding ground of popular webcomics, I asked: what if there was a system that could measure “success potential” of IPs based on real user behavior? Using ML, I wanted to see if I could build a forecasting model that could rank unadapted titles by their predicted commercial success. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Data&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For my endeavor, I worked with three datasets:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Source material metadata of roughly 1,500 titles that included engagement metrics such as views, likes, subscribers, genre, release schedule, and creator usernames&lt;/li&gt;
&lt;li&gt;Produced show metadata of 1,977 titles including ratings, watcher counts, genre, episode count, and cast&lt;/li&gt;
&lt;li&gt;Historical webcomic adaptation records of 424 cross-referenced titles that went from source material to screen, with data pulled from both sides&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Before any modeling, I ran exploratory data analysis on all three and found a few things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Engagement metrics (likes, views, subscribers) were strongly correlated with each other and overall popularity&lt;/li&gt;
&lt;li&gt;Genre and tags correlated with watcher counts in the produced show data&lt;/li&gt;
&lt;li&gt;Creator frequency showed no statistically significant impact on adaptation success, which directly contradicted what studios commonly assume&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhirqnr4guj1s9aabpv0v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhirqnr4guj1s9aabpv0v.png" alt="Modeling Pipeline" width="800" height="853"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Engineering the Target Variable&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;One hurdle I ran into was that I couldn't directly measure adaptation “success" from the source material side alone. So I engineered a composite Popularity Score by normalizing and combining views, likes, and subscribers into a single metric representing audience appeal, which became the target variable for prediction.&lt;/p&gt;

&lt;p&gt;For the produced show data, I created a parallel score using rating and watcher count.&lt;/p&gt;

&lt;p&gt;Since correlation analysis confirmed that source popularity and show popularity moved together in historical adaptations, I used source popularity as a proxy target.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp4zeu7ke58soe0baxo2z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp4zeu7ke58soe0baxo2z.png" alt="Close overlap between actual and predicted curves" width="800" height="467"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Simple vs Complex Models&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I implemented three models: Random Forest, XGBoost, and Ridge Regression. If you worked with ML models, there’s an expectation that the more complex models will win. However, this wasn’t the case. Ridge Regression became the unexpected underdog model that won:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0dqerf5x3rq1u3tnxs0d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0dqerf5x3rq1u3tnxs0d.png" alt="Cross-validation applied across all three models to reduce overfitting risk and validate stability on the adaptation dataset." width="800" height="245"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I cross-validated all three models to reduce overfitting and validate stability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Likes = Success&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Using standardized coefficients for feature importance in the Ridge model, the ranking was as follows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Likes (strongest predictor by a significant margin)&lt;/li&gt;
&lt;li&gt;Views&lt;/li&gt;
&lt;li&gt;Subscribers&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The factors that studios often focus on such as creator reputation, genre, rating, and engagement rate showed weak or no statistical significance.&lt;/p&gt;

&lt;p&gt;I validated this further using Mann-Whitney U tests comparing adapted titles against the general pool. Adapted titles showed significantly higher “likes” than non-adapted ones and the difference was meaningful.&lt;/p&gt;

&lt;p&gt;Feature Importance for Ridge regression(standardized coefficients)&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8pnlaapyh3eq148fjh7m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8pnlaapyh3eq148fjh7m.png" alt="Creator, genre, and rating showed no statistically significant impact and were excluded from the final model" width="800" height="442"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So why “likes”? &lt;/p&gt;

&lt;p&gt;One interpretation is that likes are intentional. A view can be passive while a subscription can be habitual. But giving a “like” is an act of emotional investment and this behavior is exactly what translates from IP to screen.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Output&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The final model produced a ranked list of the top 10 unadapted webcomic titles by predicted success, along with contextual signals for each including genre appeal, subscriber trends, engagement consistency, and creator track record.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6fj4jgpy0pa8lwvh94wa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6fj4jgpy0pa8lwvh94wa.png" alt="Top unadapted titles" width="800" height="603"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Qualitative review of the top 10 confirmed alignment with the engagement patterns seen in historically successful adaptations. Cliff's Delta calculations showed that the predicted top titles had significantly higher likes than past adaptations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Limitations on the Model&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Part of doing good data work is being honest about the limitations. There were a few things that fell short:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Small adaptation dataset. 424 entries is workable, but more data would reduce overfitting risk and better generalization.&lt;/li&gt;
&lt;li&gt;Proxy target variable. Using source popularity instead of actual show performance is a justified simplification, but it means the model can't fully capture real-world production quality, casting, or distribution reach.&lt;/li&gt;
&lt;li&gt;Categorical features dropped. Creator and genre have too many levels and their coefficients dominated the model without adding significance. Excluding them improved interpretability but at the cost of losing some nuance.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What I'd Do Next&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If I extended this project, I'd rethink how signal is captured and focus on the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use NLP for deeper context

&lt;ul&gt;
&lt;li&gt;Synopsis embeddings or sentiment analysis on reader reviews could capture thematic richness that raw engagement metrics miss.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Take a hybrid ranking approach

&lt;ul&gt;
&lt;li&gt;Combining regression with a learning-to-rank algorithm could improve recommendation quality at the top of the list, where small differences actually matter.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Longitudinal validation

&lt;ul&gt;
&lt;li&gt;The real test is tracking what happens when predicted titles actually get produced. Building a feedback loop into the model would sharpen it over time.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Final Thoughts&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The core insight here doesn’t only strictly apply to entertainment. It can apply to decisions that are being made by intuition or legacy practice. As the models showed, behavioral signals from real users outperform assumptions about what will succeed.&lt;/p&gt;

&lt;p&gt;Likes beat creator prestige. Engagement beat genre conventions. The audience’s preferences, not the ones from industry decision makers, predicted outcomes more reliably.&lt;/p&gt;

&lt;p&gt;Whether you're choosing which content to produce, which features to build, or which markets to enter, the same principle applies. The answers are within the data, but we often overlook the right signals. &lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>ai</category>
      <category>python</category>
      <category>datascience</category>
    </item>
  </channel>
</rss>
