<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Ninjeneer</title>
    <description>The latest articles on Forem by Ninjeneer (@ninjeneer).</description>
    <link>https://forem.com/ninjeneer</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F800739%2Fdc032e33-ab21-4a8a-a228-6f09c8b3a400.png</url>
      <title>Forem: Ninjeneer</title>
      <link>https://forem.com/ninjeneer</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/ninjeneer"/>
    <language>en</language>
    <item>
      <title>For anyone suffering from Out Of Memory errors at build time in Vercel, due to Sentry ✅</title>
      <dc:creator>Ninjeneer</dc:creator>
      <pubDate>Mon, 01 Sep 2025 14:03:48 +0000</pubDate>
      <link>https://forem.com/ninjeneer/for-anyone-suffering-from-out-of-memory-errors-at-build-time-in-vercel-due-to-sentry-594n</link>
      <guid>https://forem.com/ninjeneer/for-anyone-suffering-from-out-of-memory-errors-at-build-time-in-vercel-due-to-sentry-594n</guid>
      <description>&lt;div class="ltag__link--embedded"&gt;
  &lt;div class="crayons-story "&gt;
  &lt;a href="https://dev.to/ninjeneer/preventing-vercel-out-of-memory-build-errors-21mh" class="crayons-story__hidden-navigation-link"&gt;Preventing Vercel "Out of Memory" build errors&lt;/a&gt;


  &lt;div class="crayons-story__body crayons-story__body-full_post"&gt;
    &lt;div class="crayons-story__top"&gt;
      &lt;div class="crayons-story__meta"&gt;
        &lt;div class="crayons-story__author-pic"&gt;

          &lt;a href="/ninjeneer" class="crayons-avatar  crayons-avatar--l  "&gt;
            &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F800739%2Fdc032e33-ab21-4a8a-a228-6f09c8b3a400.png" alt="ninjeneer profile" class="crayons-avatar__image"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
        &lt;div&gt;
          &lt;div&gt;
            &lt;a href="/ninjeneer" class="crayons-story__secondary fw-medium m:hidden"&gt;
              Ninjeneer
            &lt;/a&gt;
            &lt;div class="profile-preview-card relative mb-4 s:mb-0 fw-medium hidden m:inline-block"&gt;
              
                Ninjeneer
                
              
              &lt;div id="story-author-preview-content-2812768" class="profile-preview-card__content crayons-dropdown branded-7 p-4 pt-0"&gt;
                &lt;div class="gap-4 grid"&gt;
                  &lt;div class="-mt-4"&gt;
                    &lt;a href="/ninjeneer" class="flex"&gt;
                      &lt;span class="crayons-avatar crayons-avatar--xl mr-2 shrink-0"&gt;
                        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F800739%2Fdc032e33-ab21-4a8a-a228-6f09c8b3a400.png" class="crayons-avatar__image" alt=""&gt;
                      &lt;/span&gt;
                      &lt;span class="crayons-link crayons-subtitle-2 mt-5"&gt;Ninjeneer&lt;/span&gt;
                    &lt;/a&gt;
                  &lt;/div&gt;
                  &lt;div class="print-hidden"&gt;
                    
                      Follow
                    
                  &lt;/div&gt;
                  &lt;div class="author-preview-metadata-container"&gt;&lt;/div&gt;
                &lt;/div&gt;
              &lt;/div&gt;
            &lt;/div&gt;

          &lt;/div&gt;
          &lt;a href="https://dev.to/ninjeneer/preventing-vercel-out-of-memory-build-errors-21mh" class="crayons-story__tertiary fs-xs"&gt;&lt;time&gt;Sep 1 '25&lt;/time&gt;&lt;span class="time-ago-indicator-initial-placeholder"&gt;&lt;/span&gt;&lt;/a&gt;
        &lt;/div&gt;
      &lt;/div&gt;

    &lt;/div&gt;

    &lt;div class="crayons-story__indention"&gt;
      &lt;h2 class="crayons-story__title crayons-story__title-full_post"&gt;
        &lt;a href="https://dev.to/ninjeneer/preventing-vercel-out-of-memory-build-errors-21mh" id="article-link-2812768"&gt;
          Preventing Vercel "Out of Memory" build errors
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;div class="crayons-story__tags"&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/vercel"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;vercel&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/sentry"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;sentry&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/nextjs"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;nextjs&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/githubactions"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;githubactions&lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="crayons-story__bottom"&gt;
        &lt;div class="crayons-story__details"&gt;
          &lt;a href="https://dev.to/ninjeneer/preventing-vercel-out-of-memory-build-errors-21mh" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left"&gt;
            &lt;div class="multiple_reactions_aggregate"&gt;
              &lt;span class="multiple_reactions_icons_container"&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/sparkle-heart-5f9bee3767e18deb1bb725290cb151c25234768a0e9a2bd39370c382d02920cf.svg" width="18" height="18"&gt;
                  &lt;/span&gt;
              &lt;/span&gt;
              &lt;span class="aggregate_reactions_counter"&gt;1&lt;span class="hidden s:inline"&gt; reaction&lt;/span&gt;&lt;/span&gt;
            &lt;/div&gt;
          &lt;/a&gt;
            &lt;a href="https://dev.to/ninjeneer/preventing-vercel-out-of-memory-build-errors-21mh#comments" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left flex items-center"&gt;
              Comments


              &lt;span class="hidden s:inline"&gt;Add Comment&lt;/span&gt;
            &lt;/a&gt;
        &lt;/div&gt;
        &lt;div class="crayons-story__save"&gt;
          &lt;small class="crayons-story__tertiary fs-xs mr-2"&gt;
            4 min read
          &lt;/small&gt;
            
              &lt;span class="bm-initial"&gt;
                

              &lt;/span&gt;
              &lt;span class="bm-success"&gt;
                

              &lt;/span&gt;
            
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;


</description>
      <category>vercel</category>
      <category>sentry</category>
      <category>nextjs</category>
      <category>githubactions</category>
    </item>
    <item>
      <title>Preventing Vercel "Out of Memory" build errors</title>
      <dc:creator>Ninjeneer</dc:creator>
      <pubDate>Mon, 01 Sep 2025 14:01:06 +0000</pubDate>
      <link>https://forem.com/ninjeneer/preventing-vercel-out-of-memory-build-errors-21mh</link>
      <guid>https://forem.com/ninjeneer/preventing-vercel-out-of-memory-build-errors-21mh</guid>
      <description>&lt;p&gt;Vercel is a great platform for easily hosting any web project. Using it with Vercel's own framework, NextJS, makes it even smoother. &lt;strong&gt;It lets any developer be fully production-ready in very little time, and for free&lt;/strong&gt;: plug in your GitHub repository, do some quick setup and &lt;strong&gt;boom&lt;/strong&gt;, you're live 🚀&lt;/p&gt;

&lt;p&gt;However, as your project grows and scales, you start to hit Vercel's limitations, or at least those of the free tier.&lt;/p&gt;

&lt;h2&gt;
  
  
  Context
&lt;/h2&gt;

&lt;p&gt;Here at &lt;a href="https://fluum.ai?utm_source=dev.to"&gt;Fluum&lt;/a&gt; we are building the next-gen AI Co-Founder for solopreneurs, allowing anyone to run their business and sell their services seamlessly. Focus on what matters to you, your AI Co-Founder handles the rest. &lt;br&gt;
We offer this whole set of features at &lt;a href="https://fluum.ai/pricing?utm_source=dev.to"&gt;a very competitive price&lt;/a&gt;. Stop juggling between many tools, &lt;a href="https://fluum.ai?utm_source=dev.to"&gt;Fluum&lt;/a&gt; is the all-in-one tool you need.&lt;/p&gt;

&lt;p&gt;As we rapidly gained users and traction, the codebase grew, the number of modules grew with it, and the tools we use to monitor everything became essential.&lt;/p&gt;

&lt;p&gt;Our builds used to take around 4 to 5 minutes, but they quickly became unpredictable, ranging from 5 to 25 minutes, with "Out of Memory" errors from time to time. After a quick check, we found the culprit: &lt;strong&gt;Sentry&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Sentry is a tool that lets us monitor our production frontend errors by sending us alerts whenever something bad happens. To work properly, it needs to gather the source maps from the NextJS build and upload them to Sentry, and that's where it starts eating up all the RAM.&lt;/p&gt;
&lt;h2&gt;
  
  
  Initial setup
&lt;/h2&gt;

&lt;p&gt;Most documentation online tells you to add Sentry at build time, in &lt;code&gt;next.config.js&lt;/code&gt;, as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;const&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;nextConfig&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="err"&gt;....&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;module.exports&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;withSentryConfig(nextConfig,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="err"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Since this Sentry step runs inside the Vercel build, we are quickly limited by the resources available and start to hit OOM errors:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Error: Command "pnpm run build" exited with SIGKILL
▲ Build system report
▲ To always completely log this report, add VERCEL_BUILD_SYSTEM_REPORT=1 as an Environment Variable to your project.
• At least one "Out of Memory" ("OOM") event was detected during the build.
  • This occurs when processes or applications running during the build completely fill up the available memory (RAM) in the build container. When this happens, the build container terminates one of the processes during the build with a SIGKILL signal.
  • Read this troubleshooting guide for more information: https://vercel.link/troubleshoot-build-errors
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Remediations
&lt;/h2&gt;

&lt;p&gt;There are two solutions to this problem:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Remove Sentry and fly blind&lt;/li&gt;
&lt;li&gt;Remove Sentry... from the NextJS build and handle it outside&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We are of course going with option 2 :) &lt;/p&gt;

&lt;p&gt;The key component here is GitHub Actions: we prebuild the NextJS project in a GitHub Actions workflow, then run the Sentry CLI to create a new release with source maps, and finally deploy the prebuilt project to Vercel using the Vercel CLI.&lt;/p&gt;

&lt;p&gt;Then, on Vercel, we unlink the GitHub repository and switch the deployment mode from "NextJS" to "Others".&lt;/p&gt;

&lt;p&gt;Here is the final GitHub Actions workflow file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Vercel Production Deployment&lt;/span&gt;
&lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;VERCEL_ORG_ID&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.VERCEL_ORG_ID }}&lt;/span&gt;
    &lt;span class="na"&gt;VERCEL_PROJECT_ID&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.VERCEL_PROJECT_ID_PROD }}&lt;/span&gt;
    &lt;span class="na"&gt;NODE_OPTIONS&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;--max-old-space-size=4096'&lt;/span&gt; &lt;span class="c1"&gt;# 4 GB&lt;/span&gt;
&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;push&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;branches&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;master&lt;/span&gt;
&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;notify-slack-starting-production-deployment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Notify Slack&lt;/span&gt;
        &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
        &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v2&lt;/span&gt;

            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Get commit message&lt;/span&gt;
              &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;commit_message&lt;/span&gt;
              &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
                  &lt;span class="s"&gt;# Escape quotes and newlines in commit message for JSON&lt;/span&gt;
                  &lt;span class="s"&gt;COMMIT_MSG=$(git log -1 --pretty=%B | sed 's/"/\\"/g' | tr '\n' ' ' | sed 's/[[:space:]]*$//')&lt;/span&gt;
                  &lt;span class="s"&gt;echo "::set-output name=message::$COMMIT_MSG"&lt;/span&gt;

            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Get commit hash&lt;/span&gt;
              &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;commit_hash&lt;/span&gt;
              &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
                  &lt;span class="s"&gt;echo "::set-output name=hash::$(git rev-parse --short HEAD)"&lt;/span&gt;

            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Send Slack notification&lt;/span&gt;
              &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;slackapi/slack-github-action@v1.24.0&lt;/span&gt;
              &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                  &lt;span class="na"&gt;payload&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
                      &lt;span class="s"&gt;{&lt;/span&gt;
                        &lt;span class="s"&gt;"text": "Deploying Production Frontend | ${{ steps.commit_hash.outputs.hash }} : ${{ steps.commit_message.outputs.message }}"&lt;/span&gt;
                      &lt;span class="s"&gt;}&lt;/span&gt;
              &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                  &lt;span class="na"&gt;SLACK_WEBHOOK_URL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.SLACK_PROD_WEBHOOK_URL }}&lt;/span&gt;

    &lt;span class="na"&gt;deploy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
        &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v2&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Install pnpm&lt;/span&gt;
              &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pnpm/action-setup@v2&lt;/span&gt;
              &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                  &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;latest&lt;/span&gt;

            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Use Node.js&lt;/span&gt;
              &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/setup-node@v3&lt;/span&gt;
              &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                  &lt;span class="na"&gt;node-version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;22'&lt;/span&gt;
                  &lt;span class="na"&gt;cache&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;pnpm'&lt;/span&gt;

            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Cache pnpm store &amp;amp; node_modules&lt;/span&gt;
              &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/cache@v3&lt;/span&gt;
              &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pnpm-cache&lt;/span&gt;
              &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                  &lt;span class="c1"&gt;# Cache both the global store and node_modules&lt;/span&gt;
                  &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
                      &lt;span class="s"&gt;~/.pnpm-store&lt;/span&gt;
                      &lt;span class="s"&gt;node_modules&lt;/span&gt;
                  &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pnpm-${{ runner.os }}-${{ hashFiles('pnpm-lock.yaml') }}&lt;/span&gt;
                  &lt;span class="na"&gt;restore-keys&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
                      &lt;span class="s"&gt;pnpm-${{ runner.os }}-&lt;/span&gt;

            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Install Dependencies&lt;/span&gt;
              &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pnpm install --frozen-lockfile&lt;/span&gt;

            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Install Vercel CLI&lt;/span&gt;
              &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pnpm install --global vercel@latest&lt;/span&gt;

            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Pull Vercel Environment Information&lt;/span&gt;
              &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;vercel pull --yes --environment=production --token=${{ secrets.VERCEL_TOKEN }}&lt;/span&gt;

            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Build Project Artifacts&lt;/span&gt;
              &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;vercel build --debug --prod --token=${{ secrets.VERCEL_TOKEN }}&lt;/span&gt;

            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Install Sentry CLI&lt;/span&gt;
              &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pnpm install --global @sentry/cli@latest&lt;/span&gt;

            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Upload Sentry Source Maps&lt;/span&gt;
              &lt;span class="c1"&gt;# only on push to master&lt;/span&gt;
              &lt;span class="na"&gt;if&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;github.event_name == 'push'&lt;/span&gt;
              &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
                  &lt;span class="s"&gt;npx @sentry/cli releases new $GITHUB_SHA&lt;/span&gt;
                  &lt;span class="s"&gt;npx @sentry/cli releases files $GITHUB_SHA upload-sourcemaps .next --rewrite&lt;/span&gt;
                  &lt;span class="s"&gt;npx @sentry/cli releases finalize $GITHUB_SHA&lt;/span&gt;
              &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                  &lt;span class="na"&gt;SENTRY_AUTH_TOKEN&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.SENTRY_AUTH_TOKEN }}&lt;/span&gt;
                  &lt;span class="na"&gt;SENTRY_ORG&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.SENTRY_ORG }}&lt;/span&gt;
                  &lt;span class="na"&gt;SENTRY_PROJECT&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.SENTRY_PROJECT }}&lt;/span&gt;

            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deploy Project Artifacts to Vercel&lt;/span&gt;
              &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;vercel deploy --prebuilt --prod --token=${{ secrets.VERCEL_TOKEN }}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With this flow, our production builds are back to roughly 5 minutes, with no more Out of Memory errors 🎉&lt;/p&gt;

&lt;p&gt;Additional win: you can cut your organization's costs by keeping a single seat in Vercel, now that the GitHub Actions workflow is responsible for the deployment 💸&lt;/p&gt;

&lt;p&gt;I hope this solution helps many folks out there and saves them some time. Keep growing 🚀&lt;/p&gt;

</description>
      <category>vercel</category>
      <category>sentry</category>
      <category>nextjs</category>
      <category>githubactions</category>
    </item>
    <item>
      <title>How I built an automated vulnerability scanner SECaaS</title>
      <dc:creator>Ninjeneer</dc:creator>
      <pubDate>Tue, 06 Jun 2023 20:46:09 +0000</pubDate>
      <link>https://forem.com/ninjeneer/how-i-built-an-automated-vulnerability-scanner-secaas-3gp0</link>
      <guid>https://forem.com/ninjeneer/how-i-built-an-automated-vulnerability-scanner-secaas-3gp0</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F789vlrsuo6zypyd5p7ec.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F789vlrsuo6zypyd5p7ec.png" alt="Cyber Eye Logo" width="200" height="200"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A few days ago, I released the &lt;a href="https://cyber-eye.fr" rel="noopener noreferrer"&gt;Cyber Eye&lt;/a&gt; web platform, a SECaaS that allows any user, even those without technical knowledge, to run basic &amp;amp; advanced security scans on their infrastructure.&lt;/p&gt;

&lt;p&gt;Each of those scans takes place from the outside, simulating what an attacker would be able to look for. &lt;/p&gt;

&lt;p&gt;Today, we'll dive into the technical side of this platform, and I'll explain how I built it. I am open to any suggestions to improve it :) &lt;/p&gt;




&lt;h2&gt;
  
  
  Server architecture
&lt;/h2&gt;

&lt;p&gt;For this project, I decided to make scalability the priority. Since users can choose the periodicity of their scans themselves (via a user-friendly cron selector), I expect heavy loads at night, around 12 AM.&lt;/p&gt;

&lt;p&gt;So I decided to split the architecture this way:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;A service managing the user requests&lt;/strong&gt; (responsible for validation, checking user credits, and publishing the request to other services)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;An AWS SQS queue&lt;/strong&gt; in which the &lt;strong&gt;requests are sent&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A service&lt;/strong&gt; listening to the above queue, &lt;strong&gt;responsible for deploying the security probes&lt;/strong&gt; (each probe being containerized with Docker)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;An AWS SQS queue&lt;/strong&gt; in which probes are &lt;strong&gt;sending notifications of results&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;MongoDB Atlas database&lt;/strong&gt; storing the raw results of the probes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A service managing the probe results&lt;/strong&gt;, to parse them and create a final report provided to the user&lt;/li&gt;
&lt;li&gt;Finally, &lt;strong&gt;a real time database&lt;/strong&gt; provided by &lt;a href="https://supabase.co" rel="noopener noreferrer"&gt;Supabase&lt;/a&gt; to handle basic data storage and allow UI responsiveness&lt;/li&gt;
&lt;li&gt;3 additional services handling automated jobs, platform stats and billing using &lt;a href="https://stripe.com" rel="noopener noreferrer"&gt;Stripe&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq3an8uml6hwutcnw8tx8.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq3an8uml6hwutcnw8tx8.jpg" alt="Services Architecture" width="731" height="679"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The Request Service
&lt;/h3&gt;

&lt;p&gt;This service is the entry point of the platform. It is responsible for handling user requests, making sure users have enough credits to run their scan, and publishing the request to the AWS SQS queue.&lt;/p&gt;
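
&lt;p&gt;As a rough illustration of that flow (not the actual Cyber Eye code), here is a minimal Python sketch. The &lt;code&gt;ScanRequest&lt;/code&gt; fields, the flat cost of one credit per probe and the &lt;code&gt;publish&lt;/code&gt; callable are all assumptions made for the example; in a real setup, &lt;code&gt;publish&lt;/code&gt; would wrap the call that sends the message to SQS.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json
from dataclasses import dataclass, field


@dataclass
class ScanRequest:
    """Hypothetical request shape; the real fields are not published."""
    user_id: str
    target: str
    probes: list = field(default_factory=list)


class InsufficientCredits(Exception):
    """Raised when a user cannot afford the requested scan."""


def handle_scan_request(request, credits, publish, cost_per_probe=1):
    """Validate the request, check the user's credits, then publish it.

    credits maps user ids to balances; publish is any callable that takes
    a JSON string, for instance a thin wrapper around an SQS client.
    """
    if not request.target or not request.probes:
        raise ValueError("a scan needs a target and at least one probe")

    # Assumed pricing for the example: one credit per probe.
    cost = cost_per_probe * len(request.probes)
    if cost > credits.get(request.user_id, 0):
        raise InsufficientCredits(request.user_id)

    payload = {
        "user_id": request.user_id,
        "target": request.target,
        "probes": list(request.probes),
    }
    publish(json.dumps(payload))
    return payload
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Passing &lt;code&gt;publish&lt;/code&gt; in as a parameter keeps the validation and credit check testable without touching AWS.&lt;/p&gt;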

&lt;h3&gt;
  
  
  The Report Service
&lt;/h3&gt;

&lt;p&gt;This service is the final piece of the workflow, triggered when the last probe of the scan has finished. It pulls all the probe results from Mongo, then parses each of them with its dedicated parser. Once every result is parsed, they are aggregated into a final &lt;strong&gt;report&lt;/strong&gt; that embeds some additional metadata. This report is served to the user.&lt;/p&gt;
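
&lt;p&gt;The parse-then-aggregate step could look roughly like the sketch below. It is only an illustration: the two parsers, the shape of the raw results and the metadata fields are hypothetical, and the real service reads its input from Mongo rather than from a list.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from datetime import datetime, timezone


def parse_nmap(raw):
    """Hypothetical parser: keep only the open ports, sorted."""
    return {"open_ports": sorted(raw.get("open_ports", []))}


def parse_nikto(raw):
    """Hypothetical parser: count the reported findings."""
    return {"findings": len(raw.get("items", []))}


# One parser per probe type; names and fields are illustrative only.
PARSERS = {"nmap": parse_nmap, "nikto": parse_nikto}


def build_report(scan_id, raw_results):
    """Parse every raw probe result and aggregate them into one report."""
    sections = {}
    for result in raw_results:
        parser = PARSERS.get(result["probe"])
        if parser is None:
            # Unknown probe: skip it rather than fail the whole report.
            continue
        sections[result["probe"]] = parser(result["data"])
    return {
        "scan_id": scan_id,
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "probe_count": len(sections),
        "results": sections,
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Keeping the parsers in a dictionary means supporting a new probe only requires registering one more function.&lt;/p&gt;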

&lt;h3&gt;
  
  
  The Deployer Service
&lt;/h3&gt;

&lt;p&gt;This is the one that gave me headaches. Initially, I wanted to use an AWS Lambda function triggered by the AWS SQS queue to deploy containers on demand. However, I struggled a lot with ECS because it encourages the use of predefined configuration files, while I want my containers to be deployed on the fly without any predefined structure. &lt;br&gt;
At this point, my options were to dig deeper into AWS services hell or to go my own way and build my own very simple ECS. And that's what I did.&lt;/p&gt;

&lt;p&gt;This homemade AWS ECS is written in Python, using the &lt;code&gt;boto3&lt;/code&gt; module to listen to the AWS queue. For each event received, the request is parsed, the relevant container is identified and its configuration is set. Then, the container is placed in a deployment waiting queue (arbitrarily capped at 5 parallel runs). When a deployment slot is free, the container is executed on the host machine, and killed at the end.&lt;/p&gt;

&lt;p&gt;This choice also greatly reduces the costs that AWS would otherwise have generated.&lt;/p&gt;
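
&lt;p&gt;The deployment waiting queue with its cap of 5 parallel runs can be approximated with a thread pool, as in the sketch below. This is an assumption-laden illustration, not the real deployer: the image names, the message format and the &lt;code&gt;run_container&lt;/code&gt; callable are hypothetical, and the messages are passed in as plain strings instead of being read from SQS.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json
from concurrent.futures import ThreadPoolExecutor

MAX_PARALLEL_RUNS = 5  # the cap on parallel runs mentioned above

# Hypothetical mapping from probe name to Docker image.
PROBE_IMAGES = {"nmap": "probes/nmap:latest", "nikto": "probes/nikto:latest"}


def build_config(message_body):
    """Parse a queue message and pick the image and settings for the probe."""
    request = json.loads(message_body)
    return {
        "image": PROBE_IMAGES[request["probe"]],
        "environment": {"TARGET": request["target"]},
    }


def deploy_all(message_bodies, run_container, max_parallel=MAX_PARALLEL_RUNS):
    """Run one container per message, never more than max_parallel at once.

    run_container is any callable that starts a container from a config,
    waits for it to exit, removes it, and returns its result.
    """
    configs = [build_config(body) for body in message_bodies]
    # The executor's internal queue plays the role of the waiting queue:
    # extra configs wait until one of the worker slots is free.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(run_container, configs))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;In a real deployer, &lt;code&gt;run_container&lt;/code&gt; might wrap a call to the Docker SDK for Python and the message bodies would come from a polling loop on the queue; both are left out here to keep the sketch self-contained.&lt;/p&gt;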

&lt;h3&gt;
  
  
  Probes
&lt;/h3&gt;

&lt;p&gt;Each probe is wrapped inside a Docker container so that it can run almost anywhere, stay decoupled from the rest of the codebase, and be written in any language.&lt;br&gt;
I generally do not create my own security tools, but build wrappers around well-known tools like &lt;code&gt;nmap&lt;/code&gt;, &lt;code&gt;nikto&lt;/code&gt;, &lt;code&gt;sqlmap&lt;/code&gt;... to provide trustworthy results.&lt;/p&gt;




&lt;h2&gt;
  
  
  Frontend architecture
&lt;/h2&gt;

&lt;p&gt;Brace yourself, this is going to be a long read&lt;/p&gt;

&lt;p&gt;I am using React and Tailwind CSS :)&lt;/p&gt;




&lt;h2&gt;
  
  
  Git methodology &amp;amp; CI / CD
&lt;/h2&gt;

&lt;p&gt;To manage my models and shared code easily, I decided to go with a monorepo for the whole server side. &lt;br&gt;
I set up some GitHub Actions to run my tests on every push and pull request targeting my dev and production branches. Then, my Docker images are built &amp;amp; stored on Docker Hub.&lt;br&gt;
On the CD side, I am working on setting up &lt;code&gt;watchtower&lt;/code&gt; on the production server to automatically pull new images and restart containers.&lt;/p&gt;

&lt;p&gt;In conclusion, this project was an exciting playground for me, where I could apply all my knowledge and learn a huge number of new things (software architecture, DevOps practices, the rigor of production...). I will surely continue this project to turn it into a mature product, with more and more probes, and will post more updates about it on this blog.&lt;/p&gt;

&lt;p&gt;Thank you for reading! Happy to hear your thoughts in the comment section. &lt;/p&gt;

</description>
      <category>cybersecurity</category>
      <category>saas</category>
      <category>node</category>
      <category>docker</category>
    </item>
    <item>
      <title>Creating a Netflix clone</title>
      <dc:creator>Ninjeneer</dc:creator>
      <pubDate>Fri, 21 Jan 2022 16:27:08 +0000</pubDate>
      <link>https://forem.com/ninjeneer/creating-a-netflix-clone-4fb9</link>
      <guid>https://forem.com/ninjeneer/creating-a-netflix-clone-4fb9</guid>
      <description>&lt;h2&gt;
  
  
  Disclaimer
&lt;/h2&gt;

&lt;p&gt;This project is for &lt;strong&gt;exercise&lt;/strong&gt; purposes &lt;strong&gt;only&lt;/strong&gt;. The full source code will not be shared, to avoid abuse.&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;As an experiment, I want to build a Netflix clone based on automated torrent downloading.&lt;/p&gt;

&lt;h2&gt;
  
  
  Requirements
&lt;/h2&gt;

&lt;p&gt;To build such a system, I need a web platform able to stream videos across multiple devices, as Netflix does. Fortunately, the &lt;a href="https://www.plex.tv" rel="noopener noreferrer"&gt;Plex platform&lt;/a&gt; already does this job very well. Therefore, I only have to build software that crawls torrent websites to find and download the films and series I want to watch.&lt;/p&gt;

&lt;p&gt;The idea is to have a web interface that asks me which film I want to watch. If I already have it on my hard drive, it opens the Plex platform. If not, it triggers an automated torrent download and moves the film or series into my Plex media folder.&lt;/p&gt;

&lt;h2&gt;
  
  
  Technical stack
&lt;/h2&gt;

&lt;p&gt;For this project, I will use Python. As I haven't really worked with it yet, it will be a great introduction to the language.&lt;/p&gt;

&lt;h2&gt;
  
  
  Steps
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1 : Getting the web page
&lt;/h3&gt;

&lt;p&gt;When you run a search on the website, it builds a specific URL. For instance, when searching for the Avengers film, the URL looks like this: &lt;a href="https://xxxxxxx/search/avengers/1/99/200" rel="noopener noreferrer"&gt;https://xxxxxxx/search/avengers/1/99/200&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To scrape the web page, I am using the &lt;code&gt;BeautifulSoup4&lt;/code&gt; Python module together with &lt;code&gt;requests&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;XXXParser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Parser&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;super&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;base_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://xxx&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__build_url&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;film_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;base_url&lt;/span&gt;
        &lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/search/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;film_name&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/1/99/200&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__get_page_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;film_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;BeautifulSoup&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;html_content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;__build_url&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;film_name&lt;/span&gt;&lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;BeautifulSoup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;html_content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;html.parser&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The website I am scraping is structured this way:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;tr&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;td&lt;/span&gt; &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;"vertTh"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;center&amp;gt;&lt;/span&gt;
            &lt;span class="nt"&gt;&amp;lt;a&lt;/span&gt; &lt;span class="na"&gt;href=&lt;/span&gt;&lt;span class="s"&gt;"https://.../browse/200"&lt;/span&gt; &lt;span class="na"&gt;title=&lt;/span&gt;&lt;span class="s"&gt;"More from this category"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;Video&lt;span class="nt"&gt;&amp;lt;/a&amp;gt;&amp;lt;br&amp;gt;&lt;/span&gt;
            (&lt;span class="nt"&gt;&amp;lt;a&lt;/span&gt; &lt;span class="na"&gt;href=&lt;/span&gt;&lt;span class="s"&gt;"https://...y/browse/207"&lt;/span&gt; &lt;span class="na"&gt;title=&lt;/span&gt;&lt;span class="s"&gt;"More from this category"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;HD - Movies&lt;span class="nt"&gt;&amp;lt;/a&amp;gt;&lt;/span&gt;)
        &lt;span class="nt"&gt;&amp;lt;/center&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;/td&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;td&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;div&lt;/span&gt; &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;"detName"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
            &lt;span class="nt"&gt;&amp;lt;a&lt;/span&gt; &lt;span class="na"&gt;href=&lt;/span&gt;&lt;span class="s"&gt;"https://.../torrent/34281763/Avengers.Endgame.2019.1080p.BRRip.x264-MP4"&lt;/span&gt;
                &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;"detLink"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;Avengers.Endgame.2019.1080p.BRRip.x264-MP4&lt;span class="nt"&gt;&amp;lt;/a&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;/div&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;a&lt;/span&gt; &lt;span class="na"&gt;href=&lt;/span&gt;&lt;span class="s"&gt;"magnet:?..."&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
            &lt;span class="nt"&gt;&amp;lt;img&lt;/span&gt; &lt;span class="na"&gt;src=&lt;/span&gt;&lt;span class="s"&gt;"https://.../static/img/icon-magnet.gif"&lt;/span&gt; &lt;span class="na"&gt;alt=&lt;/span&gt;&lt;span class="s"&gt;"Magnet link"&lt;/span&gt; &lt;span class="na"&gt;width=&lt;/span&gt;&lt;span class="s"&gt;"12"&lt;/span&gt;&lt;span class="na"&gt;height=&lt;/span&gt;&lt;span class="s"&gt;"12"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;/a&amp;gt;&lt;/span&gt;

        &lt;span class="nt"&gt;&amp;lt;a&lt;/span&gt; &lt;span class="na"&gt;href=&lt;/span&gt;&lt;span class="s"&gt;"https://.../user/..."&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&amp;lt;img&lt;/span&gt; &lt;span class="na"&gt;src=&lt;/span&gt;&lt;span class="s"&gt;"https://.../static/img/trusted.png"&lt;/span&gt; &lt;span class="na"&gt;alt=&lt;/span&gt;&lt;span class="s"&gt;"Trusted"&lt;/span&gt; &lt;span class="na"&gt;title=&lt;/span&gt;&lt;span class="s"&gt;"Trusted"&lt;/span&gt; &lt;span class="na"&gt;style=&lt;/span&gt;&lt;span class="s"&gt;"width:11px;"&lt;/span&gt; &lt;span class="na"&gt;width=&lt;/span&gt;&lt;span class="s"&gt;"11"&lt;/span&gt; &lt;span class="na"&gt;height=&lt;/span&gt;&lt;span class="s"&gt;"11"&lt;/span&gt; &lt;span class="na"&gt;border=&lt;/span&gt;&lt;span class="s"&gt;"0"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&amp;lt;/a&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;/td&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;td&lt;/span&gt; &lt;span class="na"&gt;align=&lt;/span&gt;&lt;span class="s"&gt;"right"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;1803&lt;span class="nt"&gt;&amp;lt;/td&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;td&lt;/span&gt; &lt;span class="na"&gt;align=&lt;/span&gt;&lt;span class="s"&gt;"right"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;383&lt;span class="nt"&gt;&amp;lt;/td&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/tr&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After receiving the web page as a BeautifulSoup result object, I can start filtering the HTML tags to retrieve the information I am looking for :&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Title&lt;/li&gt;
&lt;li&gt;Download URL&lt;/li&gt;
&lt;li&gt;Number of seeders&lt;/li&gt;
&lt;li&gt;Trusted uploader&lt;/li&gt;
&lt;li&gt;Video quality
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;page_content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;__get_page_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;film_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;rows&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;page_content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;select&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tr&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;select_one&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.vertTh&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
      &lt;span class="c1"&gt;# This is not a table row containing a film
&lt;/span&gt;      &lt;span class="k"&gt;continue&lt;/span&gt;

   &lt;span class="n"&gt;film_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;select_one&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.detLink&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;
   &lt;span class="n"&gt;film_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;select_one&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.detLink&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;attrs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;href&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="n"&gt;seeders&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="n"&gt;leechers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="n"&gt;trusted&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;select_one&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;img[alt=Trusted]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="n"&gt;quality&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s"&gt;d{4}p&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;film_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2 : Filtering results
&lt;/h3&gt;

&lt;p&gt;One problem with torrents is their unintelligible names. Many of them look like &lt;code&gt;Avengers.Infinity.War.2018.1080p.10bit.BluRay.8CH.x265.HEVC-PSA&lt;/code&gt;, which makes filtering the data harder.&lt;/p&gt;

&lt;p&gt;So I need to identify which text to remove in order to clean up the titles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Replace dots with spaces&lt;/li&gt;
&lt;li&gt;Remove the quality using a regex (&lt;code&gt;\d{3,4}p&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Remove tags such as "DVDrip", "HDrip", etc. using a regex (&lt;code&gt;\w{2,3}rip&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Remove keywords that are repeated across many titles: blueray, bluray, HEVC, AAC, ACC, PSA, MP4...&lt;/li&gt;
&lt;li&gt;Remove encoding tags with a regex (&lt;code&gt;(x|h)\d+&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Remove the useless "The" at the beginning of titles&lt;/li&gt;
&lt;/ul&gt;
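&lt;p&gt;The cleaning steps above could be sketched in Python as follows. This is an illustration under a few assumptions, not the exact code of the project: the function name &lt;code&gt;clean_title&lt;/code&gt; is made up, titles are lowercased first, hyphens are treated like dots so that a tag such as &lt;code&gt;x264-MP4&lt;/code&gt; splits into two words, and the regexes are anchored on word boundaries.&lt;/p&gt;

```python
import re

# Keywords listed in the article, lowercased because the title is lowercased first
NOISE_KEYWORDS = ("blueray", "bluray", "hevc", "aac", "acc", "psa", "mp4")


def clean_title(raw_title):
    # Dots (and, as an assumption, hyphens) become spaces
    title = re.sub(r"[.\-]", " ", raw_title.lower())
    # Quality tags such as 720p or 1080p
    title = re.sub(r"\b\d{3,4}p\b", " ", title)
    # Rip tags such as dvdrip, hdrip, brrip
    title = re.sub(r"\b\w{2,3}rip\b", " ", title)
    # Encoding tags such as x264 or h265
    title = re.sub(r"\b[xh]\d+\b", " ", title)
    # Drop the repeated keywords, then a leading "the"
    words = [word for word in title.split() if word not in NOISE_KEYWORDS]
    if words and words[0] == "the":
        words = words[1:]
    return " ".join(words)


print(clean_title("Avengers.Endgame.2019.1080p.BRRip.x264-MP4"))
```

&lt;p&gt;With this sketch, the sample title from Step 1 is reduced to &lt;code&gt;avengers endgame 2019&lt;/code&gt;. Tags that are not covered by the rules above (for example &lt;code&gt;10bit&lt;/code&gt;) would need extra patterns.&lt;/p&gt;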

&lt;p&gt;I now have more natural-looking results:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;avengers
avengers endgame
avengers endgame (2019)
avengers infinity war
avengers infinity war 2018 english
avengers age of ultron (2015)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3 : Sorting results
&lt;/h3&gt;

&lt;p&gt;I don't want to spend time going through the results myself to find the best one; I want this to be automated. That's why I need to give each result a score based on several criteria.&lt;br&gt;
By default, every result has a score of 0.&lt;/p&gt;

&lt;h4&gt;
  
  
  Levenshtein distance
&lt;/h4&gt;

&lt;p&gt;This is the most important of all the scoring criteria.&lt;/p&gt;

&lt;p&gt;The Levenshtein distance is the minimum number of single-character edits (insertions, deletions or substitutions) needed to turn a string A into a string B. In my case, I want the Levenshtein distance between my query and a title to be as low as possible. Thanks to the title cleaning done in the previous step, film titles already look fairly natural.&lt;/p&gt;
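&lt;p&gt;For readers who have never implemented it, here is a small, dependency-free sketch of the Levenshtein distance using the classic dynamic programming approach. The function name is arbitrary, and the project may well rely on an existing library instead. The way the distance feeds into the final score is not shown here.&lt;/p&gt;

```python
def levenshtein(a, b):
    # previous[j] holds the distance between the first i-1 characters of a
    # and the first j characters of b
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]


query = "avengers endgame"
titles = ["avengers", "avengers endgame", "avengers endgame (2019)"]
# Titles closest to the query come first
print(sorted(titles, key=lambda title: levenshtein(query, title)))
```

&lt;p&gt;Keeping only two rows of the table at a time is enough, because each cell depends only on the current and the previous row.&lt;/p&gt;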

&lt;h4&gt;
  
  
  Seeders
&lt;/h4&gt;

&lt;p&gt;As I want my film to be downloaded as fast as possible, I'm looking for the results with the most seeders. To keep the number of seeders from inflating the score too much, I am using the square root function, whose Y values grow more and more slowly as the X values increase.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5mdf9vabm7at5vrb4vo2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5mdf9vabm7at5vrb4vo2.png" alt="Squart root function graph" width="800" height="415"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Language
&lt;/h4&gt;

&lt;p&gt;As a French speaker, I prefer watching movies in French. If the movie title contains the "french" keyword, its score is increased by 1. However, if it only contains the "fr" keyword, its score is increased by 0.5, because I am less sure that it is a French-language tag.&lt;/p&gt;
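&lt;p&gt;A minimal sketch of this rule, assuming that keywords are matched as whole words in a title where dots have already been replaced with spaces (the function name is made up for the example):&lt;/p&gt;

```python
def language_score(title):
    words = title.lower().split()
    if "french" in words:
        return 1.0
    if "fr" in words:
        # Less certain: "fr" may not be a language tag
        return 0.5
    return 0.0
```

&lt;p&gt;Matching whole words rather than substrings avoids rewarding titles such as "frozen" just because they start with "fr".&lt;/p&gt;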

&lt;h4&gt;
  
  
  Quality
&lt;/h4&gt;

&lt;p&gt;The quality is also an important criterion. If the title contains a quality greater than or equal to 1080p, the film's score increases by 1 point. If the quality is lower, the bonus shrinks with the quality (720p =&amp;gt; 0.5, 480p =&amp;gt; 0.25...)&lt;/p&gt;
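&lt;p&gt;Here is one possible way to express this rule. The 720p and 480p values come from the text above; giving 0 to any other lower resolution, or to a title without a quality tag, is an assumption of this sketch, as is the function name.&lt;/p&gt;

```python
import re

# Bonuses quoted in the article for qualities below 1080p
LOWER_QUALITY_SCORES = {720: 0.5, 480: 0.25}


def quality_score(raw_title):
    # Must run on the raw title, before the quality tag is stripped in Step 2
    match = re.search(r"(\d{3,4})p", raw_title)
    if match is None:
        return 0.0
    height = int(match.group(1))
    if height >= 1080:
        return 1.0
    return LOWER_QUALITY_SCORES.get(height, 0.0)
```

&lt;p&gt;Note that the release year (for example 2019) is not mistaken for a quality, because the pattern requires the digits to be directly followed by a "p".&lt;/p&gt;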

&lt;h4&gt;
  
  
  Trusted uploader
&lt;/h4&gt;

&lt;p&gt;The website I am scraping can reward users with a "Trusted" tag. This tag gives me confidence that the upload is of good quality and that its content is accurate. A film uploaded by a trusted uploader automatically gets 1 extra point.&lt;/p&gt;
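&lt;p&gt;Putting the criteria together, the final ranking could look like the sketch below. Each result is assumed to be a plain dictionary whose fields have already been computed. How the Levenshtein distance is combined with the bonuses is not specified in this article, so subtracting it from the score is purely an assumption made for the example, as are all the names used here.&lt;/p&gt;

```python
import math


def total_score(result):
    score = 0.0                               # every result starts at 0
    score -= result["distance"]               # assumption: distant titles are penalised
    score += math.sqrt(result["seeders"])     # damped seeder bonus
    score += result["language_bonus"]         # 1, 0.5 or 0
    score += result["quality_bonus"]          # 1, 0.5, 0.25 or 0
    score += 1.0 if result["trusted"] else 0.0
    return score


def best_first(results):
    # Highest score first
    return sorted(results, key=total_score, reverse=True)


exact = {"title": "avengers endgame", "distance": 0, "seeders": 100,
         "language_bonus": 0.0, "quality_bonus": 1.0, "trusted": True}
partial = {"title": "avengers", "distance": 8, "seeders": 144,
           "language_bonus": 0.0, "quality_bonus": 0.5, "trusted": False}
print([result["title"] for result in best_first([partial, exact])])
```

&lt;p&gt;In this example the exact match wins even though it has fewer seeders, because the other result is penalised for its distance to the query.&lt;/p&gt;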

&lt;h3&gt;
  
  
  Step 4 : Automating the download
&lt;/h3&gt;

&lt;p&gt;To be continued...&lt;/p&gt;




&lt;p&gt;Thanks for reading, and remember to stay awesome!&lt;/p&gt;

</description>
      <category>python</category>
      <category>algorithms</category>
      <category>automatisation</category>
      <category>torrent</category>
    </item>
  </channel>
</rss>
